kedro.contrib.io

Description

This module contains functionality which we might consider moving into the kedro.io module (e.g. additional AbstractDataSet implementations and extended or alternative DataCatalogs).

Data catalog wrapper

kedro.contrib.io.catalog_with_default.DataCatalogWithDefault([…]) A DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.
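
A minimal usage sketch follows, assuming the default argument accepts a factory callable that maps the name of an unregistered data set to a DataSet instance (the exact keyword names may differ between Kedro versions):

    from kedro.contrib.io.catalog_with_default import DataCatalogWithDefault
    from kedro.io import CSVLocalDataSet, MemoryDataSet

    # Explicitly registered data sets behave exactly as in a plain DataCatalog.
    catalog = DataCatalogWithDefault(
        data_sets={"trains": MemoryDataSet(data=[1, 2, 3])},
        # Assumption: `default` is a factory that receives the name of any
        # unregistered data set and returns a DataSet for it; here every
        # unknown name is interpreted as a path to a local CSV file.
        default=lambda name: CSVLocalDataSet(filepath=name),
    )

    catalog.load("trains")          # served by the registered MemoryDataSet
    catalog.load("data/cars.csv")   # handled by the default CSVLocalDataSet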

DataSets

kedro.contrib.io.azure.CSVBlobDataSet(…[, …]) CSVBlobDataSet loads and saves CSV files in Microsoft’s Azure Blob Storage.
kedro.contrib.io.azure.JSONBlobDataSet(…) JSONBlobDataSet loads and saves JSON (line-delimited) files in Microsoft’s Azure Blob Storage.
kedro.contrib.io.bioinformatics.BioSequenceLocalDataSet(…) BioSequenceLocalDataSet loads and saves data to a sequence file.
kedro.contrib.io.cached.CachedDataSet(dataset) CachedDataSet is a dataset wrapper which caches the saved data in memory, so that the user avoids I/O operations with slow storage media (see the sketch after this list).
kedro.contrib.io.pyspark.SparkDataSet(filepath) SparkDataSet loads and saves Spark data frames.
kedro.contrib.io.pyspark.SparkJDBCDataSet(…) SparkJDBCDataSet loads data from a database table accessible via a JDBC URL and connection properties, and saves the content of a PySpark DataFrame to an external database table via JDBC.
kedro.contrib.io.parquet.ParquetS3DataSet(…) ParquetS3DataSet loads and saves data to a file in S3.
kedro.contrib.io.yaml_local.YAMLLocalDataSet(…) YAMLLocalDataSet loads and saves data to a local YAML file using PyYAML.
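
As a sketch of the caching wrapper referenced above, the example below wraps a YAMLLocalDataSet in a CachedDataSet; the file path is a placeholder and the constructor is assumed to take the wrapped data set as its first argument:

    from kedro.contrib.io.cached import CachedDataSet
    from kedro.contrib.io.yaml_local import YAMLLocalDataSet

    # Wrap a (potentially slow) data set: the first load reads the file from
    # disk, while subsequent loads in the same run are served from the
    # in-memory cache.
    cached_params = CachedDataSet(YAMLLocalDataSet(filepath="conf/params.yml"))

    params = cached_params.load()   # reads conf/params.yml from disk
    params = cached_params.load()   # returned from the cache, no file I/O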