kedro.contrib.io

Description

This module contains functionality which we might consider moving into the kedro.io module (e.g. additional AbstractDataSet implementations and extended or alternative DataCatalogs).

Data catalog wrapper

kedro.contrib.io.catalog_with_default.DataCatalogWithDefault([…]) A DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.
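A minimal sketch of how the fallback works, assuming the contrib-era signature in which default is a callable that receives the (optionally prefixed) data set name and returns an AbstractDataSet. CSVLocalDataSet and the file name cars.csv are stand-ins for any concrete data set:

    from kedro.contrib.io.catalog_with_default import DataCatalogWithDefault
    from kedro.io import CSVLocalDataSet

    # Factory applied to any data set name not registered in the catalog;
    # it must return an AbstractDataSet for that name.
    def default_csv(name):
        return CSVLocalDataSet(filepath=name)

    catalog = DataCatalogWithDefault(default=default_csv)

    # "cars.csv" was never registered, so the catalog falls back to default_csv.
    cars = catalog.load("cars.csv")
    catalog.save("cars.csv", cars)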

DataSets

kedro.contrib.io.azure.CSVBlobDataSet(…[, …]) CSVBlobDataSet loads and saves CSV files in Microsoft’s Azure Blob Storage.
kedro.contrib.io.azure.JSONBlobDataSet(…) JSONBlobDataSet loads and saves JSON (line-delimited) files in Microsoft’s Azure Blob Storage.
kedro.contrib.io.bioinformatics.BioSequenceLocalDataSet(…) BioSequenceLocalDataSet loads and saves data to a sequence file.
kedro.contrib.io.cached.CachedDataSet(dataset) CachedDataSet is a dataset wrapper which caches saved data in memory, so that the user avoids I/O operations with slow storage media (see the example after this list).
kedro.contrib.io.feather.FeatherLocalDataSet(…) FeatherLocalDataSet loads and saves data to a local feather file.
kedro.contrib.io.matplotlib.MatplotlibWriter(…) MatplotlibWriter saves matplotlib objects as image files.
kedro.contrib.io.parquet.ParquetS3DataSet(…) ParquetS3DataSet loads and saves data to a file in S3.
kedro.contrib.io.pyspark.SparkDataSet(filepath) SparkDataSet loads and saves Spark data frames.
kedro.contrib.io.pyspark.SparkJDBCDataSet(…) SparkJDBCDataSet loads data from a database table accessible via a JDBC URL and connection properties, and saves the contents of a PySpark DataFrame to an external database table via JDBC.
kedro.contrib.io.yaml_local.YAMLLocalDataSet(…) YAMLLocalDataSet loads and saves data to a local YAML file using PyYAML.
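As referenced above, a minimal sketch of wrapping a data set in CachedDataSet, assuming the contrib-era API in which the wrapper delegates load and save to the underlying data set and keeps the data in memory. CSVLocalDataSet and cars.csv are again stand-ins:

    from kedro.contrib.io.cached import CachedDataSet
    from kedro.io import CSVLocalDataSet

    # Wrap any AbstractDataSet; the wrapper serves repeated loads from
    # an in-memory copy instead of re-reading the storage medium.
    cached_cars = CachedDataSet(CSVLocalDataSet(filepath="cars.csv"))

    df = cached_cars.load()   # first load reads from disk
    df = cached_cars.load()   # second load is served from memory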

DataSet Transformers

kedro.contrib.io.transformers.ProfileTimeTransformer A transformer that logs the runtime of data set load and save calls.
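A minimal sketch of attaching the transformer to a catalog, assuming the DataCatalog.add_transformer API available alongside this module; once added, every load and save call made through the catalog is timed and reported via the standard logging module. The catalog entry here is hypothetical:

    import logging

    from kedro.contrib.io.transformers import ProfileTimeTransformer
    from kedro.io import CSVLocalDataSet, DataCatalog

    # The transformer emits its timing messages through logging,
    # so a handler must be configured to see them.
    logging.basicConfig(level=logging.INFO)

    catalog = DataCatalog({"cars": CSVLocalDataSet(filepath="cars.csv")})
    catalog.add_transformer(ProfileTimeTransformer())

    # Each call is now wrapped: the transformer logs how long it took.
    cars = catalog.load("cars")
    catalog.save("cars", cars)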