tidymut.cleaners.base_config module

class tidymut.cleaners.base_config.BaseCleanerConfig(pipeline_name: str = 'base_cleaner', strict_mode: bool = True, num_workers: int = 16, validate_config: bool = True)

Bases: ABC

Base configuration class for all dataset cleaners.

This abstract base class provides common configuration functionality that can be inherited by specific cleaner configurations.

pipeline_name

Name of the cleaning pipeline

Type:

str

strict_mode

Whether to stop on errors (True) or continue with warnings (False)

Type:

bool

num_workers

Default number of worker processes

Type:

int

validate_config

Whether to validate the configuration before use

Type:

bool
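The four attributes above, with their defaults, map naturally onto a dataclass. The sketch below is a minimal, self-contained stand-in rather than tidymut's actual source: it assumes `BaseCleanerConfig` behaves like an abstract dataclass, and the subclass `ExampleCleanerConfig` with its `min_score` field is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BaseCleanerConfig(ABC):
    """Stand-in mirroring the documented attributes (assumed dataclass layout)."""

    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    @abstractmethod
    def validate(self) -> None:
        """Subclasses supply their own checks."""


@dataclass
class ExampleCleanerConfig(BaseCleanerConfig):
    """Hypothetical cleaner config that adds one dataset-specific option."""

    pipeline_name: str = "example_cleaner"  # override the inherited default
    min_score: float = 0.0  # hypothetical extra option

    def validate(self) -> None:
        if self.num_workers < 1:
            raise ValueError("num_workers must be at least 1")


# Unspecified fields keep their defaults; the abstract base cannot be instantiated.
config = ExampleCleanerConfig(strict_mode=False, num_workers=4)
print(config)
```

Because `validate` is abstract, calling `BaseCleanerConfig()` directly raises `TypeError`; only concrete subclasses can be constructed.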

classmethod from_dict(config_dict: Dict[str, Any]) → CleanerConfigType

Create a configuration object from a dictionary

Parameters:

config_dict (Dict[str, Any]) – Dictionary containing configuration parameters

Returns:

Configuration object

Return type:

BaseCleanerConfig
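A minimal sketch of how `from_dict` could build a configuration, assuming each dictionary key corresponds to a constructor field (so an unknown key raises `TypeError`, and omitted keys keep their defaults). `CleanerConfigType` is modeled here as a `TypeVar` bound to the class so that calling `from_dict` on a subclass returns that subclass; this is an assumption about the real type alias, and `DemoConfig` is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, TypeVar

# Assumed shape of the return-type alias: "an instance of whatever class was called".
CleanerConfigType = TypeVar("CleanerConfigType", bound="DemoConfig")


@dataclass
class DemoConfig:
    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    @classmethod
    def from_dict(
        cls: Type[CleanerConfigType], config_dict: Dict[str, Any]
    ) -> CleanerConfigType:
        # Each key becomes a keyword argument; missing keys fall back to defaults.
        return cls(**config_dict)


config = DemoConfig.from_dict({"pipeline_name": "my_cleaner", "num_workers": 8})
print(config)
```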

classmethod from_json(json_path: str | Path) → CleanerConfigType

Load a configuration from a JSON file

Parameters:

json_path (Union[str, Path]) – Path to JSON configuration file

Returns:

Configuration object

Return type:

BaseCleanerConfig

Raises:

FileNotFoundError – If the configuration file does not exist
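The documented contract (accept `str` or `Path`, raise `FileNotFoundError` for a missing file) can be sketched as follows. This is an illustrative stand-in, not tidymut's implementation; it assumes the JSON object's keys map directly onto constructor fields, and `DemoConfig` is hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class DemoConfig:
    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    @classmethod
    def from_json(cls, json_path: Union[str, Path]) -> "DemoConfig":
        path = Path(json_path)  # accept both str and Path
        if not path.exists():
            # Mirrors the documented FileNotFoundError behavior.
            raise FileNotFoundError(f"Configuration file not found: {path}")
        with path.open("r", encoding="utf-8") as handle:
            config_dict = json.load(handle)
        return cls(**config_dict)


# Usage: write a small JSON file to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "cleaner.json"
    cfg_path.write_text(json.dumps({"num_workers": 2, "strict_mode": False}))
    loaded = DemoConfig.from_json(cfg_path)

print(loaded)
```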

get_summary() → str

Get a human-readable summary of the configuration

Returns:

String summary of the configuration

Return type:

str
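The exact summary format is not documented. The sketch below shows one plausible layout, a header naming the class followed by one `name: value` line per field; the format and the `DemoConfig` stand-in are assumptions for illustration only.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    def get_summary(self) -> str:
        # Hypothetical layout: class name header, then one indented line per field.
        lines = [f"{type(self).__name__}:"]
        for f in fields(self):
            lines.append(f"  {f.name}: {getattr(self, f.name)!r}")
        return "\n".join(lines)


summary = DemoConfig(num_workers=4).get_summary()
print(summary)
```

Iterating over `dataclasses.fields` keeps the summary in sync with the class definition, so fields added by a subclass appear automatically.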

merge(partial_config: Dict[str, Any]) → CleanerConfigType

Merge a partial configuration into the current configuration

Parameters:

partial_config (Dict[str, Any]) – Dictionary containing configuration values to update

Returns:

New configuration object with merged values

Return type:

BaseCleanerConfig
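Since `merge` returns a new configuration object, the original stays unchanged. A minimal sketch of that behavior, assuming a dataclass-based config, uses `dataclasses.replace`; this is an illustrative choice, not necessarily how tidymut implements it, and `DemoConfig` is hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass
class DemoConfig:
    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    def merge(self, partial_config: Dict[str, Any]) -> "DemoConfig":
        # replace() builds a fresh instance with the given fields overridden;
        # every other field is copied from self, and self is left untouched.
        return replace(self, **partial_config)


base = DemoConfig()
tuned = base.merge({"num_workers": 4, "strict_mode": False})
print(base)
print(tuned)
```

`dataclasses.replace` goes through the class's `__init__`, so any checks a subclass runs at construction time also apply to the merged result.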

num_workers: int = 16
pipeline_name: str = 'base_cleaner'
strict_mode: bool = True
to_dict(exclude_callables: bool = True) → Dict[str, Any]

Convert the configuration to a dictionary

Parameters:

exclude_callables (bool, optional) – Whether to exclude callable objects (functions, lambdas), by default True

Returns:

Dictionary representation of the configuration

Return type:

Dict[str, Any]
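The `exclude_callables` flag matters when a configuration carries functions or lambdas, which JSON cannot represent. A minimal sketch, assuming a dataclass-based config; the `score_filter` field and `DemoConfig` class are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional


@dataclass
class DemoConfig:
    pipeline_name: str = "base_cleaner"
    num_workers: int = 16
    # Hypothetical callable option, e.g. a user-supplied filter function.
    score_filter: Optional[Callable[[float], bool]] = None

    def to_dict(self, exclude_callables: bool = True) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if exclude_callables and callable(value):
                continue  # skip functions/lambdas so the result stays JSON-friendly
            result[f.name] = value
        return result


config = DemoConfig(score_filter=lambda s: s > 0.5)
print(config.to_dict())
print(config.to_dict(exclude_callables=False))
```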

to_json(json_path: str | Path, **json_kwargs) → None

Save the configuration to a JSON file

Parameters:
  • json_path (Union[str, Path]) – Path at which to save the JSON file

  • **json_kwargs – Additional keyword arguments passed to json.dump
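Forwarding `**json_kwargs` lets callers control formatting through standard `json.dump` options such as `indent` and `sort_keys`. The sketch below is a self-contained stand-in, not tidymut's code: it uses `dataclasses.asdict` in place of `to_dict` to stay short, and `DemoConfig` is hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Union


@dataclass
class DemoConfig:
    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    def to_json(self, json_path: Union[str, Path], **json_kwargs) -> None:
        # Extra keyword arguments (indent, sort_keys, ...) go straight to json.dump.
        with Path(json_path).open("w", encoding="utf-8") as handle:
            json.dump(asdict(self), handle, **json_kwargs)


config = DemoConfig(num_workers=3)
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "cleaner.json"
    config.to_json(out_path, indent=2, sort_keys=True)
    written = out_path.read_text(encoding="utf-8")

print(written)
```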

abstractmethod validate() → None

Validate the configuration

This method should be implemented by subclasses to perform specific validation logic.

Raises:

ValueError – If the configuration is invalid
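A subclass fulfills this contract by checking its fields and raising `ValueError` on bad values. The checks below (non-empty `pipeline_name`, at least one worker) are hypothetical examples, and `BaseConfig`/`ExampleConfig` are stand-ins assuming a dataclass layout, not tidymut classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BaseConfig(ABC):
    pipeline_name: str = "base_cleaner"
    num_workers: int = 16

    @abstractmethod
    def validate(self) -> None:
        """Raise ValueError if the configuration is invalid."""


@dataclass
class ExampleConfig(BaseConfig):
    def validate(self) -> None:
        # Hypothetical rules; a real cleaner would check its own fields.
        if not self.pipeline_name:
            raise ValueError("pipeline_name must be a non-empty string")
        if self.num_workers < 1:
            raise ValueError(f"num_workers must be >= 1, got {self.num_workers}")


ExampleConfig(num_workers=4).validate()  # valid: returns None without raising

try:
    ExampleConfig(num_workers=0).validate()
except ValueError as exc:
    print(f"Invalid config: {exc}")
```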

validate_config: bool = True