tidymut.cleaners.base_config module
- class tidymut.cleaners.base_config.BaseCleanerConfig(pipeline_name: str = 'base_cleaner', strict_mode: bool = True, num_workers: int = 16, validate_config: bool = True)
Bases: ABC
Base configuration class for all dataset cleaners
This abstract base class provides common configuration functionality that can be inherited by specific cleaner configurations.
- pipeline_name
Name of the cleaning pipeline
- Type:
str
- strict_mode
Whether to stop on errors (True) or continue with warnings (False)
- Type:
bool
- num_workers
Default number of worker processes
- Type:
int
- validate_config
Whether to validate configuration before use
- Type:
bool
- classmethod from_dict(config_dict: Dict[str, Any]) → CleanerConfigType
Create configuration object from dictionary
- Parameters:
config_dict (Dict[str, Any]) – Dictionary containing configuration parameters
- Returns:
Configuration object
- Return type:
CleanerConfigType
- classmethod from_json(json_path: str | Path) → CleanerConfigType
Load configuration from JSON file
- Parameters:
json_path (Union[str, Path]) – Path to JSON configuration file
- Returns:
Configuration object
- Return type:
CleanerConfigType
- Raises:
FileNotFoundError – If configuration file does not exist
- get_summary() → str
Get a human-readable summary of the configuration
- Returns:
String summary of the configuration
- Return type:
str
- merge(partial_config: Dict[str, Any]) → CleanerConfigType
Merge partial configuration with current configuration
- Parameters:
partial_config (Dict[str, Any]) – Dictionary containing configuration values to update
- Returns:
New configuration object with merged values
- Return type:
CleanerConfigType
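The merge description above says the result is a new configuration object, which suggests the original is left unchanged. A minimal sketch of that behaviour follows, using a stand-in dataclass named `SketchConfig` (hypothetical, not the tidymut class) and assuming a shallow, key-by-key override. Whether the real `merge` also handles nested dictionaries or re-runs validation is not stated here.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass
class SketchConfig:
    """Stand-in with the documented fields; NOT tidymut's BaseCleanerConfig."""

    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    def merge(self, partial_config: Dict[str, Any]) -> "SketchConfig":
        # dataclasses.replace() builds a new instance and leaves self untouched.
        # Assumption: a shallow override of top-level fields only.
        return replace(self, **partial_config)


base = SketchConfig()
merged = base.merge({"num_workers": 4, "strict_mode": False})
```

In this sketch, keys missing from `partial_config` keep their current values, and an unknown key raises `TypeError` because `replace()` rejects field names the dataclass does not define.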
- num_workers: int = 16
- pipeline_name: str = 'base_cleaner'
- strict_mode: bool = True
- to_dict(exclude_callables: bool = True) → Dict[str, Any]
Convert configuration to dictionary
- Parameters:
exclude_callables (bool, optional) – Whether to exclude callable objects (functions, lambdas), by default True
- Returns:
Dictionary representation of the configuration
- Return type:
Dict[str, Any]
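The `exclude_callables` flag matters because functions and lambdas cannot be written to JSON. The sketch below shows one way such filtering could work, again on a stand-in dataclass. The `label_fn` field is hypothetical and exists only to give the example a callable value; the real implementation may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional


@dataclass
class SketchConfig:
    """Stand-in for illustration; NOT tidymut's BaseCleanerConfig."""

    pipeline_name: str = "base_cleaner"
    num_workers: int = 16
    # Hypothetical callable field, added only for this example.
    label_fn: Optional[Callable[[str], str]] = None

    def to_dict(self, exclude_callables: bool = True) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        for field in fields(self):
            value = getattr(self, field.name)
            # Skip functions and other callables when asked to.
            if exclude_callables and callable(value):
                continue
            result[field.name] = value
        return result


config = SketchConfig(label_fn=str.upper)
serializable = config.to_dict()                       # callable dropped
everything = config.to_dict(exclude_callables=False)  # callable kept
```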
- to_json(json_path: str | Path, **json_kwargs) → None
Save configuration to JSON file
- Parameters:
json_path (Union[str, Path]) – Path where to save the JSON file
**json_kwargs – Additional arguments passed to json.dump
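A save-and-reload round trip with `to_json` and `from_json` might look like the following. This is a sketch built on the standard library only: `SketchConfig` is a hypothetical stand-in, and the way it forwards `**json_kwargs` to `json.dump` and raises `FileNotFoundError` follows the descriptions on this page rather than the library's source.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Union


@dataclass
class SketchConfig:
    """Stand-in for illustration; NOT tidymut's BaseCleanerConfig."""

    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    def to_json(self, json_path: Union[str, Path], **json_kwargs) -> None:
        with open(json_path, "w", encoding="utf-8") as handle:
            # Extra keyword arguments (for example indent=2) go to json.dump.
            json.dump(asdict(self), handle, **json_kwargs)

    @classmethod
    def from_json(cls, json_path: Union[str, Path]) -> "SketchConfig":
        path = Path(json_path)
        if not path.exists():
            raise FileNotFoundError(f"Configuration file not found: {path}")
        with open(path, encoding="utf-8") as handle:
            return cls(**json.load(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "config.json"
    SketchConfig(pipeline_name="demo", num_workers=2).to_json(target, indent=2)
    restored = SketchConfig.from_json(target)

    # A path that was never written should raise FileNotFoundError.
    try:
        SketchConfig.from_json(Path(tmp_dir) / "missing.json")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```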
- abstractmethod validate() → None
Validate the configuration
This method should be implemented by subclasses to perform specific validation logic.
- Raises:
ValueError – If configuration is invalid
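Because `validate()` is abstract, every concrete configuration has to supply its own checks. The sketch below illustrates the pattern with stand-in classes: `SketchBaseConfig` mirrors the documented fields, while `DemoCleanerConfig` and its `min_score` field are hypothetical. It assumes the base class behaves like a dataclass, which this page does not confirm.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchBaseConfig(ABC):
    """Stand-in for BaseCleanerConfig; NOT the tidymut class."""

    pipeline_name: str = "base_cleaner"
    strict_mode: bool = True
    num_workers: int = 16
    validate_config: bool = True

    @abstractmethod
    def validate(self) -> None:
        """Subclasses raise ValueError when a setting is invalid."""


@dataclass
class DemoCleanerConfig(SketchBaseConfig):
    # Hypothetical field, added only for illustration.
    min_score: float = 0.0

    def validate(self) -> None:
        if self.num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        if not self.pipeline_name:
            raise ValueError("pipeline_name must not be empty")


config = DemoCleanerConfig(pipeline_name="demo", num_workers=4)
config.validate()  # valid settings: returns None without raising

try:
    DemoCleanerConfig(num_workers=0).validate()
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

Instantiating `SketchBaseConfig` directly fails with `TypeError`, since Python refuses to create instances of a class that still has abstract methods.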
- validate_config: bool = True