tidymut.utils.type_converter module
Type conversion utilities for pandas DataFrames.
This module provides flexible and efficient data type conversion capabilities with support for pandas, numpy, and Python built-in types.
- tidymut.utils.type_converter.convert_data_types(dataset: pd.DataFrame, type_conversions: Dict[str, str | Type | np.dtype], handle_errors: str = 'coerce', optimize_memory: bool = True) → pd.DataFrame
Convert data types for specified columns with enhanced type support.
- Parameters:
dataset (pd.DataFrame) – Input dataset with columns to be converted
type_conversions (Dict[str, Union[str, Type, np.dtype]]) – Type conversion mapping in the format {column_name: target_type}. Supported formats:
  - String types: 'float', 'int', 'str', 'category', 'bool', 'datetime'
  - Numpy types: np.float32, np.float64, np.int32, np.int64, etc.
  - Pandas types: 'Int64', 'Float64', 'string', 'boolean'
  - Python types: float, int, str, bool
handle_errors (str, default='coerce') – Error handling strategy: 'raise', 'coerce', or 'ignore'
optimize_memory (bool, default=True) – Whether to automatically optimize memory usage
- Returns:
Dataset with converted data types
- Return type:
pd.DataFrame
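The mapping format described above can be illustrated with plain pandas. The following is a minimal sketch, not the tidymut implementation: the column names and data are made up, and it assumes that handle_errors='coerce' behaves like pandas' own errors="coerce", turning unparseable values into missing values instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "score": ["1.5", "2.0", "bad"],
    "count": ["1", "2", "3"],
    "label": ["a", "b", "a"],
})

# A mapping in the documented {column_name: target_type} shape, mixing a
# string type, a NumPy type, and a pandas type name.
type_conversions = {"score": "float", "count": np.int64, "label": "category"}

# Plain-pandas stand-in for the conversion (assumption: 'coerce' maps
# unparseable entries to NaN rather than raising an exception).
converted = df.copy()
converted["score"] = pd.to_numeric(converted["score"], errors="coerce")
converted["count"] = pd.to_numeric(converted["count"], errors="coerce").astype(np.int64)
converted["label"] = converted["label"].astype("category")

print(converted.dtypes)
```

With the library itself, the same mapping would presumably be passed as convert_data_types(df, type_conversions), per the signature above.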
- tidymut.utils.type_converter.convert_data_types_batch(dataset: pd.DataFrame, type_conversions: Dict[str, str | Type | np.dtype], handle_errors: str = 'coerce', optimize_memory: bool = True, chunk_size: int = 10000) → pd.DataFrame
Batch version of convert_data_types for large datasets, processing the data in chunks of chunk_size rows.
- Parameters:
dataset (pd.DataFrame) – Input dataset
type_conversions (Dict[str, Union[str, Type, np.dtype]]) – Type conversion mapping
handle_errors (str, default='coerce') – Error handling strategy
optimize_memory (bool, default=True) – Whether to optimize memory usage
chunk_size (int, default=10000) – Chunk size for processing large datasets
- Returns:
Dataset with converted data types
- Return type:
pd.DataFrame
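Chunked processing of this kind can be sketched in plain pandas as follows. The helper convert_in_chunks is hypothetical and handles a single numeric column only; how the library splits and reassembles the data internally is not stated in this reference.

```python
import pandas as pd

def convert_in_chunks(dataset, column, chunk_size=10000):
    """Hypothetical helper: coerce one column to numeric, chunk by chunk."""
    pieces = []
    for start in range(0, len(dataset), chunk_size):
        # Copy the slice so the caller's DataFrame is left untouched.
        chunk = dataset.iloc[start:start + chunk_size].copy()
        chunk[column] = pd.to_numeric(chunk[column], errors="coerce")
        pieces.append(chunk)
    if not pieces:
        return dataset.copy()
    # Chunks may end up with different dtypes (int vs. float when only some
    # contain NaN); pd.concat upcasts them to a common dtype.
    return pd.concat(pieces)

df = pd.DataFrame({"value": ["1", "2", "x", "4", "5"]})
result = convert_in_chunks(df, "value", chunk_size=2)
print(result)
```

A caveat with this approach is that type inference runs per chunk, so a final cast to one target dtype is usually needed if a uniform result is required.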
- tidymut.utils.type_converter.convert_to_boolean(series: Series, errors: str) → Series
Intelligent boolean conversion handling various boolean representations.
- Parameters:
series (pd.Series) – Input series to convert
errors (str) – Error handling strategy
- Returns:
Boolean series
- Return type:
pd.Series
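As an illustration of what handling various boolean representations can look like, here is a minimal sketch. The function to_boolean_sketch and its vocabulary of accepted spellings are assumptions made for this example; the set of representations that convert_to_boolean actually recognises is not listed in this reference.

```python
import pandas as pd

# Assumed vocabulary of boolean spellings (hypothetical, for illustration).
TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}

def to_boolean_sketch(series, errors="coerce"):
    """Hypothetical stand-in for a tolerant boolean conversion."""
    def parse(value):
        if pd.isna(value):
            return pd.NA
        text = str(value).strip().lower()
        if text in TRUE_VALUES:
            return True
        if text in FALSE_VALUES:
            return False
        if errors == "raise":
            raise ValueError(f"cannot interpret {value!r} as boolean")
        return pd.NA  # 'coerce': unrecognised values become missing

    # The nullable 'boolean' dtype can hold True, False, and <NA> together.
    return series.map(parse).astype("boolean")

flags = to_boolean_sketch(pd.Series(["Yes", "no", "TRUE", "maybe", None]))
print(flags)
```

Using the nullable 'boolean' dtype rather than NumPy bool keeps missing and unrecognised entries distinct from False.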
- tidymut.utils.type_converter.get_conversion_function(target_type: str | Type | np.dtype, optimize_memory: bool) → Tuple[str, Callable]
Get appropriate conversion function for target type.
- Parameters:
target_type (Union[str, Type, np.dtype]) – Target data type
optimize_memory (bool) – Whether to use memory-optimized types
- Returns:
Type name and conversion function
- Return type:
Tuple[str, Callable]
- tidymut.utils.type_converter.normalize_type_conversions(type_conversions: Dict[str, str | Type | np.dtype], optimize_memory: bool) → Dict[str, Tuple[str, Callable]]
Normalize type conversion mapping to standardized format.
- Returns:
Mapping of {column: (type_name, conversion_function)}
- Return type:
Dict[str, Tuple[str, Callable]]
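The {column: (type_name, conversion_function)} shape can be sketched as below. Both lookup_sketch and normalize_sketch are hypothetical stand-ins written for this example; they ignore the optimize_memory flag and cover only a few target types, so they should not be read as the library's logic.

```python
import numpy as np
import pandas as pd

def lookup_sketch(target_type):
    """Hypothetical stand-in: resolve a target type to (name, converter)."""
    # Strings are used as given; Python and NumPy types go through np.dtype,
    # so float resolves to 'float64' and np.int32 to 'int32'.
    name = target_type if isinstance(target_type, str) else np.dtype(target_type).name
    if name in ("float", "float32", "float64"):
        dtype = "float64" if name == "float" else name
        # Coerce unparseable entries to NaN before casting.
        return name, lambda s: pd.to_numeric(s, errors="coerce").astype(dtype)
    # Anything else is handed to astype unchanged.
    return name, lambda s: s.astype(name)

def normalize_sketch(type_conversions):
    """Build a {column: (type_name, conversion_function)} mapping."""
    return {col: lookup_sketch(t) for col, t in type_conversions.items()}

plan = normalize_sketch({"score": float, "label": "category"})

df = pd.DataFrame({"score": ["1.5", "oops"], "label": ["a", "b"]})
for column, (type_name, convert) in plan.items():
    df[column] = convert(df[column])

print(df.dtypes)
```

Resolving every target type to a callable up front means the per-column loop stays uniform regardless of how the type was originally spelled.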