tidymut.utils.type_converter module

Type conversion utilities for pandas DataFrames.

This module converts the data types of DataFrame columns. Target types can be given as pandas, numpy, or Python built-in types.

tidymut.utils.type_converter.convert_data_types(dataset: pd.DataFrame, type_conversions: Dict[str, str | Type | np.dtype], handle_errors: str = 'coerce', optimize_memory: bool = True) → pd.DataFrame

Convert data types for specified columns with enhanced type support.

Parameters:
  • dataset (pd.DataFrame) – Input dataset with columns to be converted

  • type_conversions (Dict[str, Union[str, Type, np.dtype]]) – Type conversion mapping in the format {column_name: target_type}. Supported formats:

      - String types: ‘float’, ‘int’, ‘str’, ‘category’, ‘bool’, ‘datetime’
      - Numpy types: np.float32, np.float64, np.int32, np.int64, etc.
      - Pandas types: ‘Int64’, ‘Float64’, ‘string’, ‘boolean’
      - Python types: float, int, str, bool

  • handle_errors (str, default='coerce') – Error handling strategy: ‘raise’, ‘coerce’, or ‘ignore’

  • optimize_memory (bool, default=True) – Whether to automatically optimize memory usage

Returns:

Dataset with converted data types

Return type:

pd.DataFrame
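The mapping format and the 'coerce' strategy described above can be illustrated with a minimal sketch in plain pandas. This is not tidymut's implementation: the `converters` table and the sample data are assumptions made for illustration, and only two of the documented string types are covered.

```python
import pandas as pd

df = pd.DataFrame({
    "score": ["1.5", "2.0", "oops"],
    "label": ["a", "b", "a"],
})

# Mapping in the documented {column_name: target_type} shape.
type_conversions = {"score": "float", "label": "category"}

# Hypothetical dispatch table (an assumption, not tidymut internals).
converters = {
    # 'coerce' semantics in pandas: unparseable values become NaN instead of raising.
    "float": lambda s: pd.to_numeric(s, errors="coerce").astype("float64"),
    "category": lambda s: s.astype("category"),
}

converted = df.copy()
for column, target in type_conversions.items():
    converted[column] = converters[target](converted[column])

print(converted.dtypes)
```

Going by the signature documented above, the corresponding library call would presumably be `convert_data_types(df, type_conversions, handle_errors='coerce')`.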

tidymut.utils.type_converter.convert_data_types_batch(dataset: pd.DataFrame, type_conversions: Dict[str, str | Type | np.dtype], handle_errors: str = 'coerce', optimize_memory: bool = True, chunk_size: int = 10000) → pd.DataFrame

Batch conversion version for large datasets.

Parameters:
  • dataset (pd.DataFrame) – Input dataset

  • type_conversions (Dict[str, Union[str, Type, np.dtype]]) – Type conversion mapping

  • handle_errors (str, default='coerce') – Error handling strategy

  • optimize_memory (bool, default=True) – Whether to optimize memory usage

  • chunk_size (int, default=10000) – Chunk size for processing large datasets

Returns:

Dataset with converted data types

Return type:

pd.DataFrame
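The role of chunk_size can be sketched as row-wise chunking followed by concatenation. The helper `convert_in_chunks` below is hypothetical and handles a single numeric column; how tidymut actually splits and reassembles the data is not stated in this page.

```python
import pandas as pd

def convert_in_chunks(df, column, chunk_size):
    """Hypothetical helper: coerce one column to numeric, chunk by chunk."""
    parts = []
    for start in range(0, len(df), chunk_size):
        # Work on a copy of each slice so the input frame is left untouched.
        chunk = df.iloc[start:start + chunk_size].copy()
        chunk[column] = pd.to_numeric(chunk[column], errors="coerce")
        parts.append(chunk)
    # Concatenation restores the original row order and index.
    return pd.concat(parts) if parts else df.copy()

df = pd.DataFrame({"value": ["1", "2", "x", "4", "5"]})
result = convert_in_chunks(df, "value", chunk_size=2)
print(result)
```

One consequence of chunking is that individual chunks may infer different dtypes (integer in one, float in another), so the final dtype is only settled once the pieces are concatenated.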

tidymut.utils.type_converter.convert_to_boolean(series: Series, errors: str) → Series

Intelligent boolean conversion handling various boolean representations.

Parameters:
  • series (pd.Series) – Input series to convert

  • errors (str) – Error handling strategy

Returns:

Boolean series

Return type:

pd.Series
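As a rough illustration of converting mixed boolean representations, the sketch below maps a few common spellings onto pandas' nullable 'boolean' dtype. The vocabulary in `TRUE_VALUES` and `FALSE_VALUES` and the helper name are assumptions; the representations tidymut recognises are not listed in this page.

```python
import pandas as pd

# Assumed vocabulary, chosen for illustration only.
TRUE_VALUES = {"true", "yes", "1", "t", "y"}
FALSE_VALUES = {"false", "no", "0", "f", "n"}

def to_boolean_sketch(series):
    """Hypothetical helper: unrecognised values become <NA> (coerce-like)."""
    normalized = series.astype(str).str.strip().str.lower()
    # Start from an all-missing nullable boolean series, then fill known values.
    result = pd.Series(pd.NA, index=series.index, dtype="boolean")
    result[normalized.isin(TRUE_VALUES)] = True
    result[normalized.isin(FALSE_VALUES)] = False
    return result

s = pd.Series(["Yes", "no", "TRUE", "0", "maybe"])
out = to_boolean_sketch(s)
print(out)
```

The nullable 'boolean' dtype is used here so that unrecognised values can stay missing without turning the whole column into object dtype.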

tidymut.utils.type_converter.get_conversion_function(target_type: str | Type | np.dtype, optimize_memory: bool) → Tuple[str, Callable]

Get appropriate conversion function for target type.

Parameters:
  • target_type (Union[str, Type, np.dtype]) – Target data type

  • optimize_memory (bool) – Whether to use memory-optimized types

Returns:

Type name and conversion function

Return type:

Tuple[str, Callable]
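The (type name, conversion function) return shape can be sketched with a small resolver built on `np.dtype`. The function `get_converter_sketch` is hypothetical and only handles specifications that numpy itself understands; pandas-only names such as 'category' or 'Int64', and the optimize_memory flag, are deliberately left out.

```python
from typing import Callable, Tuple

import numpy as np
import pandas as pd

def get_converter_sketch(target_type) -> Tuple[str, Callable[[pd.Series], pd.Series]]:
    """Hypothetical resolver: returns (type_name, converter) for numpy-compatible specs."""
    # np.dtype accepts strings ("float32"), Python types (float) and numpy types.
    dtype = np.dtype(target_type)
    return dtype.name, lambda s: s.astype(dtype)

name, convert = get_converter_sketch(np.float32)
converted = convert(pd.Series([1, 2, 3]))
print(name, converted.dtype)
```

Returning the name alongside the callable keeps the resolved type available for logging or error messages without inspecting the function.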

tidymut.utils.type_converter.normalize_type_conversions(type_conversions: Dict[str, str | Type | np.dtype], optimize_memory: bool) → Dict[str, Tuple[str, Callable]]

Normalize type conversion mapping to standardized format.

Returns:

Mapping of {column: (type_name, conversion_function)}

Return type:

Dict[str, Tuple[str, Callable]]
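A normalized mapping of this shape can then be applied column by column. The sketch below builds such a mapping with a hypothetical `normalize_sketch` helper based on `np.dtype`; it is an illustration under that assumption, not the library's own normalization logic.

```python
import numpy as np
import pandas as pd

def normalize_sketch(type_conversions):
    """Hypothetical helper: build {column: (type_name, conversion_function)}."""
    normalized = {}
    for column, target in type_conversions.items():
        dtype = np.dtype(target)
        # Bind dtype as a default argument so each lambda keeps its own target.
        normalized[column] = (dtype.name, lambda s, d=dtype: s.astype(d))
    return normalized

plan = normalize_sketch({"a": "float32", "b": bool})

df = pd.DataFrame({"a": [1, 2], "b": [0, 1]})
for column, (type_name, convert) in plan.items():
    df[column] = convert(df[column])

print(df.dtypes)
```

Resolving every target once up front means the per-column loop only has to call ready-made functions.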