tidymut.core.pipeline module

class tidymut.core.pipeline.Pipeline(data: Any = None, name: str | None = None, logging_level: str = 'INFO')[source]

Bases: object

Pipeline for processing data with pandas-style method chaining
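
A minimal usage sketch (illustrative, not taken from the library's own docs; double is a hypothetical helper, and the shown result assumes then() replaces the pipeline's data with the function's return value):

>>> from tidymut.core.pipeline import Pipeline
>>> def double(values):
...     return [v * 2 for v in values]
>>> p = Pipeline([1, 2, 3], name="demo")
>>> p.then(double).data
[2, 4, 6]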

add_delayed_step(func: Callable, index: int | None = None, *args, **kwargs) Pipeline[source]

Add a delayed step before a specific position in the delayed execution queue.

Behaves similarly to the built-in list.insert() method.

Parameters:
  • func (Callable) – Function to add as delayed step

  • index (Optional[int]) – Position to insert the step. If None, appends to the end. Supports negative indexing.

  • *args – Positional arguments to pass to the function

  • **kwargs – Keyword arguments to pass to the function

Returns:

Self for method chaining

Return type:

Pipeline

Examples

>>> # Add step at the beginning
>>> pipeline.add_delayed_step(func1, 0)
>>> # Add step at the end (same as delayed_then)
>>> pipeline.add_delayed_step(func2)
>>> # Insert step at position 2
>>> pipeline.add_delayed_step(func3, 2)
>>> # Insert step before the last one
>>> pipeline.add_delayed_step(func4, -1)
apply(func: Callable, *args, **kwargs) Pipeline[source]

Apply function and return new Pipeline (functional style)
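
Illustrative sketch (assumes, per the summary above, that apply() leaves the current pipeline untouched and returns a new one; pipeline is an existing Pipeline and double a hypothetical helper):

>>> new_pipeline = pipeline.apply(double)
>>> new_pipeline is pipeline
False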

property artifacts: Dict[str, Any]

Always return the artifacts dictionary.

This provides direct access to all stored artifacts from pipeline steps.

assign(**kwargs) Pipeline[source]

Add attributes or computed values to data
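
Illustrative sketch (an assumption, in the spirit of pandas.DataFrame.assign, that assign() accepts plain values as well as callables of the current data; exact semantics depend on the implementation):

>>> pipeline.assign(threshold=0.5, n_items=lambda d: len(d))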

copy() Pipeline[source]

Create a deep copy of this pipeline

property data: Any

Always return the actual data, never PipelineOutput.

This ensures a consistent user experience: pipeline.data can always be used with methods like .copy(), .append(), etc.

delayed_then(func: Callable, *args, **kwargs) Pipeline[source]

Add a function to the delayed execution queue without running it immediately
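
Illustrative sketch (clean and score are hypothetical step functions; queued steps only run once execute() is called):

>>> pipeline.delayed_then(clean).delayed_then(score, threshold=0.8)
>>> pipeline.has_pending_steps
True
>>> pipeline.execute()  # runs clean, then score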

execute(steps: int | List[int] | None = None) Pipeline[source]

Execute delayed steps.

Parameters:

steps (Optional[Union[int, List[int]]]) – Which delayed steps to execute:

  • None: execute all delayed steps

  • int: execute the first N delayed steps

  • List[int]: execute specific delayed steps by index

Returns:

Self for method chaining

Return type:

Pipeline
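
Illustrative sketch of the steps argument (indices refer to positions in the delayed queue; see get_delayed_steps_info()):

>>> pipeline.execute()        # run every pending delayed step
>>> pipeline.execute(2)       # run only the first two delayed steps
>>> pipeline.execute([0, 3])  # run the delayed steps at indices 0 and 3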

filter(condition: Callable) Pipeline[source]

Filter data based on condition
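
Illustrative call form only (whether the condition receives the data as a whole or is applied element-wise depends on the implementation):

>>> pipeline.filter(lambda d: d is not None)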

get_all_artifacts() Dict[str, Any][source]

Get all stored artifacts

get_artifact(name: str) Any[source]

Get a specific artifact by name

get_data() Any[source]

Get current data (same as .data property).

Kept for backward compatibility.

get_delayed_steps_info() List[Dict[str, Any]][source]

Get information about delayed steps

get_execution_summary() Dict[str, Any][source]

Get summary of pipeline execution

get_step_result(step_index: int | str) Any[source]

Get result from a specific step by index or name

property has_pending_steps: bool

Check if there are delayed steps waiting to be executed

classmethod load(filepath: str, format: str = 'pickle', name: str | None = None) Pipeline[source]

Load data from file and create new pipeline

classmethod load_structured_data(filepath: str, format: str = 'pickle', name: str | None = None) Pipeline[source]

Load structured data from file and create new pipeline

peek(func: Callable | None = None, prefix: str = '') Pipeline[source]

Inspect data without modifying it (for debugging)
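
Illustrative sketch (assumes peek() prints the current data by default and, when a function is given, prints that function's return value instead):

>>> pipeline.peek()                                  # inspect the full data
>>> pipeline.peek(lambda d: d[:5], prefix="head: ")  # inspect just a slice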

remove_delayed_step(index_or_name: int | str) Pipeline[source]

Remove a delayed step by its index or name.

Parameters:

index_or_name (Union[int, str]) – Index or name of the delayed step to remove

Returns:

Self for method chaining

Return type:

Pipeline

Raises:

ValueError – If no delayed step is found with the specified index or name
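
Illustrative sketch ("clean_data" is a hypothetical step name):

>>> pipeline.remove_delayed_step(0)             # remove by queue index
>>> pipeline.remove_delayed_step("clean_data")  # remove by step name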

save(filepath: str, format: str = 'pickle') Pipeline[source]

Save current data to file

save_artifacts(filepath: str, format: str = 'pickle') Pipeline[source]

Save all artifacts to file

save_structured_data(filepath: str, format: str = 'pickle') Pipeline[source]

Save structured data (data + artifacts) to file
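
Illustrative round trip ("run_output.pkl" is a hypothetical path; assumes loading structured data restores both the data and the artifacts saved alongside it):

>>> pipeline.save_structured_data("run_output.pkl")
>>> restored = Pipeline.load_structured_data("run_output.pkl", name="restored")
>>> restored.data       # the saved data
>>> restored.artifacts  # the saved artifacts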

store(name: str, extractor: Callable | None = None) Pipeline[source]

Store current data or extracted value as artifact
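
Illustrative sketch ("raw_copy" and "n_items" are hypothetical artifact names):

>>> pipeline.store("raw_copy")                             # snapshot the current data
>>> pipeline.store("n_items", extractor=lambda d: len(d))  # store a derived value
>>> pipeline.get_artifact("n_items")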

property structured_data: PipelineOutput

Return PipelineOutput object with both data and artifacts.

Use this when you need the complete pipeline state for serialization, passing to other systems, or when working with structured data flows.

then(func: Callable, *args, **kwargs) Pipeline[source]

Apply a function to the current data (pandas.pipe style)
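
Illustrative sketch (assumes the pandas.pipe convention: the current data is passed as the first argument and any extra *args/**kwargs are forwarded; scale is a hypothetical helper):

>>> def scale(values, factor, offset=0):
...     return [v * factor + offset for v in values]
>>> pipeline.then(scale, 10, offset=1)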

transform(transformer: Callable, *args, **kwargs) Pipeline[source]

Alias of then, used to define format transformations.

validate(validator: Callable, error_msg: str = 'Validation failed') Pipeline[source]

Validate data and raise error if invalid
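
Illustrative sketch (assumes the validator returns a truthy value for valid data):

>>> pipeline.validate(lambda d: len(d) > 0, error_msg="data is empty")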

visualize_pipeline() str[source]

Generate a text visualization of the pipeline

tidymut.core.pipeline.create_pipeline(data: Any, name: str | None = None, **kwargs) Pipeline[source]

Create a new pipeline with initial data
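
Illustrative sketch (the shown output assumes .data returns the initial data unchanged before any steps run):

>>> from tidymut.core.pipeline import create_pipeline
>>> p = create_pipeline([1, 2, 3], name="demo")
>>> p.data
[1, 2, 3]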

tidymut.core.pipeline.multiout_step(**outputs: str)[source]

Decorator for multi-output pipeline functions.

Use this for functions that return multiple values where you want to name and access the outputs separately.

Parameters:

**outputs (str) – Named outputs. Use ‘main’ to specify which output is the main data flow. If ‘main’ is not specified, the first return value is treated as main.

Examples

>>> # Returns 3 values: main, stats, plot
>>> @multiout_step(stats="statistics", plot="visualization")
... def analyze_data(data):
...     ...
...     return processed_data, stats_dict, plot_object
>>> # Returns 3 values with explicit main designation
>>> @multiout_step(main="result", error="error_info", stats="statistics")
... def process_with_metadata(data):
...     ...
...     return result, error_info, stats

Note

With this decorator, side outputs are returned as a dictionary.

tidymut.core.pipeline.pipeline_step(name: str | Callable[..., Any] | None = None)[source]

Decorator for single-output pipeline functions.

Use this for functions that return a single value (including tuples as single values). For multiple outputs, use @multiout_step instead.

Parameters:

name (Optional[str] or Callable) – Custom name for the step. If None, uses function name. When used as @pipeline_step (without parentheses), this will be the function.

Examples

>>> @pipeline_step
... def process(data):
...     return processed_data  # Single output
>>> @pipeline_step("process_data")
... def process(data):
...     return processed_data  # Single output
>>> @pipeline_step()
... def get_coordinates():
...     return (10, 20)  # Single tuple output