tidymut.core.pipeline module
- class tidymut.core.pipeline.Pipeline(data: Any = None, name: str | None = None, logging_level: str = 'INFO')[source]
Bases: object
Pipeline for processing data with pandas-style method chaining
- add_delayed_step(func: Callable, index: int | None = None, *args, **kwargs) Pipeline [source]
Add a delayed step before a specific position in the delayed execution queue.
Behaves like the list.insert() method.
- Parameters:
func (Callable) – Function to add as delayed step
index (Optional[int]) – Position to insert the step. If None, appends to the end. Supports negative indexing.
*args – Positional arguments to pass to the function
**kwargs – Keyword arguments to pass to the function
- Returns:
Self for method chaining
- Return type:
Pipeline
Examples
>>> # Add step at the beginning
>>> pipeline.add_delayed_step(func1, 0)
>>> # Add step at the end (same as delayed_then)
>>> pipeline.add_delayed_step(func2)
>>> # Insert step at position 2
>>> pipeline.add_delayed_step(func3, 2)
>>> # Insert step before the last one
>>> pipeline.add_delayed_step(func4, -1)
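The index rule above can be illustrated with a plain Python list standing in for the delayed queue. This is only a sketch: `insert_step` and `queue` are hypothetical names that are not part of tidymut, and the library's internal storage may differ.

```python
from typing import Callable, List, Optional


def insert_step(queue: List[Callable], func: Callable, index: Optional[int] = None) -> None:
    """Hypothetical helper mirroring the documented index rule."""
    if index is None:
        queue.append(func)  # no index given: append to the end
    else:
        queue.insert(index, func)  # list.insert semantics; negative indices allowed


def func1(data): return data
def func2(data): return data
def func3(data): return data
def func4(data): return data


queue: List[Callable] = []
insert_step(queue, func2)      # appended to the end
insert_step(queue, func1, 0)   # placed at the beginning
insert_step(queue, func3, 2)   # placed at position 2
insert_step(queue, func4, -1)  # placed before the last step

order = [f.__name__ for f in queue]
print(order)
```

As with list.insert(), an index of -1 places the new step before the current last step, not after it.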
- apply(func: Callable, *args, **kwargs) Pipeline [source]
Apply function and return new Pipeline (functional style)
- property artifacts: Dict[str, Any]
Always return the artifacts dictionary.
This provides direct access to all stored artifacts from pipeline steps.
- property data: Any
Always return the actual data, never PipelineOutput.
This ensures consistent user experience - pipeline.data can always be used with methods like .copy(), .append(), etc.
- delayed_then(func: Callable, *args, **kwargs) Pipeline [source]
Add a function to the delayed execution queue without running it immediately
- execute(steps: int | List[int] | None = None) Pipeline [source]
Execute delayed steps.
- Parameters:
steps (Optional[Union[int, List[int]]]) – Which delayed steps to execute:
- None: execute all delayed steps
- int: execute the first N delayed steps
- List[int]: execute specific delayed steps by index
- Returns:
Self for method chaining
- Return type:
Pipeline
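The three accepted forms of steps can be read as a selection over the pending queue. The sketch below shows one plausible interpretation with a hypothetical `select_steps` helper. It does not model what tidymut does with the queue after execution or how it handles invalid indices, since neither is stated here.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def select_steps(pending: Sequence[T], steps: Union[int, List[int], None] = None) -> List[T]:
    """Hypothetical helper: pick which pending steps would run."""
    if steps is None:
        return list(pending)  # None: every delayed step
    if isinstance(steps, int):
        return list(pending[:steps])  # int: the first N delayed steps
    return [pending[i] for i in steps]  # list: specific steps by index


pending = ["clean", "filter", "annotate", "export"]

all_steps = select_steps(pending)
first_two = select_steps(pending, 2)
picked = select_steps(pending, [0, 3])
print(all_steps, first_two, picked)
```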
- get_data() Any [source]
Get current data (same as .data property).
Kept for backward compatibility.
- get_step_result(step_index: int | str) Any [source]
Get result from a specific step by index or name
- property has_pending_steps: bool
Check if there are delayed steps waiting to be executed
- classmethod load(filepath: str, format: str = 'pickle', name: str | None = None) Pipeline [source]
Load data from file and create new pipeline
- classmethod load_structured_data(filepath: str, format: str = 'pickle', name: str | None = None) Pipeline [source]
Load structured data from file and create new pipeline
- peek(func: Callable | None = None, prefix: str = '') Pipeline [source]
Inspect data without modifying it (for debugging)
- remove_delayed_step(index_or_name: int | str) Pipeline [source]
Remove a delayed step by index or name.
- Parameters:
index_or_name (Union[int, str]) – Index or name of the delayed step to remove
- Returns:
Self for method chaining
- Return type:
Pipeline
- Raises:
ValueError – If no delayed step is found with the specified index or name
- save_structured_data(filepath: str, format: str = 'pickle') Pipeline [source]
Save structured data (data + artifacts) to file
- store(name: str, extractor: Callable | None = None) Pipeline [source]
Store current data or extracted value as artifact
- property structured_data: PipelineOutput
Return PipelineOutput object with both data and artifacts.
Use this when you need the complete pipeline state for serialization, passing to other systems, or when working with structured data flows.
- then(func: Callable, *args, **kwargs) Pipeline [source]
Apply a function to the current data (pandas.pipe style)
- transform(transformer: Callable, *args, **kwargs) Pipeline [source]
Alias of then, used to define format transformations.
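The difference between immediate and delayed steps can be sketched with a toy class. `MiniPipeline` below is a hypothetical stand-in, not the tidymut Pipeline. It assumes that then updates the data in place and returns self, and that executed steps leave the queue; neither detail is confirmed by this page.

```python
from typing import Any, Callable, List, Tuple


class MiniPipeline:
    """Toy stand-in for illustration only; not the tidymut Pipeline."""

    def __init__(self, data: Any = None) -> None:
        self.data = data
        self._delayed: List[Tuple[Callable, tuple, dict]] = []

    def then(self, func: Callable, *args: Any, **kwargs: Any) -> "MiniPipeline":
        # Runs immediately on the current data (pandas.pipe style).
        self.data = func(self.data, *args, **kwargs)
        return self

    def delayed_then(self, func: Callable, *args: Any, **kwargs: Any) -> "MiniPipeline":
        # Only queues the call; nothing runs yet.
        self._delayed.append((func, args, kwargs))
        return self

    @property
    def has_pending_steps(self) -> bool:
        return bool(self._delayed)

    def execute(self) -> "MiniPipeline":
        # Assumption: executed steps are removed from the queue.
        while self._delayed:
            func, args, kwargs = self._delayed.pop(0)
            self.data = func(self.data, *args, **kwargs)
        return self


def add(values, amount):
    return [v + amount for v in values]


def double(values):
    return [v * 2 for v in values]


pipe = MiniPipeline([1, 2, 3]).then(add, 1).delayed_then(double)
before = list(pipe.data)                  # add has run, double has not
pending_before = pipe.has_pending_steps

pipe.execute()
after = pipe.data
pending_after = pipe.has_pending_steps
print(before, after)
```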
- tidymut.core.pipeline.create_pipeline(data: Any, name: str | None = None, **kwargs) Pipeline [source]
Create a new pipeline with initial data
- tidymut.core.pipeline.multiout_step(**outputs: str)[source]
Decorator for multi-output pipeline functions.
Use this for functions that return multiple values where you want to name and access the outputs separately.
- Parameters:
**outputs (str) – Named outputs. Use ‘main’ to specify which output is the main data flow. If ‘main’ is not specified, the first return value is treated as main.
Examples
>>> # Returns 3 values: main, stats, plot
>>> @multiout_step(stats="statistics", plot="visualization")
... def analyze_data(data):
...     ...
...     return processed_data, stats_dict, plot_object
>>> # Returns 3 values with explicit main designation
>>> @multiout_step(main="result", error="error_info", stats="statistics")
... def process_with_metadata(data):
...     ...
...     return result, error_info, stats
Note
With this decorator, side outputs are returned as a dictionary.
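One way to picture the split between the main output and the side-output dictionary is the sketch below. `multiout_sketch` is a hypothetical stand-in, not the tidymut implementation. It assumes that return values line up positionally with the keyword order, and that side outputs are keyed by the keyword names; the library may key them differently.

```python
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def multiout_sketch(**outputs: str) -> Callable:
    """Hypothetical stand-in for multiout_step; not the tidymut implementation."""
    names = list(outputs)  # keyword order is preserved

    def decorator(func: Callable[..., tuple]) -> Callable[..., Tuple[Any, Dict[str, Any]]]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
            values = func(*args, **kwargs)
            # Without an explicit 'main', the first return value is the main output.
            labels = names if "main" in outputs else ["main"] + names
            if len(values) != len(labels):
                raise ValueError(f"expected {len(labels)} outputs, got {len(values)}")
            named = dict(zip(labels, values))
            main = named.pop("main")
            return main, named  # side outputs collected in a dictionary

        return wrapper

    return decorator


@multiout_sketch(stats="statistics", plot="visualization")
def analyze_data(data):
    processed = [x * 2 for x in data]
    return processed, {"n": len(data)}, "plot-placeholder"


main, side = analyze_data([1, 2, 3])
print(main, side)
```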
- tidymut.core.pipeline.pipeline_step(name: str | Callable[..., Any] | None = None)[source]
Decorator for single-output pipeline functions.
Use this for functions that return a single value (including tuples as single values). For multiple outputs, use @multiout_step instead.
- Parameters:
name (Optional[str] or Callable) – Custom name for the step. If None, uses function name. When used as @pipeline_step (without parentheses), this will be the function.
Examples
>>> @pipeline_step
... def process(data):
...     return processed_data  # Single output
>>> @pipeline_step("process_data")
... def process(data):
...     return processed_data  # Single output
>>> @pipeline_step()
... def get_coordinates():
...     return (10, 20)  # Single tuple output
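The note that name "will be the function" when the decorator is used without parentheses reflects a common Python pattern, sketched below. `step_sketch` and its `step_name` attribute are hypothetical and not part of tidymut. The sketch only shows how one decorator can accept both the bare and the called forms.

```python
from functools import wraps
from typing import Any, Callable, Optional, Union


def step_sketch(name: Union[str, Callable[..., Any], None] = None) -> Callable:
    """Hypothetical stand-in for pipeline_step; not the tidymut implementation."""

    def decorate(func: Callable[..., Any], step_name: Optional[str]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return func(*args, **kwargs)

        # Hypothetical attribute: fall back to the function name when no name is given.
        wrapper.step_name = step_name or func.__name__
        return wrapper

    if callable(name):
        # Bare usage: @step_sketch passes the function itself as `name`.
        return decorate(name, None)
    # Called usage: @step_sketch() or @step_sketch("custom_name").
    return lambda func: decorate(func, name)


@step_sketch
def process(data):
    return data


@step_sketch("process_data")
def process_named(data):
    return data


@step_sketch()
def get_coordinates():
    return (10, 20)  # a tuple returned as a single value


names = [process.step_name, process_named.step_name, get_coordinates.step_name]
coords = get_coordinates()
print(names, coords)
```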