# 🧩 Sub-Pipeline Composition
> **What you'll learn:** How to nest entire pipelines as steps in other pipelines for modular, reusable, and testable workflow design.
Sub-pipelines let you compose complex workflows from smaller, tested units, like function composition, but for entire DAGs. Each child pipeline runs in its own execution context while the parent manages data flow between them.
## Why Sub-Pipelines? 🤔
| Monolithic Pipelines | Sub-Pipeline Composition |
|---|---|
| One giant pipeline | Modular, reusable units |
| Hard to test in isolation | Test each child independently |
| Code duplication | Share preprocessing across projects |
| Everything visible | Encapsulate implementation details |
## Basic Usage 🚀
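As a minimal, self-contained sketch of the idea: the `Pipeline` class below is a simplified stand-in, not the library's real API, and `add_sub_pipeline` is borrowed from the method name shown in the Best Practices section.

```python
# Minimal sketch of sub-pipeline composition. `Pipeline`, `add_step`, and
# `add_sub_pipeline` are stand-ins modeled on names used elsewhere on this
# page -- NOT the library's real implementation.

class Pipeline:
    def __init__(self, name):
        self.name = name
        self.steps = []  # list of (step_name, callable) pairs

    def add_step(self, name, fn):
        self.steps.append((name, fn))
        return self

    def add_sub_pipeline(self, child, name=None):
        # Wrap an entire child pipeline so it runs as a single parent step.
        step_name = name or f"sub:{child.name}"
        self.steps.append((step_name, child.run))
        return self

    def run(self, data):
        # Each step receives the shared dict and returns new/updated keys.
        for _, fn in self.steps:
            data = {**data, **fn(data)}
        return data

# Child pipeline: two small, independently testable steps.
clean = Pipeline("clean")
clean.add_step("strip", lambda d: {"text": d["text"].strip()})
clean.add_step("lower", lambda d: {"text": d["text"].lower()})

# Parent pipeline nests the whole child as one step.
parent = Pipeline("ingest")
parent.add_sub_pipeline(clean, name="preprocess")
parent.add_step("count", lambda d: {"length": len(d["text"])})

result = parent.run({"text": "  Hello World  "})
print(result["text"], result["length"])  # hello world 11
```

The key point is that the parent treats `clean` as a single opaque step, while internally it still runs its own ordered steps.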
## Input / Output Mapping 📥
Map parent output names to child input names (and vice versa) when they differ:
**How Mapping Works**

```mermaid
graph LR
    A["Parent: raw_data"] -- "input_mapping" --> B["Child: input_df"]
    B --> C["Child Pipeline<br/>clean → normalize"]
    C --> D["Child: normalized"]
    D -- "output_mapping" --> E["Parent: clean_data"]
```
- `input_mapping`: translates parent output names into the child pipeline's expected input names
- `output_mapping`: translates child output names back into the parent's namespace
- Unmapped keys are passed through as-is
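The renaming rules above can be sketched in plain Python. This is a simulation of the described behavior, not the library's implementation; the `run_sub_pipeline` helper and the example key names are illustrative.

```python
# Sketch of the mapping mechanics: rename in, run child, rename out.

def run_sub_pipeline(parent_ctx, child_fn, input_mapping, output_mapping):
    # 1. Rename parent keys to the child's expected input names.
    child_inputs = {input_mapping.get(k, k): v for k, v in parent_ctx.items()}
    # 2. Run the child pipeline on the translated inputs.
    child_outputs = child_fn(child_inputs)
    # 3. Rename child outputs back into the parent's namespace;
    #    unmapped keys pass through unchanged.
    return {output_mapping.get(k, k): v for k, v in child_outputs.items()}

# Child expects "input_df" and produces "normalized" plus a passthrough key.
def child(ctx):
    values = ctx["input_df"]
    top = max(values)
    return {"normalized": [v / top for v in values], "row_count": len(values)}

result = run_sub_pipeline(
    {"raw_data": [2.0, 4.0]},
    child,
    input_mapping={"raw_data": "input_df"},
    output_mapping={"normalized": "clean_data"},
)
print(result)  # {'clean_data': [0.5, 1.0], 'row_count': 2}
```

Note how `row_count` has no entry in `output_mapping`, so it reaches the parent under its original name.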
## Programmatic API 🛠️
Use `SubPipelineStep` directly for full control:
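A simplified stand-in that mirrors the constructor parameters documented in the API reference below; the real `SubPipelineStep` comes from the library itself, and the child object here is a throwaway mock.

```python
# Stand-in SubPipelineStep mirroring the documented constructor defaults.
# Only the parameter wiring is shown -- not the library's real class.
from types import SimpleNamespace

class SubPipelineStep:
    def __init__(self, sub_pipeline, name=None, inputs=None, outputs=None,
                 input_mapping=None, output_mapping=None):
        self.sub_pipeline = sub_pipeline
        # Default name follows the documented "sub:{pipeline.name}" pattern.
        self.name = name if name is not None else f"sub:{sub_pipeline.name}"
        self.inputs = inputs or []
        self.outputs = outputs or []
        self.input_mapping = input_mapping or {}
        self.output_mapping = output_mapping or {}

# A trivial mock child: anything with a name and a run(ctx) method.
child = SimpleNamespace(name="normalize", run=lambda ctx: ctx)

step = SubPipelineStep(
    child,
    inputs=["raw_data"],
    outputs=["clean_data"],
    input_mapping={"raw_data": "input_df"},
    output_mapping={"normalized": "clean_data"},
)
print(step.name)  # sub:normalize
```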
Or use the convenience function:
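The convenience function is sketched here as a `Pipeline.add_sub_pipeline` method, matching the call shown under Best Practices; the body is an illustration of the equivalence, not library code.

```python
# Hypothetical convenience wrapper: constructs the sub-pipeline step and
# appends it to the parent, instead of building SubPipelineStep by hand.

class Pipeline:
    def __init__(self, name):
        self.name = name
        self.steps = []

    def add_sub_pipeline(self, child, name=None, **mapping_kwargs):
        # Equivalent to SubPipelineStep(child, name=..., ...) + append.
        step = {"name": name or f"sub:{child.name}", "pipeline": child,
                **mapping_kwargs}
        self.steps.append(step)
        return self  # allow chaining

preprocess = Pipeline("preprocess")
parent = Pipeline("training")
parent.add_sub_pipeline(preprocess, name="feature_engineering")
print([s["name"] for s in parent.steps])  # ['feature_engineering']
```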
## Real-World Examples 🌍
### Shared Preprocessing Across Projects
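A sketch of the reuse pattern with a minimal `Pipeline` stand-in; the module path in the comment and the project names are hypothetical.

```python
# One tested preprocessing pipeline, reused by two parent pipelines.

class Pipeline:
    def __init__(self, name, steps=None):
        self.name = name
        self.steps = list(steps or [])

    def add_sub_pipeline(self, child, name=None):
        self.steps.append((name or f"sub:{child.name}", child))
        return self

# In a real project this would live in its own module and be imported,
# e.g. `from shared.preprocessing import preprocess` (hypothetical path).
preprocess = Pipeline("preprocess", steps=[("clean", None), ("normalize", None)])

churn_model = Pipeline("churn").add_sub_pipeline(preprocess)
ltv_model = Pipeline("ltv").add_sub_pipeline(preprocess)

# Both parents reference the same tested child -- no duplicated code.
print(churn_model.steps[0][0], ltv_model.steps[0][0])
```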
### Multi-Stage ML Pipeline
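A toy simulation of the staged layout, modeling each child pipeline as a function over a shared context dict. This is plain Python for illustration, not the library's API, and the stage logic is deliberately trivial.

```python
# Parent composed of three child "pipelines": preprocess -> train -> evaluate.

def make_stage(name, fn):
    """Model a child pipeline as a named function from dict to dict."""
    fn.stage_name = name
    return fn

preprocess = make_stage("preprocess",
                        lambda d: {**d, "features": [x * 2 for x in d["raw"]]})
train = make_stage("train",
                   lambda d: {**d, "model": sum(d["features"])})
evaluate = make_stage("evaluate",
                      lambda d: {**d, "score": d["model"] / len(d["features"])})

def run_parent(stages, data):
    # The parent just threads the shared context through each sub-pipeline.
    for stage in stages:
        data = stage(data)
    return data

result = run_parent([preprocess, train, evaluate], {"raw": [1, 2, 3]})
print(result["score"])  # 4.0
```

Each stage could be developed and tested in its own module, then composed here exactly as the Shared Preprocessing example suggests.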
## `SubPipelineStep` API Reference
### Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `sub_pipeline` | `Pipeline` | required | The child pipeline to wrap |
| `name` | `str \| None` | `"sub:{pipeline.name}"` | Step name in parent |
| `inputs` | `list[str]` | `[]` | Input names from parent |
| `outputs` | `list[str]` | `[]` | Output names exposed to parent |
| `input_mapping` | `dict[str, str]` | `{}` | Parent → child name mapping |
| `output_mapping` | `dict[str, str]` | `{}` | Child → parent name mapping |
## Best Practices 💡
- **Design for reusability:** Build child pipelines as standalone modules with their own tests, then import and compose them in parent pipelines.
- **Naming convention:** Use descriptive step names; `parent.add_sub_pipeline(preprocess, name="feature_engineering")` makes logs easier to read.
- **Context isolation:** Child pipelines run with `auto_start_ui=False`. They share the parent's storage but have their own execution context.