Context & Parameters 🧠

flowyml's context system eliminates configuration hell by providing automatic parameter injection across pipeline steps.

[!NOTE] What you'll learn: How to manage configuration without hardcoding, enabling the same pipeline to run in dev/staging/prod

Key insight: Context separates what your pipeline does from how it's configured. Change parameters, not code.

Why Context Matters

Without context, ML pipelines suffer from:

  - Hardcoded parameters: learning_rate = 0.001 buried in code
  - Environment coupling: Different code for dev vs. prod
  - Configuration sprawl: Parameters scattered across files
  - Manual wiring: Pass every parameter through every function

With flowyml context, you get:

  - Automatic injection: Parameters flow to steps that need them
  - Environment flexibility: Same code, different configs
  - Centralized configuration: All parameters in one place
  - Type safety: Type hints validate parameters automatically

[!TIP] The killer feature: Run the same pipeline with different configs just by swapping context. No code changes to go from dev (small dataset, CPU) to prod (full dataset, GPU).

The Context Object

The Context object serves as a container for:

  1. Global Parameters: Hyperparameters, configuration settings
  2. Environment Variables: Paths, endpoints, credentials
  3. Runtime Settings: Batch sizes, resource requirements
  4. Domain Logic: Business rules, thresholds

Creating a Context

You can create a context with any number of keyword arguments:

from flowyml import context

# Define parameters
ctx = context(
    learning_rate=0.001,
    batch_size=64,
    model_type="resnet50",
    random_seed=42,
    data_path="./data/train.csv"
)

Using Context with Pipelines

from flowyml import Pipeline, context

ctx = context(learning_rate=0.01, epochs=100)

# Pass context to pipeline
pipeline = Pipeline("training_pipeline", context=ctx)

Automatic Injection 💉

The most powerful feature of flowyml's context is automatic parameter injection: when a step function's parameter name matches a key in the context, flowyml supplies that value when the step is executed.

Example

from flowyml import Pipeline, step, context

# 1. Define Context
ctx = context(
    learning_rate=0.01,
    epochs=10,
    batch_size=32
)

# 2. Define Steps with matching parameter names
@step(outputs=["model"])
def train_model(data, learning_rate: float, epochs: int):
    # 'learning_rate' and 'epochs' are automatically injected from context!
    print(f"Training with lr={learning_rate}, epochs={epochs}")
    model = train(data, lr=learning_rate, epochs=epochs)
    return model

@step(inputs=["model"], outputs=["metrics"])
def evaluate(model, batch_size: int):
    # 'batch_size' automatically injected!
    return evaluate_model(model, batch_size=batch_size)

# 3. Create Pipeline with Context
pipeline = Pipeline("ml_pipeline", context=ctx)
pipeline.add_step(train_model)
pipeline.add_step(evaluate)

# 4. Run - parameters automatically injected!
result = pipeline.run()

How It Works

  1. Parameter Matching: flowyml inspects each step function's signature
  2. Context Lookup: For each parameter, it checks if a matching key exists in the context
  3. Automatic Injection: If found, the value is injected when calling the step
  4. Type Validation: Type hints are used to validate injected values

# This step signature:
def train(data, learning_rate: float, epochs: int):
    ...

# With this context:
ctx = context(learning_rate=0.01, epochs=100)

# Results in this call:
train(data=previous_output, learning_rate=0.01, epochs=100)

Parameter Overrides 🔄

You can override context parameters at runtime:

# Original context
ctx = context(learning_rate=0.01, epochs=10)
pipeline = Pipeline("training", context=ctx)

# Override for this specific run
result = pipeline.run(context={"epochs": 20, "learning_rate": 0.05})
# Now uses epochs=20 and learning_rate=0.05

Use Cases for Overrides

  1. Experimentation: Try different hyperparameters quickly
  2. Production vs Development: Different settings for different environments
  3. A/B Testing: Run same pipeline with variations

# Compare different learning rates
for lr in [0.001, 0.01, 0.1]:
    result = pipeline.run(context={"learning_rate": lr})
    print(f"LR={lr}: accuracy={result.outputs['metrics'].accuracy}")

Context Updates

You can also update the context dynamically:

ctx = context(epochs=10)

# Update context
ctx.update({"learning_rate": 0.01, "batch_size": 32})

# Now context has all three parameters
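
As a quick sanity check, the to_dict() accessor described below returns everything currently in the context:

print(ctx.to_dict())
# Expected contents: epochs=10, learning_rate=0.01, batch_size=32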

Accessing Context Data 🔍

Individual Parameters

Most commonly, you just declare parameters in your step function:

@step
def my_step(param1: str, param2: int):
    # param1 and param2 injected from context
    print(f"{param1}, {param2}")

Full Context Access

If you need access to the entire context object, declare a parameter named context and the full Context is injected:

@step
def inspect_context(context):
    # Access all parameters
    print(f"All params: {context.to_dict()}")

    # Check if parameter exists
    if "optional_param" in context:
        use_param(context["optional_param"])

Context Properties

The Context object provides useful methods:

ctx = context(a=1, b=2, c=3)

# Convert to dictionary
params_dict = ctx.to_dict()  # {"a": 1, "b": 2, "c": 3}

# Iterate over parameters
for key, value in ctx.items():
    print(f"{key}={value}")

# Get keys
keys = list(ctx.keys())  # ["a", "b", "c"]

# Check membership
if "a" in ctx:
    print(ctx["a"])

Mixing Input Data and Context Parameters

Steps can receive both pipeline data (from previous steps) and context parameters:

import pandas as pd

@step(outputs=["data"])
def load_data(file_path: str):
    # file_path from context
    return pd.read_csv(file_path)

@step(inputs=["data"], outputs=["processed"])
def process(data, threshold: float, normalize: bool):
    # 'data' from previous step (load_data output)
    # 'threshold' and 'normalize' from context
    if normalize:
        data = normalize_data(data)
    return filter_by_threshold(data, threshold)

ctx = context(
    file_path="data/train.csv",
    threshold=0.5,
    normalize=True
)
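
Wiring these steps together follows the same pattern as before: step outputs flow between steps by name, and the remaining arguments are injected from the context. A minimal sketch; the pipeline name and the final print are illustrative, while add_step, run, and result.outputs are used elsewhere on this page:

pipeline = Pipeline("preprocessing", context=ctx)
pipeline.add_step(load_data)   # file_path injected from context
pipeline.add_step(process)     # "data" comes from load_data; threshold/normalize from context

result = pipeline.run()
print(result.outputs["processed"])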

Type Hints and Validation 🎯

Type hints serve two purposes:

  1. Documentation: Clarify expected types
  2. Validation: Help flowyml match parameters correctly

from typing import Any, Dict, List, Optional

@step
def train(
    data: List[float],           # From previous step
    learning_rate: float,        # From context - must be float
    layers: List[int],           # From context - must be list of ints
    config: Dict[str, Any],      # From context - must be dict
    optional_param: Optional[str] = None  # Optional context parameter
):
    ...

ctx = context(
    learning_rate=0.01,          # ✓ Matches type
    layers=[128, 64, 32],        # ✓ Matches type
    config={"dropout": 0.5},     # ✓ Matches type
    # optional_param not provided - uses default None
)

Advanced Patterns

Environment-Specific Contexts

import os

def get_context():
    env = os.getenv("ENV", "development")

    if env == "production":
        return context(
            data_path="s3://prod-bucket/data",
            batch_size=256,
            use_gpu=True
        )
    else:
        return context(
            data_path="./local_data",
            batch_size=32,
            use_gpu=False
        )

pipeline = Pipeline("adaptive", context=get_context())

Nested Configuration

ctx = context(
    model_config={
        "type": "transformer",
        "hidden_size": 768,
        "num_layers": 12
    },
    training_config={
        "learning_rate": 0.001,
        "warmup_steps": 1000
    }
)

@step
def train(model_config: dict, training_config: dict):
    # Access nested configuration
    model = create_model(**model_config)
    optimizer = create_optimizer(**training_config)
    ...

Context Inheritance

# Base context
base_ctx = context(
    random_seed=42,
    verbose=True
)

# Extend for specific use case
training_ctx = context(
    **base_ctx.to_dict(),
    learning_rate=0.01,
    epochs=100
)
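
One caveat with this pattern: Python raises a TypeError if the same key appears both in the expanded base dictionary and as an explicit keyword, so overriding a value from the base context means merging plain dicts first. A minimal sketch using only the context() and to_dict() calls shown above:

base_ctx = context(random_seed=42, verbose=True, batch_size=32)

# Merge first so the experiment-specific values win, then build the new context
overrides = {"batch_size": 128, "learning_rate": 0.01}
experiment_ctx = context(**{**base_ctx.to_dict(), **overrides})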

Best Practices 🌟

1. Use Descriptive Parameter Names

# ✅ Good - clear and specific
ctx = context(
    learning_rate=0.01,
    max_epochs=100,
    early_stopping_patience=10
)

# ❌ Bad - unclear abbreviations
ctx = context(
    lr=0.01,
    e=100,
    p=10
)

2. Always Use Type Hints

# ✅ Good - types make injection reliable
@step
def process(data, threshold: float, iterations: int):
    ...

# ⚠️ Less ideal - no type information
@step
def process(data, threshold, iterations):
    ...

3. Provide Sensible Defaults

# ✅ Good - works with or without context values
@step
def train(
    data,
    learning_rate: float = 0.001,
    epochs: int = 10,
    verbose: bool = False
):
    ...

# Can run without providing these in context

4. Group Related Parameters

# ✅ Good - organized into logical groups
ctx = context(
    # Data parameters
    data_path="./data",
    validation_split=0.2,

    # Model parameters
    model_type="resnet50",
    pretrained=True,

    # Training parameters
    learning_rate=0.001,
    batch_size=32,
    epochs=100
)

5. Document Your Context Requirements

def create_training_pipeline(context):
    """Create a training pipeline.

    Required context parameters:
        - data_path (str): Path to training data
        - learning_rate (float): Learning rate for optimizer
        - epochs (int): Number of training epochs

    Optional context parameters:
        - batch_size (int): Batch size, default 32
        - random_seed (int): Random seed, default 42
    """
    pipeline = Pipeline("training", context=context)
    # ...
    return pipeline

Debugging Context Issues 🔧

Check Context Contents

ctx = context(a=1, b=2, c=3)

# Print all parameters
print(ctx.to_dict())

# Check during pipeline execution
pipeline = Pipeline("debug", context=ctx)
result = pipeline.run(debug=True)  # Shows parameter injection

Missing Parameter Errors

If a step requires a parameter not in the context or previous outputs:

@step
def needs_param(required_param: str):
    ...

# If context doesn't have 'required_param', execution fails
pipeline.run()  # Error: Missing required parameters: ['required_param']
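
One way to surface this before the run starts is a pre-flight check against the context, using the membership test shown under Context Properties. A sketch; the list of required names here is purely illustrative:

required = ["data_path", "learning_rate", "epochs"]  # hypothetical requirements

missing = [name for name in required if name not in ctx]
if missing:
    raise ValueError(f"Missing required context parameters: {missing}")

result = pipeline.run()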

Next Steps 📚