Experiment Tracking & Conditional Deployment 🧪
Build a production-grade experiment pipeline that trains a model, tracks all metrics live, and only deploys if the model exceeds a quality threshold. This is the pattern used in real MLOps workflows to prevent bad models from reaching production.
**Time:** ~15 minutes · **Level:** Intermediate · **Prerequisites:** Completed the FlowyML Quick Start
What We'll Build 🎯
An experiment pipeline that:
- 📥 Loads training data
- 🏋️ Trains a model with live metric capture via `FlowymlKerasCallback`
- 📊 Evaluates on test data
- 🚦 Conditionally deploys only if accuracy ≥ 85%
- 💾 Saves + exports the model (only if threshold met)
```
load_data → train_model → evaluate_model
                              ↓
                   [if accuracy ≥ 0.85]
                              ↓
                export_model → save_model
```
Option 1: Using the Pre-Built Template 🏭
MLPotion provides `create_keras_experiment_pipeline`, which does exactly this:
```python
# experiment_template.py
import keras

from flowyml.core.context import Context
from mlpotion.integrations.flowyml.keras import create_keras_experiment_pipeline


def create_model() -> keras.Model:
    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(4,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # Binary classification
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


def main():
    ctx = Context(
        # Data
        file_path="data/train.csv",
        label_name="is_fraud",
        batch_size=64,
        # Training
        epochs=50,
        learning_rate=0.001,
        experiment_name="fraud-detection-v3",
        project="fraud-detection",
        # Export (only used if threshold met)
        export_path="models/production/fraud_model/",
        save_path="models/checkpoints/fraud_model.keras",
    )

    pipeline = create_keras_experiment_pipeline(
        name="fraud_experiment",
        context=ctx,
        project_name="fraud-detection",
        deploy_threshold=0.85,  # Only deploy if accuracy >= 85%
        threshold_metric="accuracy",  # Which metric to check
    )

    result = pipeline.run()
    print("✅ Experiment complete!")


if __name__ == "__main__":
    main()
```
What Happens Under the Hood
The template creates a Pipeline with:
- `enable_experiment_tracking=True` → all metrics are tracked
- `enable_checkpointing=True` → the pipeline can resume on failure
- A `FlowymlKerasCallback` auto-attached to the training step
- An `If` conditional flow that gates the export steps
Option 2: Build It Yourself 🏗️
For full control, build the conditional pipeline manually:
```python
# experiment_manual.py
import keras

from flowyml.core.context import Context
from flowyml.core.pipeline import Pipeline
from flowyml.core.conditional import If
from mlpotion.integrations.flowyml.keras import (
    load_data,
    train_model,
    evaluate_model,
    export_model,
    save_model,
)


def main():
    ctx = Context(
        file_path="data/train.csv",
        label_name="is_fraud",
        batch_size=64,
        epochs=50,
        experiment_name="fraud-detection-manual",
        export_path="models/production/",
        save_path="models/checkpoints/model.keras",
    )

    pipeline = Pipeline(
        name="fraud_experiment_manual",
        context=ctx,
        enable_cache=False,  # Don't cache: we want fresh runs
        enable_experiment_tracking=True,  # Track all metrics
        enable_checkpointing=True,  # Resume on failure
    )

    # Core training DAG
    pipeline.add_step(load_data)
    pipeline.add_step(train_model)
    pipeline.add_step(evaluate_model)

    # Conditional deployment gate
    deploy_gate = If(
        condition=lambda metrics: (
            metrics.get_metric("accuracy", 0) >= 0.85
            if hasattr(metrics, "get_metric")
            else metrics.get("accuracy", 0) >= 0.85
        ),
        then_steps=[export_model, save_model],
        name="deploy_if_accuracy_above_0.85",
    )
    pipeline.control_flows.append(deploy_gate)

    result = pipeline.run()

    # Check what happened: the export/save steps only ran if accuracy >= 85%
    print("✅ Experiment complete!")


if __name__ == "__main__":
    main()
```
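The condition passed to `If` is duck-typed: it works whether the evaluation step hands the gate a metrics object or a plain dict. Here is a framework-free sketch of that same logic you can run on its own (`FakeMetrics` is a hypothetical stand-in, not a FlowyML class):

```python
def accuracy_gate(metrics, threshold=0.85):
    """Return True when the tracked accuracy clears the deploy threshold.

    Accepts either an object exposing get_metric(name, default) or a
    plain dict, mirroring the duck-typed lambda in the pipeline above.
    """
    if hasattr(metrics, "get_metric"):
        return metrics.get_metric("accuracy", 0) >= threshold
    return metrics.get("accuracy", 0) >= threshold


class FakeMetrics:
    """Illustrative stand-in for a metrics asset."""

    def __init__(self, values):
        self._values = values

    def get_metric(self, name, default=None):
        return self._values.get(name, default)


print(accuracy_gate({"accuracy": 0.91}))               # True: dict form, above threshold
print(accuracy_gate(FakeMetrics({"accuracy": 0.81})))  # False: object form, below threshold
```

Testing the gate with toy inputs like this is a cheap way to verify your threshold logic before wiring it into a full pipeline run.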
Understanding `FlowymlKerasCallback` 📊
When you provide `experiment_name` to `train_model`, a `FlowymlKerasCallback` is automatically attached to your training loop. It captures:
| What | How |
|---|---|
| All epoch metrics | loss, accuracy, val_loss, val_accuracy, etc. |
| Per-batch metrics | If enabled, captures granular training progress |
| Model artifact | Optionally logs the model itself after training |
| Training metadata | Epochs completed, learning rate, batch size |
```python
# The callback is auto-created inside train_model:
from flowyml.integrations.keras import FlowymlKerasCallback

callback = FlowymlKerasCallback(
    experiment_name="fraud-detection-v3",
    project="fraud-detection",
    log_model=True,
)
# This is attached to the Keras model.fit() call automatically
```
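To see the underlying pattern without any library, recall that a Keras-style callback is just an object whose `on_epoch_end(epoch, logs)` hook receives each epoch's metric dict. A framework-free sketch of what a tracking callback accumulates (the `MetricTracker` name is illustrative, not FlowyML's internals):

```python
class MetricTracker:
    """Minimal stand-in for an experiment-tracking callback.

    Keras calls on_epoch_end(epoch, logs) after every epoch; a tracking
    callback simply appends each logs dict to its experiment record.
    """

    def __init__(self, experiment_name):
        self.experiment_name = experiment_name
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        self.history.append({"epoch": epoch, **(logs or {})})


tracker = MetricTracker("fraud-detection-v3")
# Simulate two epochs of training:
tracker.on_epoch_end(0, {"loss": 0.62, "accuracy": 0.71})
tracker.on_epoch_end(1, {"loss": 0.41, "accuracy": 0.84})
print(tracker.history[-1]["accuracy"])  # 0.84
```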
You can also add your own callbacks alongside the auto-attached one:
```python
from mlpotion.integrations.flowyml.keras import train_model

model_asset, metrics_asset = train_model(
    model=my_model,
    data=dataset,
    epochs=50,
    experiment_name="v3",
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10),
        keras.callbacks.ReduceLROnPlateau(factor=0.5),
    ],
)
# FlowymlKerasCallback is ADDED to your list, not replaced
```
Multi-Metric Conditional Gates 🚦
You can create more complex conditions that check multiple metrics:
```python
from flowyml.core.conditional import If

# Gate on multiple metrics
deploy_gate = If(
    condition=lambda metrics: (
        metrics.get_metric("accuracy", 0) >= 0.85
        and metrics.get_metric("loss", float("inf")) < 0.3
        and metrics.get_metric("precision", 0) >= 0.80
    ),
    then_steps=[export_model, save_model],
    name="deploy_if_quality_sufficient",
)
```
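Because each clause is just a boolean, you can sanity-check the combined condition against sample metrics before wiring it into a pipeline. A quick plain-Python check of the gate above, using a dict in place of the metrics asset:

```python
def quality_gate(m):
    # Mirrors the three-clause condition: high accuracy, low loss, high precision.
    return (
        m.get("accuracy", 0) >= 0.85
        and m.get("loss", float("inf")) < 0.3
        and m.get("precision", 0) >= 0.80
    )


good = {"accuracy": 0.91, "loss": 0.22, "precision": 0.88}
overfit = {"accuracy": 0.91, "loss": 0.55, "precision": 0.88}  # loss too high
print(quality_gate(good), quality_gate(overfit))  # True False
```

Note the defaults: a *missing* metric fails the gate (0 for scores, infinity for loss), so a broken evaluation step can never accidentally deploy.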
Cross-Framework Experiments 🔄
The same experiment pattern works across all three frameworks. Just swap the imports:
Keras:

```python
from mlpotion.integrations.flowyml.keras import create_keras_experiment_pipeline

pipeline = create_keras_experiment_pipeline(
    context=ctx,
    deploy_threshold=0.85,
    threshold_metric="accuracy",
)
```

PyTorch:

```python
from mlpotion.integrations.flowyml.pytorch import create_pytorch_experiment_pipeline

pipeline = create_pytorch_experiment_pipeline(
    context=ctx,
    deploy_threshold=0.85,
    threshold_metric="accuracy",
)
```

TensorFlow:

```python
from mlpotion.integrations.flowyml.tensorflow import create_tf_experiment_pipeline

pipeline = create_tf_experiment_pipeline(
    context=ctx,
    deploy_threshold=0.85,
    threshold_metric="accuracy",
)
```
Experiment Comparison 📊
After running multiple experiments, use FlowyML's built-in tools to compare:
```python
from flowyml.core.experiment import ExperimentTracker

tracker = ExperimentTracker(project="fraud-detection")

# List all experiments
experiments = tracker.list_experiments()
for exp in experiments:
    print(f"  {exp.name}: accuracy={exp.metrics.get('accuracy', '?')}")

# Compare two experiments
comparison = tracker.compare(
    experiment_a="fraud-detection-v2",
    experiment_b="fraud-detection-v3",
)
print(comparison.summary())
```
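If you only have two runs' metric dicts in hand (say, exported outside the tracker), a tiny helper gives you the same side-by-side view; this is a generic sketch, not part of FlowyML's API:

```python
def diff_metrics(a, b):
    """Return {metric: (a_value, b_value, delta)} for metrics present in both runs."""
    shared = a.keys() & b.keys()
    return {k: (a[k], b[k], round(b[k] - a[k], 6)) for k in sorted(shared)}


v2 = {"accuracy": 0.84, "loss": 0.38}
v3 = {"accuracy": 0.91, "loss": 0.29}
for name, (old, new, delta) in diff_metrics(v2, v3).items():
    print(f"{name}: {old} -> {new} ({delta:+})")
```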
Best Practices 💡
1. Always Name Your Experiments
Use descriptive, versioned names that you can search later:
```python
ctx = Context(
    experiment_name="fraud-detection-v3-lr0001-epochs50",
    project="fraud-detection",
)
```
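To keep the name from drifting out of sync with the config, you can derive it from the hyperparameters themselves. A small convenience sketch (not a FlowyML utility):

```python
def experiment_name(base, version, **hparams):
    """Build a searchable name like 'fraud-detection-v3-epochs50-lr0.001'."""
    parts = [f"{k}{v}" for k, v in sorted(hparams.items())]
    return "-".join([base, f"v{version}", *parts])


name = experiment_name("fraud-detection", 3, lr=0.001, epochs=50)
print(name)  # fraud-detection-v3-epochs50-lr0.001
```

Sorting the hyperparameters makes the name deterministic, so identical configs always produce identical experiment names.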
2. Use Checkpointing for Long Training Runs
```python
pipeline = Pipeline(
    name="long_training",
    context=ctx,
    enable_checkpointing=True,  # Resume from last checkpoint
)
```
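Conceptually, checkpointing means recording which steps already completed so a rerun skips them and picks up where the failure happened. A minimal in-memory sketch of that resume logic (FlowyML's actual mechanism persists state between processes and may differ):

```python
def run_with_resume(steps, completed, results):
    """Run named steps in order, skipping any already in `completed`."""
    for name, fn in steps:
        if name in completed:
            continue  # resume: skip work finished before the failure
        results[name] = fn()
        completed.add(name)
    return results


steps = [("load", lambda: "data"), ("train", lambda: "model"), ("eval", lambda: 0.9)]
completed, results = {"load"}, {"load": "data"}  # pretend we crashed after load
run_with_resume(steps, completed, results)
print(sorted(results))  # ['eval', 'load', 'train']
```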
3. Set Conservative Initial Thresholds
Start with a lower threshold and tighten it as your model improves:
```python
# Start conservatively
pipeline = create_keras_experiment_pipeline(
    context=ctx,
    deploy_threshold=0.70,  # Low threshold initially
)

# After baseline is established
pipeline = create_keras_experiment_pipeline(
    context=ctx,
    deploy_threshold=0.90,  # Tighter threshold for v2
)
```
4. Combine with Scheduling
Auto-retrain and conditionally deploy on a schedule:
```python
from mlpotion.integrations.flowyml.keras import create_keras_scheduled_pipeline

info = create_keras_scheduled_pipeline(
    context=ctx,
    schedule="0 2 * * 0",  # Weekly: Sunday 2 AM
)
# The scheduled pipeline will auto-deploy only good models
# if you add the conditional gate
```
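The schedule string is a standard five-field cron expression: minute, hour, day-of-month, month, day-of-week (0 = Sunday). A tiny matcher shows how `"0 2 * * 0"` maps to "Sunday 02:00"; this is a simplified illustration supporting only `*` and single numbers, not a full cron parser:

```python
from datetime import datetime


def cron_matches(expr, dt):
    """Check a datetime against a simplified five-field cron expression.

    Fields: minute hour day-of-month month day-of-week (0 = Sunday).
    Supports only '*' and single numbers, enough to read "0 2 * * 0".
    """
    fields = expr.split()
    values = [
        dt.minute,
        dt.hour,
        dt.day,
        dt.month,
        (dt.weekday() + 1) % 7,  # Python: Monday=0; cron: Sunday=0
    ]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))


sunday_2am = datetime(2024, 1, 7, 2, 0)  # 2024-01-07 was a Sunday
print(cron_matches("0 2 * * 0", sunday_2am))  # True
```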
What You Learned 🎓
- ✅ How to use the experiment pipeline template
- ✅ How to build manual conditional deployment gates
- ✅ How `FlowymlKerasCallback` auto-captures training metrics
- ✅ How to create multi-metric quality gates
- ✅ How to compare experiments across runs
- ✅ Best practices for production experiment workflows
Next Steps 🚀
- Scheduled Retraining → Cron-based pipelines
- Custom Pipelines → Build your own step combinations
- FlowyML Integration Guide → Full API reference
Your models now deploy themselves, but only when they're good enough! 🚦