FlowyML Quick Start Tutorial 🚀
Build a complete artifact-centric ML pipeline in under 10 minutes using MLPotion + FlowyML. Unlike a basic standalone pipeline, this approach gives you typed artifacts, automatic metadata, DAG resolution, caching, and lineage tracking — all out of the box.
Time: ~10 minutes Level: Beginner Prerequisites: Python 3.10+
What We'll Build 🎯
A house-price regression pipeline that:
- 📥 Loads CSV data → returns a Dataset artifact
- 🎓 Trains a neural network → returns a Model + Metrics artifact
- 📊 Evaluates performance → returns a Metrics artifact
- 💾 Saves the model → returns a Model artifact with save path
All wired together as a FlowyML DAG with automatic dependency resolution.
Step 1: Install 📦
pip install mlpotion[flowyml,keras]
Step 2: Prepare Synthetic Data 📊
# create_data.py
import pandas as pd
import numpy as np
np.random.seed(42)
n = 10_000
data = pd.DataFrame({
"square_feet": np.random.randint(500, 5000, n),
"bedrooms": np.random.randint(1, 6, n),
"bathrooms": np.random.randint(1, 4, n),
"age_years": np.random.randint(0, 100, n),
})
data["price"] = (
data["square_feet"] * 200
+ data["bedrooms"] * 10_000
+ data["bathrooms"] * 15_000
- data["age_years"] * 500
+ np.random.randn(n) * 50_000
)
split = int(0.8 * len(data))
data[:split].to_csv("train.csv", index=False)
data[split:].to_csv("test.csv", index=False)
print(f"✅ Created {split} training + {n - split} test samples")
python create_data.py
Step 3: Build the Pipeline 🏗️
# flowyml_pipeline.py
import keras
from flowyml.core.context import Context
from mlpotion.integrations.flowyml.keras import create_keras_training_pipeline
def create_model(input_dim: int = 4) -> keras.Model:
"""Create a simple regression model."""
model = keras.Sequential([
keras.layers.Dense(128, activation="relu", input_shape=(input_dim,)),
keras.layers.Dropout(0.2),
keras.layers.Dense(64, activation="relu"),
keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
return model
def main():
# 1. Define all hyperparameters in a single Context
ctx = Context(
file_path="train.csv",
label_name="price",
batch_size=32,
epochs=20,
learning_rate=0.001,
experiment_name="house-prices-v1",
)
# 2. Create the pipeline — one line!
pipeline = create_keras_training_pipeline(
name="house_price_training",
context=ctx,
project_name="house-prices",
)
# 3. Run it
print("🚀 Running FlowyML pipeline...")
result = pipeline.run()
# 4. Access the artifacts
print("\n✅ Pipeline complete!")
print(f" Steps executed: {len(result.steps)}")
# The pipeline steps return typed artifacts:
# - load_data → Dataset artifact
# - train_model → (Model, Metrics) artifacts
# - evaluate → Metrics artifact
if __name__ == "__main__":
main()
Step 4: Run the Pipeline 🏃
python flowyml_pipeline.py
Expected output:
🚀 Running FlowyML pipeline...
📦 Loaded dataset: 250 batches, batch_size=32, source=train.csv
🎯 Training complete: 20 epochs, metrics captured: ['loss', 'mae', ...]
📊 Evaluation: {'loss': 15234.56, 'mae': 98.21}
✅ Pipeline complete!
Steps executed: 3
Step 5: Understand What Happened 🔍
The DAG That Ran
load_data (outputs: dataset)
↓
train_model (inputs: dataset → outputs: model, training_metrics)
↓
evaluate_model (inputs: model, dataset → outputs: metrics)
FlowyML automatically resolved the dependency graph from the inputs/outputs
declarations on each step.
The Artifacts Created
| Step | Artifact | Type | Auto-Extracted Metadata |
|---|---|---|---|
load_data |
dataset |
Dataset |
source, batch_size, batches, label_name |
train_model |
model |
Model |
layers, parameters, optimizer info |
train_model |
training_metrics |
Metrics |
loss, mae, epochs, learning_rate |
evaluate_model |
metrics |
Metrics |
loss, mae (on eval data) |
Caching In Action
Run the pipeline again — notice that load_data is skipped because its
cache="code_hash" detects no code changes:
python flowyml_pipeline.py
# 📦 load_data — CACHED ✓
# 🎯 Training complete...
Step 6: Use Individual Steps Standalone 🧩
Every step can also be called directly — no pipeline required:
from mlpotion.integrations.flowyml.keras import load_data, evaluate_model
# Load data standalone
dataset = load_data(file_path="test.csv", batch_size=64, label_name="price")
print(type(dataset)) # <class 'flowyml.assets.dataset.Dataset'>
print(dataset.metadata.properties) # {'source': 'test.csv', ...}
# Evaluate standalone
metrics = evaluate_model(model=trained_model, data=dataset)
print(metrics.get_metric("mae")) # 102.34
What You Learned 🎓
- ✅ How to install MLPotion with FlowyML support
- ✅ How to configure a pipeline with
Context - ✅ How to use a pipeline template (
create_keras_training_pipeline) - ✅ How FlowyML auto-resolves the DAG from step I/O declarations
- ✅ How every step returns typed artifacts with metadata
- ✅ How caching skips unchanged steps automatically
- ✅ How to use individual steps standalone
Next Steps 🚀
- Custom Pipelines → — Compose your own step combinations
- Experiment Tracking → — Conditional deploy + experiment comparison
- Scheduled Retraining → — Cron-based periodic pipelines
- FlowyML Integration Guide → — Full reference docs
Congratulations — you've built your first artifact-centric ML pipeline! 🎉