FlowyML
Why Artifact-Centric Changes Everything
Typical orchestrators (Airflow, ZenML, Prefect) are task-based: you wire arrows between steps manually. FlowyML is artifact-centric: steps declare what data they produce and consume, and the DAG is built automatically.
Beyond Tasks
Stop thinking about what to run. Think about what you produce. FlowyML builds the DAG from your inputs and outputs, with no >> arrows and no .set_downstream().
Unified Stacks
The same code runs locally, on Kubernetes, Vertex AI, or SageMaker. Swap infrastructure with a single YAML change and zero code rewrites.
Production First
Built-in lineage tracking, intelligent caching, human-in-the-loop, distributed execution, and model leaderboards as first-class citizens.
The Technical Paradigm Shift
graph LR
A[Raw Data] -->|Step A| B(Processed Dataset)
B -->|Step B| C(Trained Model)
C -->|Step C| D{Evaluation}
style B fill:#e1f5fe,stroke:#01579b
style C fill:#e1f5fe,stroke:#01579b
| Concept | Task-Based (Traditional) | Artifact-Centric (FlowyML) |
|---|---|---|
| Core focus | The order of operations ("The Verb") | The state of the data ("The Noun") |
| DAG construction | Manual arrows (step1 >> step2) | Auto-inferred from input/output signatures |
| Data handoff | Manual paths (s3://bucket/run_1/X.csv) | Global Catalog resolution by name & version |
| Validation | Runtime failure ("File not found") | Compile-time type and lineage check |
| Reproducibility | Hope the script hasn't changed | Immutable lineage (Parents → Child chain) |
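One way to picture the "immutable lineage" row is a content-addressed chain, where each artifact's identifier covers its own content plus the identifiers of its parents. The sketch below is only an illustration under that assumption: `ArtifactRecord`, `record`, and the hashing scheme are hypothetical names and choices, not FlowyML's internals.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class ArtifactRecord:
    name: str
    content_hash: str
    parents: tuple[str, ...]  # lineage ids of the artifacts this one derives from

    @property
    def lineage_id(self) -> str:
        # The id covers the content AND the parents' ids, so changing any
        # ancestor changes the id of every descendant.
        payload = json.dumps(
            {
                "name": self.name,
                "content": self.content_hash,
                "parents": sorted(self.parents),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record(name: str, content: bytes, parents: tuple = ()) -> ArtifactRecord:
    """Create a lineage record for `content` derived from `parents`."""
    return ArtifactRecord(
        name=name,
        content_hash=hashlib.sha256(content).hexdigest(),
        parents=tuple(p.lineage_id for p in parents),
    )


raw = record("raw_data", b"a,b\n1,2\n")
dataset = record("dataset", b"a,b\n1.0,2.0\n", parents=(raw,))
model = record("model", b"weights-v1", parents=(dataset,))

# Same model bytes, but derived from different raw data: the lineage id differs.
raw_changed = record("raw_data", b"a,b\n9,9\n")
dataset_changed = record("dataset", b"a,b\n1.0,2.0\n", parents=(raw_changed,))
model_changed = record("model", b"weights-v1", parents=(dataset_changed,))
```

Because the identifier depends on the whole ancestry, two models with identical bytes but different upstream data are distinguishable, which is the property the "Parents → Child chain" wording points at.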
The "Zen" Developer Experience
A complete, production-ready pipeline. Notice: no arrows (>>). The dependency between load_data and train_model is auto-inferred from the dataset artifact.
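To make the auto-inference idea concrete, here is a minimal, self-contained sketch of how an execution order can be derived purely from declared inputs and outputs. It deliberately does not import FlowyML: the `step` decorator, `infer_order`, and `run` below are hypothetical toy stand-ins written for this illustration, and the real library's signatures may differ.

```python
from graphlib import TopologicalSorter


# Toy stand-in for an artifact-centric step decorator (hypothetical, not FlowyML's API).
def step(inputs=(), outputs=()):
    def decorate(fn):
        fn.inputs, fn.outputs = tuple(inputs), tuple(outputs)
        return fn
    return decorate


@step(outputs=["dataset"])
def load_data():
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


@step(inputs=["dataset"], outputs=["model"])
def train_model(dataset):
    # Least-squares slope through the origin: sum(x*y) / sum(x*x)
    slope = sum(x * y for x, y in dataset) / sum(x * x for x, _ in dataset)
    return {"slope": slope}


@step(inputs=["dataset", "model"], outputs=["metrics"])
def evaluate(dataset, model):
    errors = [abs(y - model["slope"] * x) for x, y in dataset]
    return {"mae": sum(errors) / len(errors)}


def infer_order(steps):
    """Derive execution order from declared artifacts, failing before any step runs."""
    producer = {name: s for s in steps for name in s.outputs}
    graph = {}
    for s in steps:
        missing = [name for name in s.inputs if name not in producer]
        if missing:
            raise ValueError(f"{s.__name__} consumes unproduced artifacts: {missing}")
        # Each step depends on whichever steps produce its inputs.
        graph[s] = {producer[name] for name in s.inputs}
    return list(TopologicalSorter(graph).static_order())


def run(steps):
    artifacts = {}
    for s in infer_order(steps):
        result = s(*(artifacts[name] for name in s.inputs))
        artifacts[s.outputs[0]] = result  # single-output steps only, to keep this short
    return artifacts


# Steps are passed in a deliberately scrambled order; no arrows are declared.
artifacts = run([evaluate, train_model, load_data])
```

The order comes out as `load_data`, `train_model`, `evaluate` because `train_model` names `dataset` as an input and `load_data` names it as an output. A step that consumes an artifact nobody produces is rejected before anything executes, which is the spirit of the earlier "compile-time" validation claim.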
π How Artifacts Flow Through Infrastructure
FlowyML automatically routes artifacts to your configured infrastructure. Here's exactly how the YAML config connects to your code:
graph TB
subgraph "Your Code"
S1["@step(outputs=['dataset'])"] --> A1["Dataset Artifact"]
S2["@step(outputs=['model'])"] --> A2["Model Artifact"]
S3["@step(outputs=['metrics'])"] --> A3["Metrics Dict"]
end
subgraph "flowyml.yaml routing"
A1 -->|"artifact_store: gcs"| GCS["GCS Bucket"]
A2 -->|"model_registry: vertex"| REG["Model Registry"]
A2 -->|"artifact_store: gcs"| GCS
A3 -->|"experiment_tracker: mlflow"| MLF["MLflow"]
end
The Golden Rule
Without a stack configured, artifacts are stored locally in .flowyml/artifacts/.
With a stack configured, artifacts are automatically uploaded to the configured stores based on their type (Model, Dataset, Metrics).
How YAML Maps to Code
What triggers an upload?
| Scenario | What Happens |
|---|---|
| No stack / local stack | Artifacts saved to ./artifacts/ on disk; no upload |
| Stack with artifact_store: gcs | All step outputs uploaded to GCS bucket |
| Step returns Model type | Saved to artifact store + registered in model registry (if configured) |
| Step returns Metrics type | Logged to experiment tracker (MLflow/W&B) + saved to artifact store |
| artifact_routing rules defined | Fine-grained control: deploy models, set paths, conditional deploy |
| No model_registry configured | Models saved to artifact store only; no registration |
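The rules in the table can be read as a small dispatch on artifact type and stack configuration. The sketch below encodes them as plain Python to make the decision logic explicit. `Stack`, `destinations`, and the marker classes are hypothetical names invented for this example, the `artifact_routing` row is left out, and treating a missing `artifact_store` as "local" is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stack:
    """Hypothetical stand-in for a stack config; None means 'not configured'."""
    artifact_store: Optional[str] = None
    model_registry: Optional[str] = None
    experiment_tracker: Optional[str] = None


# Marker types standing in for the artifact types named in the table.
class Dataset: ...
class Model: ...
class Metrics: ...


def destinations(artifact: object, stack: Optional[Stack]) -> list:
    """Return where an artifact would be sent, following the table's rules."""
    if stack is None or stack.artifact_store is None:
        return ["local-disk"]  # no stack / local stack: nothing is uploaded

    targets = [f"artifact_store:{stack.artifact_store}"]  # every step output
    if isinstance(artifact, Model) and stack.model_registry:
        targets.append(f"model_registry:{stack.model_registry}")
    if isinstance(artifact, Metrics) and stack.experiment_tracker:
        targets.append(f"experiment_tracker:{stack.experiment_tracker}")
    return targets


gcp = Stack(artifact_store="gcs", model_registry="vertex", experiment_tracker="mlflow")
model_targets = destinations(Model(), gcp)
metrics_targets = destinations(Metrics(), gcp)
```

With this encoding, a `Model` under a stack that has no `model_registry` falls through to the artifact store alone, matching the last row of the table.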
GenAI & LLM Evaluation
FlowyML natively supports LLM observability and evaluation:
- LLM Tracing: Capture every LLM call, token count, latency, and cost with @trace_llm
- 17+ Built-in Scorers: Relevance, Faithfulness, Toxicity, and LLM-as-a-Judge evaluators
- Judge Arena: A/B test evaluators against human labels in real-time
- CI/CD Gates: Block bad models with quality assertions in your test suite
Why Teams Choose FlowyML
Zero Arrow Wiring
Define inputs and outputs on your steps. FlowyML auto-discovers the DAG, with no .set_downstream() or >> plumbing.
Same Code, Everywhere
Write once, deploy anywhere. Swap from local SQLite → GCS + Vertex AI → S3 + SageMaker with a single Stack config change.
Built-In Observability
Real-time UI dashboard, lineage graphs, Gantt-chart timelines, and LLM token tracking, with no extra plugins needed.
Master the Platform
- Getting Started: Build your first pipeline in 5 minutes. Learn the basics of Steps and Pipelines.
- Core Concepts: Deep dive into the heart of FlowyML: Pipelines, Steps, Context, and Asset Lineage.
- Artifact-Centric Philosophy: Understand why focusing on Artifacts instead of Tasks changes everything for ML stability.
- Advanced Features: Master Caching, Parallelism, Conditional Execution, Step Grouping, and more.
- User Guide: Manage projects, deployments, versioning, scheduling, and observability dashboards.
- Plugins & Stacks: Cloud integrations, model registries, type-based routing, and stack management.
Practical Examples
Explore real-world implementations in the examples/ directory:
- Complete Demo: A massive tour of versioning, projects, notifications, and drift detection.
- Pipeline Showcase: Complex branching, caching, and multi-asset management.
- UI Integration: Real-time monitoring with the web dashboard.
- Simple Pipeline: The absolute basics.
FlowyML is for those who are tired of plumbing.
Focus on the ML. We'll handle the flow.