
FlowyML 🌊

The Artifact-Centric ML Pipeline Framework that lets you focus on Machine Learning, not infrastructure plumbing. Define your data; we build the DAG.

📦 Artifact-Centric ⚡ Auto-DAG 🔬 GenAI Eval ☁️ Multi-Cloud 🐳 Cloud-Native 🛡️ Production-Ready
17+ Eval Scorers · ∞ Plugin Ecosystem · 0 Arrows to Write · 3 Clouds Supported

📦 Why Artifact-Centric Changes Everything

Typical orchestrators (Airflow, ZenML, Prefect) are task-based: you wire arrows between steps manually. FlowyML is artifact-centric: steps declare what data they produce and consume, and the DAG is built automatically.

🧪 Beyond Tasks

Stop thinking about what to run. Think about what you produce. FlowyML builds the DAG from your inputs and outputs: no >> arrows, no .set_downstream().

πŸ—οΈ Unified Stacks

Same code runs locally, on Kubernetes, Vertex AI, or SageMaker. Swap infrastructure with a single YAML change β€” zero code rewrites.

πŸ›‘οΈ Production First

Built-in lineage tracking, intelligent caching, human-in-the-loop, distributed execution, and model leaderboards as first-class citizens.
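
In practice, the "Unified Stacks" swap is a one-block change in flowyml.yaml. A minimal sketch, where the vertex_ai block mirrors the full config shown later on this page and type: local is an assumed default worth verifying against the stack docs:

# flowyml.yaml - run everything on this machine
plugins:
  orchestrator:
    type: local          # assumed default; verify against your stack docs

# flowyml.yaml - same pipeline code, now on Vertex AI
plugins:
  orchestrator:
    type: vertex_ai
    project: my-gcp-project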

πŸ” The Technical Paradigm Shift

graph LR
    A[Raw Data] -->|Step A| B(Processed Dataset)
    B -->|Step B| C(Trained Model)
    C -->|Step C| D{Evaluation}

    style B fill:#e1f5fe,stroke:#01579b
    style C fill:#e1f5fe,stroke:#01579b
| Concept | Task-Based (Traditional) | Artifact-Centric (FlowyML) |
| --- | --- | --- |
| Core focus | The order of operations ("The Verb") | The state of the data ("The Noun") |
| DAG Construction | Manual arrows (step1 >> step2) | Auto-inferred from input/output signatures |
| Data Handoff | Manual paths (s3://bucket/run_1/X.csv) | Global Catalog resolution by name & version |
| Validation | Runtime failure ("File not found") | Compile-time type and lineage checks |
| Reproducibility | Hope the script hasn't changed | Immutable Lineage (Parents → Child chain) |

⚡ The "Zen" Developer Experience

A complete, production-ready pipeline. Notice: no arrows (>>). The dependency between load_data and train_model is auto-inferred from the dataset artifact.

from flowyml import Pipeline, step, context, Model
from typing import List

@step(outputs=["dataset"])
def load_data() -> List[int]:
    """Produces a list of integers as an Artifact."""
    return [1, 2, 3, 4, 5]

@step(inputs=["dataset"], outputs=["model"])
def train_model(dataset: List[int], learning_rate: float) -> Model:
    """Consumes 'dataset' and 'learning_rate' (from context)."""
    # 'learning_rate' is automatically injected from the execution context!
    print(f"Training on {len(dataset)} items with lr={learning_rate}")
    return Model(data="weights", name="mnist_model", version="1.0.0")

# 1. Define Execution Context (Hyperparameters, etc.)
ctx = context(learning_rate=0.05)

# 2. Build Pipeline
pipeline = Pipeline("quickstart", context=ctx)
pipeline.add_step(load_data).add_step(train_model)

# 3. Run (Auto-Discovered dependencies)
pipeline.run()
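
Running this locally prints Training on 5 items with lr=0.05: the dependency edge exists because train_model lists dataset as an input, and learning_rate is injected from the context instead of being threaded through by hand.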

🔄 How Artifacts Flow Through Infrastructure

FlowyML automatically routes artifacts to your configured infrastructure. Here's exactly how the YAML config connects to your code:

graph TB
    subgraph "Your Code"
        S1["@step(outputs=['dataset'])"] --> A1["Dataset Artifact"]
        S2["@step(outputs=['model'])"] --> A2["Model Artifact"]
        S3["@step(outputs=['metrics'])"] --> A3["Metrics Dict"]
    end

    subgraph "flowyml.yaml routing"
        A1 -->|"artifact_store: gcs"| GCS["☁️ GCS Bucket"]
        A2 -->|"model_registry: vertex"| REG["🏷️ Model Registry"]
        A2 -->|"artifact_store: gcs"| GCS
        A3 -->|"experiment_tracker: mlflow"| MLF["🔬 MLflow"]
    end

📌 The Golden Rule

Without a stack configured → artifacts are stored locally in .flowyml/artifacts/. With a stack configured → artifacts are automatically uploaded to the configured stores based on their type (Model, Dataset, Metrics).
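
To see the local half of this rule, run a pipeline without a stack and list the artifact directory. A minimal sketch, assuming the default .flowyml/artifacts/ layout named above (exact file names depend on the materializer in use):

from pathlib import Path

# After a stack-less pipeline.run(), artifacts land on local disk.
for path in sorted(Path(".flowyml/artifacts").rglob("*")):
    print(path)  # e.g. the serialized "dataset" and "model" outputs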

βš™οΈ How YAML Maps to Code

# flowyml.yaml - this is what controls WHERE artifacts go
plugins:
  experiment_tracker:           # ← Metrics & Parameters go here
    type: mlflow
    tracking_uri: http://localhost:5000

  artifact_store:               # ← Datasets & general artifacts go here
    type: gcs
    bucket: my-ml-artifacts
    prefix: experiments/

  model_registry:               # ← Models get registered here (if enabled)
    type: vertex_model_registry

  orchestrator:                 # ← WHERE steps execute (local, cloud, K8s)
    type: vertex_ai
    project: my-gcp-project
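
The step code below is the other half of the mapping. FlowyML inspects each step's declared return type and routes the resulting artifact to the matching plugin configured above: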
from flowyml import step
from flowyml.core import Model, Metrics

@step(outputs=["model"])
def train() -> Model:
    """FlowyML sees the return type is `Model`:
    β†’ Saves to artifact_store (GCS bucket)
    β†’ Registers in model_registry (Vertex AI)
    β†’ All automatic β€” zero extra code!
    """
    clf = train_classifier(data)
    return Model(data=clf, name="fraud_detector", version="1.0.0")

@step(outputs=["metrics"])
def evaluate() -> Metrics:
    """FlowyML sees the return type is `Metrics`:
    β†’ Logs to experiment_tracker (MLflow)
    β†’ Automatic β€” no mlflow.log_metrics() call needed!
    """
    return Metrics({"accuracy": 0.95, "f1": 0.92})

@step(outputs=["dataset"])
def preprocess() -> list:
    """Return type is `list` (not a typed artifact):
    β†’ Saved to artifact_store (GCS bucket) as serialized data
    β†’ FlowyML uses materializers to serialize/deserialize
    """
    return [1, 2, 3, 4, 5]

💡 What triggers an upload?

| Scenario | What Happens |
| --- | --- |
| No stack / local stack | Artifacts saved to ./artifacts/ on disk; no upload |
| Stack with artifact_store: gcs | All step outputs uploaded to GCS bucket |
| Step returns Model type | Saved to artifact store + registered in model registry (if configured) |
| Step returns Metrics type | Logged to experiment tracker (MLflow/W&B) + saved to artifact store |
| artifact_routing rules defined | Fine-grained control: deploy models, set paths, conditional deploy (see the sketch below) |
| No model_registry configured | Models saved to artifact store only; no registration |
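
Because the artifact_routing schema is not spelled out on this page, here is a deliberately hypothetical sketch of what such rules express; every key below is illustrative only, so check the Plugins & Stacks docs for the real field names:

# flowyml.yaml - HYPOTHETICAL keys, for illustration only
plugins:
  artifact_routing:
    models:
      path: models/{name}/{version}           # custom storage path
      deploy: true                            # push to an endpoint after registration
      deploy_when: "metrics.accuracy >= 0.9"  # conditional deploy
    datasets:
      path: datasets/{name}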

🤖 GenAI & LLM Evaluation

FlowyML natively supports LLM observability and evaluation:

  • 🕵️ LLM Tracing: Capture every LLM call, token count, latency, and cost with @trace_llm (sketched after the example below)
  • 🎯 17+ Built-in Scorers: Relevance, Faithfulness, Toxicity, and LLM-as-a-Judge evaluators
  • 🏟️ Judge Arena: A/B test evaluators against human labels in real time
  • 🛡️ CI/CD Gates: Block bad models with quality assertions in your test suite
from flowyml.evals import evaluate, EvalDataset, Relevance, Faithfulness

# 1. Capture traces or use a static dataset
data = EvalDataset.create_genai("rag_quality", examples=[...])

# 2. Run multi-scorer evaluation
result = evaluate(data=data, scorers=[Relevance(), Faithfulness()])

# 3. Quality Gate
assert result.pass_rate >= 0.9
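
Tracing slots in before evaluation. A minimal sketch of @trace_llm; the decorator name comes from this page, but the import path and bare-decorator usage are assumptions to verify against the tracing docs:

from flowyml.evals import trace_llm  # import path assumed

@trace_llm  # records the call plus token count, latency, and cost
def answer(question: str) -> str:
    # Stand-in for a real LLM call, to keep the sketch self-contained.
    return f"Echo: {question}"

print(answer("What does artifact-centric mean?"))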

🎯 Why Teams Choose FlowyML

💡 Zero Arrow Wiring

Define inputs and outputs on your steps. FlowyML auto-discovers the DAG; no .set_downstream() or >> plumbing.

🔄 Same Code, Everywhere

Write once, deploy anywhere. Swap from local SQLite → GCS + Vertex AI → S3 + SageMaker with a single Stack config change.

📊 Built-In Observability

Real-time UI dashboard, lineage graphs, Gantt-chart timelines, and LLM token tracking, with no extra plugins needed.


πŸ—ΊοΈ Master the Platform

  • πŸš€ Getting Started --- Build your first pipeline in 5 minutes. Learn the basics of Steps and Pipelines.

  • πŸ“– Core Concepts --- Deep dive into the heart of FlowyML: Pipelines, Steps, Context, and Asset Lineage.

  • πŸ“¦ Artifact-Centric Philosophy --- Understand why focusing on Artifacts instead of Tasks changes everything for ML stability.

  • ⚑ Advanced Features --- Master Caching, Parallelism, Conditional Execution, Step Grouping, and more.

  • πŸ“ˆ User Guide --- Manage projects, deployments, versioning, scheduling, and observability dashboards.

  • :plug: Plugins & Stacks --- Cloud integrations, model registries, type-based routing, and stack management.


πŸ—οΈ Practical Examples

Explore real-world implementations in the examples/ directory:


FlowyML is for those who are tired of plumbing.
Focus on the ML. We'll handle the flow.