πŸ—οΈ Architecture

What you'll learn

How FlowyML is structured internally, from pipeline definition through execution, storage, and visualization. Understanding the architecture helps you make better design decisions and troubleshoot issues.

FlowyML is designed as a modular, layered system that separates pipeline definition, execution, storage, and visualization into independent components.


High-Level Architecture

graph TD
    User["User Code<br/>(@step, Pipeline)"] --> API["Core API"]
    API --> Compiler["DAG Compiler"]
    Compiler --> Executor["Executor Engine"]

    Executor --> Local["LocalOrchestrator"]
    Executor --> Remote["RemoteOrchestrator<br/>(SageMaker, Vertex AI, K8s)"]
    Executor --> Docker["DockerOrchestrator"]

    Local --> Cache["Cache Store"]
    Local --> Artifacts["Artifact Store<br/>(Local, S3, GCS, Azure)"]
    Local --> Metadata["Metadata Store<br/>(SQLite, PostgreSQL)"]

    Metadata --> UIBackend["UI Backend<br/>(FastAPI)"]
    UIBackend --> Frontend["React Dashboard"]

    Remote --> CloudAPI["Cloud APIs"]

Core Components

1. Pipeline Definition Layer

The user-facing API for defining ML workflows:

| Component | Role |
|-----------|------|
| Pipeline | Container for steps, config, and execution |
| Step | Unit of work, wrapped by @step decorator |
| Context | Manages parameters and runtime state injection |
| Assets | Typed artifacts (Model, Dataset, Metric) with lineage |
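
To make the relationship between these components concrete, here is a minimal, self-contained sketch in plain Python. It does not import FlowyML and is not its API: the `step` decorator signature, the `Step` and `Pipeline` classes, and the `context` dictionary are hypothetical stand-ins that only mirror the roles in the table above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """Unit of work: a function plus the names it consumes and produces."""
    func: Callable[..., Any]
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def step(inputs: tuple[str, ...] = (), outputs: tuple[str, ...] = ()):
    """Hypothetical decorator that wraps a plain function in a Step."""
    def wrap(func: Callable[..., Any]) -> Step:
        return Step(func=func, inputs=tuple(inputs), outputs=tuple(outputs))
    return wrap


@dataclass
class Pipeline:
    """Container for steps plus a context of named parameters."""
    name: str
    context: dict[str, Any] = field(default_factory=dict)
    steps: list[Step] = field(default_factory=list)

    def add_step(self, s: Step) -> "Pipeline":
        self.steps.append(s)
        return self

    def run(self) -> dict[str, Any]:
        # Simplification: steps run in insertion order. A real engine
        # would derive the order from the dependency graph instead.
        values = dict(self.context)
        for s in self.steps:
            result = s.func(*(values[name] for name in s.inputs))
            if len(s.outputs) == 1:
                values[s.outputs[0]] = result
            elif len(s.outputs) > 1:
                values.update(zip(s.outputs, result))
        return values


@step(inputs=("learning_rate",), outputs=("model",))
def train(learning_rate):
    return {"lr": learning_rate, "weights": [0.0, 0.0]}


@step(inputs=("model",), outputs=("accuracy",))
def evaluate(model):
    return 0.5 if model["lr"] > 0 else 0.0


pipeline = Pipeline("demo", context={"learning_rate": 0.01})
pipeline.add_step(train).add_step(evaluate)
results = pipeline.run()
```

The context supplies `learning_rate`, each step's output is stored under its declared name, and later steps receive earlier outputs by name. Consult the FlowyML reference for the real decorator and class signatures.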

2. Execution Engine

Handles DAG compilation and execution:

| Component | Role |
|-----------|------|
| DAG Compiler | Builds dependency graph from step inputs/outputs |
| LocalOrchestrator | Executes steps in current process/thread |
| DockerOrchestrator | Runs steps in isolated Docker containers |
| RemoteOrchestrator | Submits jobs to cloud platforms (async) |
| Cache | Intercepts execution; returns stored results if unchanged |
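
The idea of building a dependency graph from step inputs and outputs can be sketched with the standard library alone. This is an illustration of the general technique, not FlowyML's compiler: the `compile_dag` function and the dictionary layout for step specs are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def compile_dag(steps: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return step names in an order that respects data dependencies."""
    # Map every output name to the step that produces it.
    producer: dict[str, str] = {}
    for name, spec in steps.items():
        for out in spec["outputs"]:
            producer[out] = name

    # A step depends on the producers of its inputs. Inputs that no step
    # produces are treated as external parameters and add no edge.
    graph = {
        name: {producer[i] for i in spec["inputs"] if i in producer}
        for name, spec in steps.items()
    }
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


# Steps are deliberately listed out of order.
steps = {
    "evaluate": {"inputs": ["model", "test_data"], "outputs": ["accuracy"]},
    "train": {"inputs": ["train_data"], "outputs": ["model"]},
    "load": {"inputs": ["path"], "outputs": ["train_data", "test_data"]},
}
order = compile_dag(steps)
```

Here `order` is `["load", "train", "evaluate"]`, because `train` needs an output of `load`, and `evaluate` needs outputs of both. An orchestrator can then walk this list, or run independent steps in parallel where the graph allows it.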

3. Storage Layer

Persists artifacts and metadata:

| Store | Default | Alternatives |
|-------|---------|--------------|
| Artifact Store | Local filesystem | S3, GCS, Azure Blob (via fsspec) |
| Metadata Store | SQLite | PostgreSQL, MySQL |
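
As a rough sketch of what a SQLite-backed metadata store might record, the following uses only Python's built-in `sqlite3` module. The `step_runs` table, its columns, the helper names, and the example URIs are all hypothetical; FlowyML's actual schema may look quite different.

```python
import json
import sqlite3
import time


def open_metadata_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one hypothetical table of step runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS step_runs (
               run_id TEXT,
               step TEXT,
               status TEXT,
               artifact_uri TEXT,
               params TEXT,
               finished_at REAL,
               PRIMARY KEY (run_id, step))"""
    )
    return conn


def record_step(conn, run_id, step, status, artifact_uri, params):
    """Write one row of run metadata; params are stored as JSON text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO step_runs VALUES (?, ?, ?, ?, ?, ?)",
            (run_id, step, status, artifact_uri, json.dumps(params), time.time()),
        )


def steps_for_run(conn, run_id):
    """Return (step, status, artifact_uri) rows in insertion order."""
    rows = conn.execute(
        "SELECT step, status, artifact_uri FROM step_runs "
        "WHERE run_id = ? ORDER BY rowid",
        (run_id,),
    )
    return rows.fetchall()


conn = open_metadata_store()
record_step(conn, "run-1", "load", "completed", "file:///tmp/data.parquet", {})
record_step(conn, "run-1", "train", "completed", "s3://bucket/model.pkl", {"lr": 0.01})
rows = steps_for_run(conn, "run-1")
```

Storing only a URI for each artifact keeps the metadata store small while the artifact itself lives in whichever artifact store is configured, which is one plausible reason to separate the two stores.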

4. UI Architecture

Decoupled client-server architecture:

| Layer | Technology | Role |
|-------|------------|------|
| Backend | FastAPI | REST API for pipelines, runs, assets, traces |
| Frontend | React + Vite | SPA with DAG visualization (reactflow) |
| Transport | REST (WebSocket planned) | Near-real-time updates via polling |

Data Flow

sequenceDiagram
    participant User
    participant Pipeline
    participant Compiler as DAG Compiler
    participant Cache
    participant Executor
    participant Store as Artifact Store
    participant Meta as Metadata Store
    participant UI

    User->>Pipeline: pipeline.run()
    Pipeline->>Compiler: Build DAG from steps
    Compiler->>Executor: Topological execution order

    loop For each step
        Executor->>Cache: Check cache (code hash + input hash)
        alt Cache hit
            Cache-->>Executor: Return cached result
        else Cache miss
            Executor->>Executor: Execute step function
            Executor->>Store: Save output artifacts
            Executor->>Cache: Store result in cache
        end
        Executor->>Meta: Write run metadata
    end

    Meta-->>UI: Dashboard reads metadata
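
The cache check in the loop above (code hash plus input hash) can be sketched as follows. This is an in-memory illustration under stated assumptions, not FlowyML's cache: the `StepCache` class and both hash helpers are hypothetical, the code hash covers only bytecode and constants, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json
import types


def code_hash(func) -> str:
    """Hash a function's bytecode and constants, including nested code."""
    digest = hashlib.sha256()

    def feed(code: types.CodeType) -> None:
        digest.update(code.co_code)
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                feed(const)  # e.g. comprehensions or inner functions
            else:
                digest.update(repr(const).encode())

    feed(func.__code__)
    return digest.hexdigest()


def input_hash(inputs: dict) -> str:
    """Hash inputs via a canonical JSON encoding (sorted keys)."""
    encoded = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


class StepCache:
    """In-memory cache keyed by (code hash, input hash)."""

    def __init__(self) -> None:
        self._results: dict[tuple[str, str], object] = {}
        self.hits = 0
        self.misses = 0

    def run(self, func, **inputs):
        key = (code_hash(func), input_hash(inputs))
        if key in self._results:       # cache hit: skip execution
            self.hits += 1
            return self._results[key]
        self.misses += 1               # cache miss: execute and store
        result = func(**inputs)
        self._results[key] = result
        return result


def scale(values, factor):
    return [v * factor for v in values]


cache = StepCache()
first = cache.run(scale, values=[1, 2], factor=3)   # miss, executes
second = cache.run(scale, values=[1, 2], factor=3)  # hit, same code and inputs
third = cache.run(scale, values=[1, 2], factor=4)   # miss, input changed
```

After these three calls the cache has recorded one hit and two misses. Keying on both hashes means that editing a step's code or changing any of its inputs invalidates the cached result, while an unchanged step is skipped.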

Design Principles

| Principle | What It Means |
|-----------|---------------|
| Zero Config | Works out of the box with sensible defaults |
| Asset-Centric | Focus on data produced (artifacts), not just tasks |
| Framework Agnostic | Works with PyTorch, TensorFlow, sklearn, or raw Python |
| Progressive Disclosure | Simple for beginners, powerful for experts |
| Infrastructure as Config | Change deployment target via an environment variable, not code |
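
The "Infrastructure as Config" principle could look roughly like the sketch below, where an environment variable selects the execution target and the pipeline code stays unchanged. The variable name `PIPELINE_ORCHESTRATOR`, the `orchestrator_from_env` helper, and the stub classes are hypothetical; they borrow the orchestrator names from this page but are not FlowyML's implementations or its real configuration keys.

```python
import os
from typing import Mapping, Optional


class LocalOrchestrator:
    def run(self, step_name: str) -> str:
        return f"local:{step_name}"


class DockerOrchestrator:
    def run(self, step_name: str) -> str:
        return f"docker:{step_name}"


class RemoteOrchestrator:
    def run(self, step_name: str) -> str:
        return f"remote:{step_name}"


# Registry of stand-in orchestrators, keyed by config value.
ORCHESTRATORS = {
    "local": LocalOrchestrator,
    "docker": DockerOrchestrator,
    "remote": RemoteOrchestrator,
}


def orchestrator_from_env(
    env: Optional[Mapping[str, str]] = None,
    var: str = "PIPELINE_ORCHESTRATOR",  # hypothetical variable name
):
    """Pick an orchestrator from the environment, defaulting to local."""
    env = os.environ if env is None else env
    kind = env.get(var, "local").lower()
    if kind not in ORCHESTRATORS:
        raise ValueError(
            f"Unknown orchestrator {kind!r}; expected one of {sorted(ORCHESTRATORS)}"
        )
    return ORCHESTRATORS[kind]()


# The same call site works for every target; only the environment differs.
default_result = orchestrator_from_env({}).run("train")
docker_result = orchestrator_from_env({"PIPELINE_ORCHESTRATOR": "docker"}).run("train")
```

Defaulting to the local target when the variable is unset also reflects the "Zero Config" principle: nothing needs to be configured for a first run.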

Module Map

flowyml/
├── core/               # Pipeline, Step, Context, DAG
├── io/                 # Materializers (serialization/deserialization)
├── storage/            # Artifact Store, Metadata Store, Cache
├── integrations/       # Cloud providers, ML frameworks
├── monitoring/         # Alerts, System/Pipeline monitors, Data drift
├── tracking/           # Experiment tracking, Model leaderboard
├── evals/              # Evaluation framework (17+ scorers)
├── plugins/            # Plugin system (native + external)
├── stacks/             # Stack management (local, cloud, hybrid)
└── ui/                 # FastAPI backend + React frontend