⚙️ FlowyML Stack Configuration
FlowyML provides clean separation of concerns between your ML code and infrastructure configuration. Your code uses abstract functions, and the configuration determines where things actually go.
🔄 Same Code, Any Infrastructure
Write once, deploy anywhere. Swap from local SQLite → GCS + Vertex AI → S3 + SageMaker with a single config change. No code rewrites.
🏗️ The Principle
```text
┌─────────────────────────────────┐
│          Your ML Code           │
│    (uses abstract functions)    │
│                                 │
│ save_model(model, "classifier") │
│ log_metrics({"accuracy": 0.95}) │
│ save_artifact(data, "data.pkl") │
└───────────────┬─────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│          flowyml.yaml           │
│   (configuration determines     │
│    where data actually goes)    │
│                                 │
│ artifact_store: gcs/s3/azure    │
│ experiment_tracker: mlflow      │
│ orchestrator: vertex_ai/k8s     │
└─────────────────────────────────┘
```
Switching from GCS to S3? Just edit flowyml.yaml. No code changes needed.
🚀 Quick Start
1️⃣ Initialize Your Stack
```bash
# Create flowyml.yaml with your plugins
flowyml stack init --tracker mlflow --store gcs --orchestrator vertex_ai
```
This creates:
```yaml
# flowyml.yaml
plugins:
  experiment_tracker:
    type: mlflow
    tracking_uri: http://localhost:5000
    experiment_name: my_experiments
  artifact_store:
    type: gcs
    bucket: my-ml-artifacts
    prefix: experiments/
    project: my-gcp-project
  orchestrator:
    type: vertex_ai
    project: my-gcp-project
    location: us-central1
    staging_bucket: gs://my-staging-bucket
```
2️⃣ Install the Plugins
```bash
# Install all plugins in your stack
flowyml stack install

# Or install individually
flowyml plugin install mlflow
flowyml plugin install gcs
flowyml plugin install vertex_ai
```
3️⃣ Write Clean ML Code
```python
from flowyml.plugins import (
    start_run, end_run,
    log_params, log_metrics,
    save_artifact, save_model,
)

# Your code is infrastructure-agnostic
def train_model(data, hyperparams):
    start_run("training_v1")
    log_params(hyperparams)

    model = train(data, **hyperparams)

    log_metrics({
        "accuracy": evaluate(model),
        "loss": compute_loss(model),
    })

    # Saves to whatever is configured in flowyml.yaml
    save_model(model, "models/classifier")
    save_artifact(data.stats, "data/statistics.json")

    end_run()
    return model
```
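Later, for example in a serving process, the same abstraction loads the model back from wherever the active stack stored it. A short sketch using load_model from the API reference below:

```python
from flowyml.plugins import load_model

# Resolves "models/classifier" against the artifact store in flowyml.yaml
model = load_model("models/classifier")
```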
4️⃣ Switch Stack Without Code Changes
Going to production on AWS? Just update flowyml.yaml:
```yaml
# flowyml.yaml - Production AWS
plugins:
  experiment_tracker:
    type: mlflow
    tracking_uri: https://mlflow.prod.company.com
  artifact_store:
    type: s3
    bucket: prod-ml-artifacts
    prefix: models/
    region: us-east-1
  orchestrator:
    type: kubernetes
    namespace: ml-production
```
Same code, different infrastructure. ✨
📋 Configuration Options
🔬 Experiment Trackers
```yaml
# MLflow
experiment_tracker:
  type: mlflow
  tracking_uri: http://localhost:5000
  experiment_name: my_experiments

# Weights & Biases
experiment_tracker:
  type: wandb
  project: my-project
  entity: my-team

# Neptune
experiment_tracker:
  type: neptune
  project: my-workspace/my-project
  api_token: ${NEPTUNE_API_TOKEN}  # Use env var
```
💾 Artifact Stores
```yaml
# Google Cloud Storage
artifact_store:
  type: gcs
  bucket: my-ml-artifacts
  prefix: experiments/
  project: my-gcp-project

# AWS S3
artifact_store:
  type: s3
  bucket: my-ml-artifacts
  prefix: experiments/
  region: us-east-1

# Azure Blob Storage
artifact_store:
  type: azure_blob
  account_name: mystorageaccount
  container: ml-artifacts
```
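Whichever store you pick, application code addresses artifacts by logical path only. A small sketch using the artifact functions from the API reference (the path and payload are made up):

```python
from flowyml.plugins import artifact_exists, load_artifact, save_artifact

# The bucket/container and prefix come from flowyml.yaml, not from your code
save_artifact({"rows": 10_000}, "data/statistics.json")

if artifact_exists("data/statistics.json"):
    stats = load_artifact("data/statistics.json")
```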
☁️ Orchestrators
```yaml
# Vertex AI
orchestrator:
  type: vertex_ai
  project: my-gcp-project
  location: us-central1
  staging_bucket: gs://my-staging-bucket

# Kubernetes
orchestrator:
  type: kubernetes
  namespace: ml-pipelines
  service_account: ml-runner

# Airflow
orchestrator:
  type: airflow
  dag_folder: /opt/airflow/dags
```
🐳 Container Registries
```yaml
# Google Artifact Registry
container_registry:
  type: gcr
  project: my-gcp-project
  location: us-central1
  repository: ml-images
  use_artifact_registry: true

# AWS ECR
container_registry:
  type: ecr
  repository: ml-images
  region: us-east-1

# Docker Hub
container_registry:
  type: docker
  username: myuser
  repository: myuser/ml-images
```
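The container functions from the API reference follow the same pattern. A sketch, assuming an image called ml-images has already been built locally:

```python
from flowyml.plugins import get_image_uri, push_image

# Pushes to whichever registry flowyml.yaml configures (GCR, ECR, Docker Hub)
push_image("ml-images", "v1")

# Resolve the fully qualified image URI for pipeline submission
uri = get_image_uri("ml-images", "v1")
```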
🐳 Docker & Containerization
When running pipelines remotely (Vertex AI, SageMaker, Kubernetes), each step runs inside a Docker container. It is therefore important to understand when FlowyML builds images for you and when you must build and push them manually.
🤖 Auto-Build (Remote Orchestrators)
With cloud orchestrators like Vertex AI and SageMaker, FlowyML can automatically build a Docker image from your code and push it to the configured container registry:
```yaml
# flowyml.yaml - auto-build enabled
plugins:
  orchestrator:
    type: vertex_ai
    project: my-gcp-project
    location: us-central1
    staging_bucket: gs://my-staging-bucket
  container_registry:
    type: gcr
    project: my-gcp-project
    location: us-central1
    repository: ml-pipelines
  # FlowyML auto-generates a Dockerfile, builds it, and pushes to GCR
  docker:
    auto_build: true                     # Enable auto-build (default: true)
    base_image: python:3.11-slim         # Base Docker image
    requirements_file: requirements.txt  # or pyproject.toml
    include_files:                       # Additional files to copy
      - src/
      - configs/
```
🔧 What happens under the hood
- FlowyML generates a Dockerfile from your config
- Installs dependencies from requirements.txt or pyproject.toml
- Copies your source code into the image
- Builds the image locally via Docker
- Pushes to the configured container registry
- Submits the pipeline with the image URI
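For intuition, here is a rough Python sketch of that flow. It is illustrative only, not FlowyML internals: it shells out to the Docker CLI, assumes a running Docker daemon, and hard-codes a requirements.txt-style dependency file.

```python
import subprocess
from pathlib import Path

# Rough sketch of the auto-build flow above; illustrative, not FlowyML internals
def auto_build_and_push(image_uri: str,
                        base_image: str = "python:3.11-slim",
                        requirements_file: str = "requirements.txt") -> str:
    # 1. Generate a Dockerfile from the config
    dockerfile = (
        f"FROM {base_image}\n"
        f"COPY {requirements_file} .\n"
        f"RUN pip install --no-cache-dir -r {requirements_file}\n"
        "COPY src/ /app/src/\n"
        "WORKDIR /app\n"
    )
    Path("Dockerfile.generated").write_text(dockerfile)

    # 2. Build the image locally via the Docker CLI
    subprocess.run(
        ["docker", "build", "-f", "Dockerfile.generated", "-t", image_uri, "."],
        check=True,
    )

    # 3. Push to the configured container registry
    subprocess.run(["docker", "push", image_uri], check=True)

    # 4. This URI is what gets submitted along with the pipeline
    return image_uri
```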
🔨 Manual Build + Push
If you need full control over the Docker image (custom system packages, GPU drivers, multi-stage builds), build and push it yourself:
```dockerfile
# Dockerfile
FROM python:3.11-slim

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Your code
COPY src/ /app/src/
COPY flowyml.yaml /app/flowyml.yaml

WORKDIR /app
```
Build and push:
```bash
# Build
docker build -t us-central1-docker.pkg.dev/my-project/ml-pipelines/training:v1 .

# Push
docker push us-central1-docker.pkg.dev/my-project/ml-pipelines/training:v1
```
Reference the pre-built image in your config:
```yaml
plugins:
  docker:
    auto_build: false
    image: us-central1-docker.pkg.dev/my-project/ml-pipelines/training:v1
```
🎮 GPU Images
For GPU workloads, start from an NVIDIA base image:
```dockerfile
# Dockerfile.gpu
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip

COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY src/ /app/src/
WORKDIR /app
```
```yaml
# flowyml.yaml - GPU config
plugins:
  docker:
    auto_build: false
    image: us-central1-docker.pkg.dev/my-project/ml-images/gpu-training:latest
```
💡 When to use auto-build vs. manual
| Scenario | Recommended |
|----------|-------------|
| Quick prototyping | ✅ Auto-build |
| Standard Python dependencies only | ✅ Auto-build |
| Custom system packages (ffmpeg, CUDA) | 🔨 Manual build |
| Multi-stage builds for smaller images | 🔨 Manual build |
| CI/CD pipeline with image caching | 🔨 Manual build |
| GPU workloads with NVIDIA base | 🔨 Manual build |
📦 Dependency Management
FlowyML supports both requirements.txt and pyproject.toml (Poetry/PDM) for managing Python dependencies in containerized environments.
📄 Using requirements.txt
The simplest approach, and it works everywhere:
```text
# requirements.txt
flowyml[all]>=1.9.0
scikit-learn>=1.3.0
pandas>=2.0.0
mlflow>=2.8.0
google-cloud-aiplatform>=1.38.0
```
```yaml
# flowyml.yaml
plugins:
  docker:
    requirements_file: requirements.txt
```
📦 Using Poetry (pyproject.toml)
For teams using Poetry:
```toml
# pyproject.toml
[tool.poetry]
name = "my-ml-project"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.11"
flowyml = { version = "^1.9.0", extras = ["all"] }
scikit-learn = "^1.3.0"
pandas = "^2.0.0"
```
```yaml
# flowyml.yaml
plugins:
  docker:
    requirements_file: pyproject.toml  # FlowyML detects the format automatically
    package_manager: poetry            # or "pip" (default)
```
⚠️ Poetry in Docker
When using Poetry, FlowyML generates a Dockerfile step that exports poetry.lock to a plain requirements file. For manual builds, add the equivalent yourself:
```dockerfile
RUN pip install poetry && poetry export -f requirements.txt -o /tmp/reqs.txt
RUN pip install -r /tmp/reqs.txt
```
💪 Step Resources & GPU Allocation
Configure CPU, memory, and GPU resources per step. This is essential for remote execution where steps run in separate containers.
⚙️ Configuring Resources on Steps
```python
from flowyml import step

# CPU + Memory
@step(resources={"cpu": "4", "memory": "16Gi"})
def preprocess_data():
    """Runs with 4 CPUs and 16GB RAM."""
    ...

# GPU Allocation
@step(
    resources={
        "cpu": "8",
        "memory": "32Gi",
        "accelerator_type": "NVIDIA_TESLA_T4",
        "accelerator_count": 1,
    }
)
def train_model():
    """Runs on a T4 GPU with 32GB RAM."""
    ...

# Multiple GPUs
@step(
    resources={
        "cpu": "16",
        "memory": "64Gi",
        "accelerator_type": "NVIDIA_TESLA_A100",
        "accelerator_count": 4,
    }
)
def train_large_model():
    """Runs on 4x A100 GPUs."""
    ...
```
📋 Available Resource Fields
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| cpu | str | Number of CPU cores | "4", "0.5" |
| memory | str | RAM allocation | "16Gi", "512Mi" |
| accelerator_type | str | GPU type | "NVIDIA_TESLA_T4", "NVIDIA_TESLA_A100" |
| accelerator_count | int | Number of GPUs | 1, 4, 8 |
| disk | str | Ephemeral disk | "100Gi" |
| timeout | int | Max execution time (seconds) | 3600 |
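The disk and timeout fields are not exercised in the examples above; a minimal sketch (the step name is hypothetical):

```python
from flowyml import step

# Hypothetical step using the disk and timeout fields from the table above
@step(resources={"cpu": "2", "memory": "8Gi", "disk": "100Gi", "timeout": 3600})
def export_dataset():
    """Gets 100Gi of ephemeral disk and is stopped after one hour."""
    ...
```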
🏗️ Default Resources via YAML
Set defaults for all steps, then override per-step:
```yaml
# flowyml.yaml
pipeline_defaults:
  resources:
    cpu: "2"
    memory: "8Gi"
    timeout: 1800  # 30 minutes

# Per-step overrides always take precedence
```
💡 Resource right-sizing
Start with smaller resources and scale up. Use @step(resources={"cpu": "1", "memory": "4Gi"}) for data preprocessing and reserve GPUs only for training steps. This reduces costs significantly.
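A minimal sketch of this pattern (step names are illustrative): lightweight steps run on the pipeline_defaults or a small explicit allocation, and only training requests a GPU.

```python
from flowyml import step

# Inherits pipeline_defaults from flowyml.yaml (2 CPUs, 8Gi)
@step
def load_data():
    ...

# Small explicit allocation for preprocessing
@step(resources={"cpu": "1", "memory": "4Gi"})
def preprocess():
    ...

# The only step that pays for a GPU
@step(
    resources={
        "cpu": "8",
        "memory": "32Gi",
        "accelerator_type": "NVIDIA_TESLA_T4",
        "accelerator_count": 1,
    }
)
def train():
    ...
```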
🔑 Environment Variables
You can use environment variables anywhere in your config:
```yaml
plugins:
  experiment_tracker:
    type: mlflow
    tracking_uri: ${MLFLOW_TRACKING_URI}
  artifact_store:
    type: s3
    bucket: ${ML_BUCKET}
    access_key: ${AWS_ACCESS_KEY_ID}
    secret_key: ${AWS_SECRET_ACCESS_KEY}
```
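This is plain ${VAR} substitution resolved when the config is loaded. A minimal Python sketch of the mechanism (illustrative only, not FlowyML's actual loader):

```python
import os
import re

# Illustrative ${VAR} expansion over a loaded config structure;
# not FlowyML's actual loader.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value):
    """Recursively substitute ${VAR} references from the environment."""
    if isinstance(value, str):
        return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    return value
```

Unset variables are left as written here; a real loader might raise an error instead.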
🖥️ Stack Management Commands
```bash
# Create initial configuration
flowyml stack init --tracker mlflow --store gcs

# Show current stack
flowyml stack show

# Validate configuration
flowyml stack validate

# Install all plugins in stack
flowyml stack install

# List all stacks
flowyml stack list
```
🔀 Named Multi-Stack Support
FlowyML supports named stacks for managing development, staging, and production environments in a single config file:
```yaml
# flowyml.yaml
stacks:
  local:
    orchestrator: { type: local }
    artifact_store: { type: local, path: "./artifacts" }

  gcp-dev:
    orchestrator: { type: vertex_ai, project: "${GCP_PROJECT}" }
    artifact_store: { type: gcs, bucket: dev-ml-artifacts }
    experiment_tracker: { type: mlflow }

  gcp-prod:
    orchestrator: { type: vertex_ai, project: "${GCP_PROJECT}" }
    artifact_store: { type: gcs, bucket: prod-ml-artifacts }
    model_registry: { type: vertex_model_registry }
    model_deployer: { type: vertex_endpoint }
    artifact_routing:
      Model: { store: gcs, register: true, deploy: true }
      Metrics: { log_to_tracker: true }

  aws-staging:
    orchestrator: { type: sagemaker, region: us-east-1 }
    artifact_store: { type: s3, bucket: staging-ml }
    model_registry: { type: sagemaker_model_registry }

active_stack: local
```
🔄 Switching Stacks
```bash
# Via environment variable
FLOWYML_STACK=gcp-prod flowyml run my_pipeline

# Via CLI
flowyml stack set gcp-prod
flowyml stack list
flowyml stack show gcp-prod
```
```python
# Via context manager
from flowyml.plugins import use_stack

with use_stack("gcp-prod"):
    pipeline.run()

# Via decorator
from flowyml.plugins import use_stack_decorator

@use_stack_decorator("gcp-prod")
def production_training():
    pipeline.run()
```
🔀 Type-Based Artifact Routing
Route artifacts automatically based on their type annotations:
```python
from flowyml.core import step, Model, Dataset, Metrics

@step
def train() -> Model:
    """Returns Model - auto-routes to store + registry + endpoint."""
    return Model(clf, name="classifier", version="1.0.0")

@step
def evaluate() -> Metrics:
    """Returns Metrics - auto-logged to tracker."""
    return Metrics({"accuracy": 0.95})
```
Configure routing rules per type:
```yaml
artifact_routing:
  Model:
    store: gcs
    register: true
    deploy: true
    endpoint_name: prod-model
  Dataset:
    store: gcs
    path: "{run_id}/data/{step_name}"
  Metrics:
    log_to_tracker: true
  Parameters:
    log_to_tracker: true
```
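Conceptually, routing is a dispatch on the step's declared return type. The sketch below illustrates the idea only; it is not FlowyML's implementation, and the print calls stand in for real store, registry, and tracker actions:

```python
import typing

# Illustrative type-based dispatch; the rules mirror the YAML above
ROUTING_RULES = {
    "Model": {"store": "gcs", "register": True, "deploy": True},
    "Metrics": {"log_to_tracker": True},
}

def route_output(step_fn, output):
    # Look up the step's declared return annotation...
    return_type = typing.get_type_hints(step_fn).get("return")
    rules = ROUTING_RULES.get(getattr(return_type, "__name__", ""), {})
    # ...and apply whichever actions the rules enable
    if "store" in rules:
        print(f"store {output!r} in {rules['store']}")
    if rules.get("register"):
        print(f"register {output!r} in the model registry")
    if rules.get("log_to_tracker"):
        print(f"log {output!r} to the experiment tracker")
```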
→ Deep Dive: Type Routing Guide
📚 API Reference
Tracking Functions
| Function | Description |
|----------|-------------|
| start_run(name) | Start a new experiment run |
| end_run() | End the current run |
| log_params(dict) | Log parameters |
| log_metrics(dict) | Log metrics |
| set_tag(key, value) | Set a tag |
Artifact Functions
| Function | Description |
|----------|-------------|
| save_artifact(obj, path) | Save any artifact |
| load_artifact(path) | Load an artifact |
| artifact_exists(path) | Check if exists |
| list_artifacts(path) | List artifacts |
Model Functions
| Function | Description |
|----------|-------------|
| save_model(model, path) | Save model with tracking |
| load_model(path) | Load a model |
Container Functions
| Function | Description |
|----------|-------------|
| push_image(name, tag) | Push Docker image |
| get_image_uri(name, tag) | Get image URI |
Utility Functions
| Function | Description |
|----------|-------------|
| show_stack() | Show configured stack |
| validate_stack() | Validate all plugins installed |
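As a quick example, the two utility functions make a convenient pre-flight check before launching a run (assuming they are exported from flowyml.plugins like the tracking functions):

```python
from flowyml.plugins import show_stack, validate_stack

# Pre-flight check: print the active stack, fail early on missing plugins
show_stack()
validate_stack()
```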