Improved UX Guide - Configuration-Driven Infrastructure

Overview

flowyml now supports complete separation between pipeline logic and infrastructure configuration. Your pipeline code remains clean and infrastructure-agnostic.

Key Improvements

✨ Configuration-Driven

All infrastructure is defined in flowyml.yaml:

stacks:
  local:
    type: local

  production:
    type: gcp
    project_id: ${GCP_PROJECT_ID}
    bucket_name: ${GCP_BUCKET}

resources:
  gpu_training:
    cpu: "8"
    memory: "32Gi"
    gpu: "nvidia-tesla-v100"

🎯 CLI-Based Execution

Run the same pipeline on different stacks without code changes:

# Development
flowyml run pipeline.py

# Production
flowyml run pipeline.py --stack production

# With GPU resources
flowyml run pipeline.py --stack production --resources gpu_training

📦 Auto-Detection

flowyml automatically detects:

✅ Existing Dockerfile
✅ pyproject.toml for Poetry
✅ requirements.txt
✅ Environment variables from .env
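
The guide does not show how detection is implemented. As a rough illustration only, the checks listed above could amount to looking for well-known file names in the project root. The function name `detect_project_files` and the `MARKERS` mapping are hypothetical and are not part of flowyml's API.

```python
from pathlib import Path

# Hypothetical marker files, taken from the list above.
# flowyml's real detection rules may differ.
MARKERS = {
    "dockerfile": "Dockerfile",
    "poetry": "pyproject.toml",
    "requirements": "requirements.txt",
    "dotenv": ".env",
}


def detect_project_files(root: str = ".") -> dict[str, bool]:
    """Report which well-known project files exist directly under root."""
    base = Path(root)
    return {name: (base / filename).is_file() for name, filename in MARKERS.items()}
```

Calling `detect_project_files(".")` from a project directory returns one boolean per marker, which is enough to decide which of the Docker options described later in this guide applies.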

🔒 Clean Separation

Before (Tightly Coupled):

from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack
from flowyml.stacks.components import ResourceConfig, DockerConfig

# Infrastructure hardcoded in pipeline!
stack = GCPStack(project_id="...", bucket_name="...")
resources = ResourceConfig(cpu="8", memory="32Gi")
docker = DockerConfig(image="...")

pipeline = Pipeline("my_pipeline", stack=stack)
result = pipeline.run(resources=resources, docker=docker)

After (Decoupled):

from flowyml import Pipeline

# Pure pipeline logic - NO infrastructure!
pipeline = Pipeline("my_pipeline")
result = pipeline.run()

# Infrastructure configured externally via:
# - flowyml.yaml
# - CLI flags
# - Environment variables

Quick Start

1. Initialize Configuration

flowyml init

Creates flowyml.yaml with sensible defaults.

2. Configure Stacks

Edit flowyml.yaml:

stacks:
  local:
    type: local

  staging:
    type: gcp
    project_id: my-project-staging
    region: us-central1

  production:
    type: gcp
    project_id: my-project-prod
    region: us-central1

3. Write Clean Pipelines

# pipeline.py
from flowyml import Pipeline, step

@step
def process_data(input_path: str):
    # Your logic here
    return {"result": "processed"}

pipeline = Pipeline("data_processing")
pipeline.add_step(process_data)

4. Run Anywhere

# Development
flowyml run pipeline.py --context input_path=local/data.csv

# Staging
flowyml run pipeline.py --stack staging --context input_path=gs://staging/data.csv

# Production
flowyml run pipeline.py --stack production --context input_path=gs://prod/data.csv

Environment Variables

Use a .env file for secrets:

# .env
GCP_PROJECT_ID=my-project
GCP_BUCKET=my-artifacts
GCP_SERVICE_ACCOUNT=my-sa@project.iam.gserviceaccount.com

Reference in flowyml.yaml:

stacks:
  production:
    type: gcp
    project_id: ${GCP_PROJECT_ID}
    bucket_name: ${GCP_BUCKET}
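
To make the substitution concrete, here is a minimal sketch of how `${NAME}` placeholders could be filled from `.env`-style values. It is not flowyml's actual loader: `parse_dotenv` and `expand_placeholders` are hypothetical helpers, and the sketch assumes plain `KEY=VALUE` lines with no quoting or `export` prefixes.

```python
import re

# Matches ${NAME} where NAME is a shell-style identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def expand_placeholders(text: str, env: dict[str, str]) -> str:
    """Replace each ${NAME} with env[NAME]; raise KeyError if NAME is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"missing environment variable: {name}")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)
```

Failing loudly on a missing variable is a design choice in this sketch: an unset `GCP_PROJECT_ID` surfaces before anything is deployed, instead of passing an empty string through to the stack.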

Docker Integration

Option 1: Existing Dockerfile

flowyml uses an existing Dockerfile automatically:

docker:
  dockerfile: ./Dockerfile
  build_context: .

Option 2: Poetry

Uses pyproject.toml:

docker:
  use_poetry: true
  base_image: python:3.11-slim

Option 3: Requirements File

docker:
  requirements_file: requirements.txt
  base_image: python:3.11-slim

CLI Commands

Stack Management

# List configured stacks
flowyml stack list

# Show stack details
flowyml stack show production

# Set default stack
flowyml stack set-default production

Running Pipelines

# Basic run
flowyml run pipeline.py

# Specify stack
flowyml run pipeline.py --stack production

# Specify resources
flowyml run pipeline.py --resources gpu_training

# Pass context
flowyml run pipeline.py --context key1=value1 --context key2=value2

# Dry run (show configuration)
flowyml run pipeline.py --stack production --dry-run

# Custom config file
flowyml run pipeline.py --config custom.yaml
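
For readers curious how repeated `--context key=value` flags can turn into a dictionary, the sketch below uses the standard library's `argparse`. It only illustrates the pattern and assumes nothing about flowyml's real CLI internals; `parse_context` is a hypothetical helper.

```python
import argparse


def parse_context(pairs: list[str]) -> dict[str, str]:
    """Turn ["key1=value1", "key2=value2"] into a dict of strings."""
    context = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        context[key] = value
    return context


# action="append" collects every occurrence of the flag into a list.
parser = argparse.ArgumentParser()
parser.add_argument("--context", action="append", default=[], metavar="KEY=VALUE")

args = parser.parse_args(["--context", "key1=value1", "--context", "key2=value2"])
print(parse_context(args.context))  # {'key1': 'value1', 'key2': 'value2'}
```

All values arrive as strings in this sketch, so a step that expects a number would have to convert it itself.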

Benefits

🎯 For Data Scientists

Write pure pipeline logic
No infrastructure code
Same code, multiple environments
Easy local testing

🏗️ For MLOps Engineers

Centralized infrastructure config
Version control for infra
Easy environment management
Security via environment variables

👥 For Teams

Consistent deployment
Easy collaboration
Clear separation of concerns
Reduced merge conflicts

Migration Guide

Old Style (Coupled)

from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack

stack = GCPStack(
    project_id="my-project",
    bucket_name="my-bucket"
)

pipeline = Pipeline("my_pipeline", stack=stack)

New Style (Decoupled)

Create flowyml.yaml:

stacks:
  production:
    type: gcp
    project_id: my-project
    bucket_name: my-bucket

Simplify pipeline:

from flowyml import Pipeline

pipeline = Pipeline("my_pipeline")
# Stack loaded from flowyml.yaml

Run with CLI:

flowyml run pipeline.py --stack production

Advanced Patterns

Per-Step Resources

resources:
  preprocessing:
    cpu: "2"
    memory: "8Gi"

  training:
    cpu: "16"
    memory: "64Gi"
    gpu: "nvidia-tesla-v100"

  inference:
    cpu: "4"
    memory: "16Gi"
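
The memory values in these presets look like Kubernetes-style quantities with binary suffixes ("8Gi", "64Gi"). That reading is an assumption, since this guide does not define the format. Under that assumption, a small hypothetical helper such as `memory_to_bytes` could sanity-check presets before a run:

```python
# Binary (power-of-two) suffixes, assumed to follow Kubernetes-style notation.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity: str) -> int:
    """Convert a string such as "32Gi" to bytes; bare integers are taken as bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


# Presets copied from the YAML above, memory only.
presets = {"preprocessing": "8Gi", "training": "64Gi", "inference": "16Gi"}
largest = max(presets, key=lambda name: memory_to_bytes(presets[name]))
print(largest)  # training
```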

Multi-Region Deployment

stacks:
  us-prod:
    type: gcp
    region: us-central1
    bucket_name: us-artifacts

  eu-prod:
    type: gcp
    region: europe-west1
    bucket_name: eu-artifacts
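
With one stack per region, deploying everywhere comes down to repeating the documented `flowyml run ... --stack <name>` command once per stack. The sketch below only builds the argument lists; `build_run_commands` is a hypothetical helper, and the stack names are the ones from the YAML above.

```python
import shlex


def build_run_commands(pipeline_file: str, stacks: list[str]) -> list[list[str]]:
    """Build one `flowyml run` argument list per stack, using only the --stack flag."""
    return [["flowyml", "run", pipeline_file, "--stack", stack] for stack in stacks]


for command in build_run_commands("pipeline.py", ["us-prod", "eu-prod"]):
    # Printing keeps the sketch side-effect free; to execute a command,
    # pass it to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Building argument lists instead of shell strings avoids quoting problems if a pipeline path or stack name ever contains spaces.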

Environment-Specific Configs

# dev.yaml
stacks:
  dev:
    type: local

# prod.yaml
stacks:
  prod:
    type: gcp

# Development
flowyml run pipeline.py --config dev.yaml

# Production
flowyml run pipeline.py --config prod.yaml

Best Practices

✅ Never hardcode infrastructure in pipeline code
✅ Use flowyml.yaml for all stack configuration
✅ Use environment variables for secrets
✅ Define resource presets for common workloads
✅ Version control flowyml.yaml (without secrets)
✅ Use .env for local development secrets
✅ Document required environment variables

Next Steps

See examples/clean_pipeline.py for a complete example
Read Stack Architecture for a deep dive
Check the GCP Stack Guide for cloud deployment