# 🏗️ Stack Architecture Guide

## Overview

flowyml's stack system provides a flexible, modular architecture for running pipelines across different infrastructure environments. Similar to ZenML, stacks are composable collections of components that define where and how your pipelines execute.

## Core Concepts

### Stack Components

A **Stack** is composed of several components:
```
┌─────────────────────────────────────┐
│ flowyml Stack                       │
├─────────────────────────────────────┤
│ ▸ Orchestrator (optional)           │
│   - Vertex AI, Kubeflow, Airflow    │
│                                     │
│ ▸ Executor                          │
│   - Local, Remote, Kubernetes       │
│                                     │
│ ▸ Artifact Store                    │
│   - Local FS, GCS, S3, Azure        │
│                                     │
│ ▸ Metadata Store                    │
│   - SQLite, PostgreSQL, MySQL       │
│                                     │
│ ▸ Container Registry (optional)     │
│   - GCR, ECR, Docker Hub            │
└─────────────────────────────────────┘
```
### Component Types

#### 1. Orchestrator

Manages pipeline workflow execution and scheduling.

- **Vertex AI**: Google Cloud's managed ML platform
- **Kubeflow**: Kubernetes-native ML workflows
- **Airflow**: Workflow scheduling and monitoring
- **None**: Direct execution (development)

#### 2. Executor

Runs individual pipeline steps.

- **LocalExecutor**: Runs steps in the current process
- **RemoteExecutor**: Submits steps to remote compute
- **KubernetesExecutor**: Runs steps in K8s pods
- **VertexAIExecutor**: Runs steps on Vertex AI

#### 3. Artifact Store

Stores pipeline artifacts and outputs.

- **LocalArtifactStore**: Local filesystem
- **GCSArtifactStore**: Google Cloud Storage
- **S3ArtifactStore**: Amazon S3
- **AzureBlobStore**: Azure Blob Storage

#### 4. Metadata Store

Tracks pipeline runs, metrics, and lineage.

- **SQLiteMetadataStore**: Local SQLite database
- **PostgreSQLMetadataStore**: PostgreSQL database
- **CloudSQLMetadataStore**: Google Cloud SQL

#### 5. Container Registry

Manages Docker images for containerized execution.

- **GCRContainerRegistry**: Google Container Registry
- **ECRContainerRegistry**: AWS Elastic Container Registry
- **DockerHubRegistry**: Docker Hub
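These components are what the prebuilt stacks below bundle together; a stack is ultimately just a named collection of them. As a minimal sketch of composing one by hand (assuming the component classes above are importable and that a generic `Stack` constructor accepts them as keyword arguments — check the flowyml API for the exact paths):

```python
from flowyml.stacks import Stack
from flowyml.stacks.components import (
    LocalArtifactStore,
    LocalExecutor,
    SQLiteMetadataStore,
)

# Hypothetical composition - constructor arguments may differ.
# Orchestrator and container registry are optional (see diagram above).
custom_stack = Stack(
    name="custom-local",
    executor=LocalExecutor(),
    artifact_store=LocalArtifactStore(path=".flowyml/artifacts"),
    metadata_store=SQLiteMetadataStore(path=".flowyml/metadata.db"),
)
```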
## Stack Types

### Local Stack

For development and testing:

```python
from flowyml.stacks import LocalStack

stack = LocalStack(
    name="local",
    artifact_path=".flowyml/artifacts",
    metadata_path=".flowyml/metadata.db",
)
```

**Use Cases:**

- Local development
- Unit testing
- Prototyping
- Small datasets

### GCP Stack

For production on Google Cloud Platform:

```python
from flowyml.stacks.gcp import GCPStack

stack = GCPStack(
    name="production",
    project_id="my-gcp-project",
    region="us-central1",
    bucket_name="my-artifacts",
    registry_uri="gcr.io/my-project",
)
```

**Use Cases:**

- Production ML training
- Large-scale data processing
- Team collaboration
- CI/CD pipelines
### AWS Stack (Coming Soon)

```python
from flowyml.stacks.aws import AWSStack

stack = AWSStack(
    name="aws-prod",
    region="us-east-1",
    s3_bucket="my-artifacts",
    ecr_registry="123456789.dkr.ecr.us-east-1.amazonaws.com",
)
```

### Kubernetes Stack (Coming Soon)

```python
from flowyml.stacks.k8s import KubernetesStack

stack = KubernetesStack(
    name="k8s-cluster",
    namespace="flowyml",
    storage_class="standard",
)
```
## Automatic Component Usage

When you use a stack, the pipeline automatically uses all of its components. This is the core idea behind stacks: they encapsulate the entire infrastructure configuration.

### How Stacks Work

1. Create or select a stack with all components configured.
2. Attach the stack to a pipeline (via the constructor or the active stack).
3. The pipeline then automatically uses:
   - the stack's orchestrator (e.g., VertexAIOrchestrator)
   - the stack's executor (e.g., VertexAIExecutor)
   - the stack's artifact store (e.g., GCSArtifactStore)
   - the stack's metadata store (e.g., CloudSQLMetadataStore)
   - the stack's container registry (e.g., GCRContainerRegistry)
### Example: GCP Stack with Automatic Orchestrator

```python
from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack

# Create a GCP stack - includes a VertexAIOrchestrator automatically
gcp_stack = GCPStack(
    name="production",
    project_id="my-gcp-project",
    region="us-central1",
    bucket_name="my-artifacts",
    service_account="my-sa@my-project.iam.gserviceaccount.com",
)

# Create a pipeline with the stack
pipeline = Pipeline("training_pipeline", stack=gcp_stack)
pipeline.add_step(train_model)
pipeline.add_step(evaluate_model)

# Run the pipeline - it automatically uses the Vertex AI orchestrator;
# no need to specify one explicitly
result = pipeline.run()
```
### Stack Priority in Pipeline.run()

When you call `pipeline.run()`, the orchestrator is determined in this order:

1. Explicit `orchestrator` parameter (if provided) - highest priority
2. The stack's orchestrator (if a stack is set or active) - recommended
3. The default LocalOrchestrator - fallback

```python
# Option 1: Use the stack's orchestrator (recommended)
pipeline = Pipeline("my_pipeline", stack=gcp_stack)
pipeline.run()  # Uses VertexAIOrchestrator from the stack

# Option 2: Override the stack's orchestrator for this run only
pipeline.run(orchestrator=custom_orchestrator)

# Option 3: Use the active stack
from flowyml.stacks.registry import set_active_stack

set_active_stack("gcp-production")
pipeline = Pipeline("my_pipeline")
pipeline.run()  # Uses the orchestrator from the active stack
```
## Resource Configuration

Define compute resources for your pipelines:

```python
from flowyml.stacks.components import ResourceConfig

# CPU-intensive workload
cpu_config = ResourceConfig(
    cpu="8",
    memory="32Gi",
    disk_size="100Gi",
)

# GPU workload
gpu_config = ResourceConfig(
    cpu="16",
    memory="64Gi",
    gpu="nvidia-tesla-v100",
    gpu_count=4,
    machine_type="n1-highmem-16",
)

# Memory-intensive workload
memory_config = ResourceConfig(
    cpu="32",
    memory="256Gi",
    machine_type="n1-megamem-96",
)
```
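On their own, these configs do nothing; they take effect once attached to a step. Mirroring the `@step(..., resources=...)` usage shown under Hybrid Execution below (the import path for `step` is an assumption — the guide doesn't show it):

```python
from flowyml import step  # assumed import path for the step decorator

# Attach the GPU resource config to a step so it runs on GPU hardware
@step(resources=gpu_config)
def train_model():
    ...
```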
## Docker Configuration

Containerize your pipelines:

```python
from flowyml.stacks.components import DockerConfig

# Pre-built image
docker_config = DockerConfig(
    image="gcr.io/my-project/ml-pipeline:v1.0",
)

# Build from a Dockerfile
docker_config = DockerConfig(
    dockerfile="./Dockerfile",
    build_context=".",
    build_args={"PYTHON_VERSION": "3.11"},
)

# Dynamic requirements
docker_config = DockerConfig(
    base_image="python:3.11-slim",
    requirements=[
        "tensorflow>=2.12.0",
        "pandas>=2.0.0",
    ],
    env_vars={
        "PYTHONUNBUFFERED": "1",
        "TF_CPP_MIN_LOG_LEVEL": "2",
    },
)
```
## Stack Registry

Manage multiple stacks and switch between them seamlessly:

```python
from flowyml.stacks.registry import StackRegistry

# Create a registry
registry = StackRegistry()

# Register stacks
registry.register_stack(local_stack)
registry.register_stack(gcp_stack)
registry.register_stack(aws_stack)

# List available stacks
print(registry.list_stacks())
# ['local', 'gcp-prod', 'aws-prod']

# Switch stacks
registry.set_active_stack("local")     # Development
registry.set_active_stack("gcp-prod")  # Production

# Get the active stack
active = registry.get_active_stack()
```
## Using Stacks with Pipelines

### Method 1: Direct Assignment

```python
from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack

stack = GCPStack(...)
pipeline = Pipeline("my_pipeline", stack=stack)
```

### Method 2: Global Registry

```python
from flowyml import Pipeline
from flowyml.stacks.registry import set_active_stack

# Set the active stack globally
set_active_stack("production")

# All new pipelines use the active stack
pipeline = Pipeline("my_pipeline")
```

### Method 3: Per-Run Override

```python
pipeline = Pipeline("my_pipeline")

# Run locally
pipeline.run(stack=local_stack)

# Run on GCP
pipeline.run(stack=gcp_stack)
```
## Best Practices

### 1. Environment-Based Configuration

```python
import os

from flowyml.stacks import LocalStack
from flowyml.stacks.gcp import GCPStack

env = os.getenv("ENVIRONMENT", "local")

if env == "production":
    stack = GCPStack(...)
elif env == "staging":
    stack = GCPStack(..., bucket_name="staging-artifacts")
else:
    stack = LocalStack()
```
### 2. Configuration Files

```yaml
# flowyml.yaml
stacks:
  local:
    type: local
    artifact_path: .flowyml/artifacts

  production:
    type: gcp
    project_id: ${GCP_PROJECT_ID}
    region: us-central1
    bucket_name: ${GCP_BUCKET}
```
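How the YAML becomes stacks is up to your entrypoint code. As a sketch, a small loader can expand the `${...}` environment variables and register each stack (a hypothetical helper — flowyml may ship its own; the stack constructors match those shown above):

```python
import os

import yaml  # requires PyYAML

from flowyml.stacks import LocalStack
from flowyml.stacks.gcp import GCPStack
from flowyml.stacks.registry import StackRegistry


def load_stacks(path: str = "flowyml.yaml") -> StackRegistry:
    """Hypothetical loader: parse flowyml.yaml and register each stack."""
    with open(path) as f:
        raw = os.path.expandvars(f.read())  # expands ${GCP_PROJECT_ID} etc.
    registry = StackRegistry()
    for name, spec in yaml.safe_load(raw)["stacks"].items():
        stack_type = spec.pop("type")
        if stack_type == "local":
            registry.register_stack(LocalStack(name=name, **spec))
        elif stack_type == "gcp":
            registry.register_stack(GCPStack(name=name, **spec))
        else:
            raise ValueError(f"Unknown stack type: {stack_type}")
    return registry
```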
### 3. Validation
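Validate a stack before launching remote work so that misconfiguration (missing buckets, bad credentials) fails fast instead of mid-run. A minimal sketch, assuming stacks expose a `validate()` method returning a boolean (hypothetical — check the actual flowyml API):

```python
# Hypothetical: fail fast if the stack is misconfigured
if not stack.validate():
    raise RuntimeError(f"Stack '{stack.name}' failed validation")
```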
### 4. Cost Optimization

```python
# Use preemptible instances for fault-tolerant workloads
ResourceConfig(
    machine_type="n1-standard-4",
    preemptible=True,  # up to ~80% cost savings on GCP
)

# Right-size resources
ResourceConfig(
    cpu="2",  # start small
    memory="8Gi",
    autoscaling=True,  # scale as needed
)
```
## Advanced Patterns

### Multi-Cloud Setup

```python
# GCP for training
train_stack = GCPStack(name="gcp-train", ...)

# AWS for inference
inference_stack = AWSStack(name="aws-inference", ...)

# Different pipelines, different clouds
training_pipeline.run(stack=train_stack)
inference_pipeline.run(stack=inference_stack)
```

### Hybrid Execution

```python
# Some steps run locally, others on the cloud
@step(stack=local_stack)
def preprocess():
    ...

@step(stack=gcp_stack, resources=gpu_config)
def train():
    ...

@step(stack=local_stack)
def evaluate():
    ...
```
## Troubleshooting

### Common Issues

- **Authentication Errors** (see the credential check below)
- **Permission Denied**
  - Check service account roles
  - Verify IAM permissions
- **Resource Quota**
  - Request a quota increase
  - Use smaller machine types
- **Image Not Found**
  - Push the image to the registry
  - Check the image URI format
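For authentication errors on the GCP stack, a quick diagnostic is to resolve credentials directly with the standard `google-auth` client library (a generic Google Cloud check, not a flowyml API):

```python
import google.auth
from google.auth.exceptions import DefaultCredentialsError

# Check which Application Default Credentials the GCP stack would pick up.
# If this raises, run: gcloud auth application-default login
try:
    credentials, project_id = google.auth.default()
    print(f"Found credentials for project: {project_id}")
except DefaultCredentialsError as exc:
    print(f"No usable GCP credentials: {exc}")
```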