Stack Architecture Guide
Overview
flowyml's stack system provides a flexible, modular architecture for running pipelines across different infrastructure environments. Stacks are composable collections of components that define where and how your pipelines execute.
Core Concepts
Stack Components
A Stack is composed of several components:
```
┌─────────────────────────────────────┐
│            flowyml Stack            │
├─────────────────────────────────────┤
│ ▸ Orchestrator (optional)           │
│   - Vertex AI, Kubeflow, Airflow    │
│                                     │
│ ▸ Executor                          │
│   - Local, Remote, Kubernetes       │
│                                     │
│ ▸ Artifact Store                    │
│   - Local FS, GCS, S3, Azure        │
│                                     │
│ ▸ Metadata Store                    │
│   - SQLite, PostgreSQL, MySQL       │
│                                     │
│ ▸ Container Registry (optional)     │
│   - GCR, ECR, Docker Hub            │
└─────────────────────────────────────┘
```
Component Types
1. Orchestrator
Manages pipeline workflow execution and scheduling.
- Vertex AI: Google Cloud's managed ML platform
- Kubeflow: Kubernetes-native ML workflows
- Airflow: Workflow scheduling and monitoring
- None: Direct execution (development)
2. Executor
Runs individual pipeline steps.
- LocalExecutor: Runs steps in current process
- RemoteExecutor: Submits to remote compute
- KubernetesExecutor: Runs in K8s pods
- VertexAIExecutor: Runs on Vertex AI
3. Artifact Store
Stores pipeline artifacts and outputs.
- LocalArtifactStore: Local filesystem
- GCSArtifactStore: Google Cloud Storage
- S3ArtifactStore: Amazon S3
- AzureBlobStore: Azure Blob Storage
4. Metadata Store
Tracks pipeline runs, metrics, and lineage.
- SQLiteMetadataStore: Local SQLite database
- PostgreSQLMetadataStore: PostgreSQL database
- CloudSQLMetadataStore: Google Cloud SQL
5. Container Registry
Manages Docker images for containerized execution.
- GCRContainerRegistry: Google Container Registry
- ECRContainerRegistry: AWS Elastic Container Registry
- DockerHubRegistry: Docker Hub
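Taken together, a stack is just a named bundle of these components. As a rough mental model, here is an illustrative dataclass sketch (the field and class names are stand-ins, not the actual flowyml Stack class):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stack:
    """Illustrative stack: one field per pluggable component."""
    name: str
    executor: str
    artifact_store: str
    metadata_store: str
    orchestrator: Optional[str] = None        # optional component
    container_registry: Optional[str] = None  # optional component

    def describe(self) -> dict:
        # Report only the components that are actually configured.
        return {k: v for k, v in self.__dict__.items() if v is not None}

stack = Stack(
    name="dev",
    executor="LocalExecutor",
    artifact_store="LocalArtifactStore",
    metadata_store="SQLiteMetadataStore",
)
print(stack.describe())
```

Required components are always present; optional ones (orchestrator, container registry) simply drop out of the description when unset.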
Stack Types
Local Stack
For development and testing:
```python
from flowyml.stacks import LocalStack

stack = LocalStack(
    name="local",
    artifact_path=".flowyml/artifacts",
    metadata_path=".flowyml/metadata.db",
)
```
Use Cases:
- Local development
- Unit testing
- Prototyping
- Small datasets
GCP Stack
For production on Google Cloud Platform:
```python
from flowyml.stacks.gcp import GCPStack

stack = GCPStack(
    name="production",
    project_id="my-gcp-project",
    region="us-central1",
    bucket_name="my-artifacts",
    registry_uri="gcr.io/my-project",
)
```
Use Cases:
- Production ML training
- Large-scale data processing
- Team collaboration
- CI/CD pipelines
AWS Stack (Coming Soon)
```python
from flowyml.stacks.aws import AWSStack

stack = AWSStack(
    name="aws-prod",
    region="us-east-1",
    s3_bucket="my-artifacts",
    ecr_registry="123456789.dkr.ecr.us-east-1.amazonaws.com",
)
```
Kubernetes Stack (Coming Soon)
```python
from flowyml.stacks.k8s import KubernetesStack

stack = KubernetesStack(
    name="k8s-cluster",
    namespace="flowyml",
    storage_class="standard",
)
```
Automatic Component Usage
When you use a stack, the pipeline automatically uses every component that stack defines. This is the core idea behind stacks: they encapsulate all infrastructure configuration in one place.
How Stacks Work
1. Create or select a stack with all components configured
2. Attach the stack to a pipeline (via the constructor or the active stack)
3. The pipeline automatically uses:
   - the stack's orchestrator (e.g., VertexAIOrchestrator)
   - the stack's executor (e.g., VertexAIExecutor)
   - the stack's artifact store (e.g., GCSArtifactStore)
   - the stack's metadata store (e.g., CloudSQLMetadataStore)
   - the stack's container registry (e.g., GCRContainerRegistry)
Example: GCP Stack with Automatic Orchestrator
```python
from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack

# Create a GCP stack - it includes a VertexAIOrchestrator automatically
gcp_stack = GCPStack(
    name="production",
    project_id="my-gcp-project",
    region="us-central1",
    bucket_name="my-artifacts",
    service_account="my-sa@my-project.iam.gserviceaccount.com",
)

# Create a pipeline with the stack
pipeline = Pipeline("training_pipeline", stack=gcp_stack)
pipeline.add_step(train_model)
pipeline.add_step(evaluate_model)

# Run the pipeline - it automatically uses the Vertex AI orchestrator;
# there is no need to specify an orchestrator explicitly
result = pipeline.run()
```
Stack Priority in Pipeline.run()
When you call pipeline.run(), the orchestrator is determined in this order:
1. An explicit orchestrator parameter (if provided) - highest priority
2. The stack's orchestrator (if a stack is set or active) - recommended
3. The default LocalOrchestrator - fallback
```python
# Option 1: Use the stack's orchestrator (recommended)
pipeline = Pipeline("my_pipeline", stack=gcp_stack)
pipeline.run()  # Uses the VertexAIOrchestrator from the stack

# Option 2: Override the stack's orchestrator for this run only
pipeline.run(orchestrator=custom_orchestrator)

# Option 3: Use the active stack
from flowyml.stacks.registry import set_active_stack

set_active_stack("gcp-production")
pipeline = Pipeline("my_pipeline")
pipeline.run()  # Uses the orchestrator from the active stack
```
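The three-level resolution order can be sketched in a few lines of plain Python. This is a hypothetical helper for illustration, not flowyml's internals:

```python
# Sketch of explicit > stack > active stack > default resolution.
# Stacks are modeled as plain dicts here for simplicity.
def resolve_orchestrator(explicit=None, stack=None, active_stack=None,
                         default="LocalOrchestrator"):
    # 1. An explicit orchestrator argument always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise fall back to the pipeline's stack, then the active stack.
    for s in (stack, active_stack):
        if s is not None and s.get("orchestrator"):
            return s["orchestrator"]
    # 3. Final fallback: the local default.
    return default

gcp = {"orchestrator": "VertexAIOrchestrator"}
assert resolve_orchestrator(stack=gcp) == "VertexAIOrchestrator"
assert resolve_orchestrator(explicit="CustomOrchestrator", stack=gcp) == "CustomOrchestrator"
assert resolve_orchestrator() == "LocalOrchestrator"
```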
Resource Configuration
Define compute resources for your pipelines:
```python
from flowyml.stacks.components import ResourceConfig

# CPU-intensive workload
cpu_config = ResourceConfig(
    cpu="8",
    memory="32Gi",
    disk_size="100Gi",
)

# GPU workload
gpu_config = ResourceConfig(
    cpu="16",
    memory="64Gi",
    gpu="nvidia-tesla-v100",
    gpu_count=4,
    machine_type="n1-highmem-16",
)

# Memory-intensive workload
memory_config = ResourceConfig(
    cpu="32",
    memory="256Gi",
    machine_type="n1-megamem-96",
)
```
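A config object like this is a natural place for early validation, for example rejecting a gpu_count without a GPU type. Here is a hedged sketch: the field names follow the examples above, but the class and its checks are assumptions for illustration, not flowyml's actual behavior:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResourceConfig:
    """Illustrative resource config with a simple consistency check."""
    cpu: str = "1"
    memory: str = "4Gi"
    gpu: Optional[str] = None
    gpu_count: int = 0
    machine_type: Optional[str] = None

    def __post_init__(self):
        # Catch inconsistent GPU settings at construction time,
        # before anything is submitted to a cloud backend.
        if self.gpu_count and not self.gpu:
            raise ValueError("gpu_count set but no gpu type given")

cfg = ResourceConfig(cpu="16", memory="64Gi", gpu="nvidia-tesla-v100", gpu_count=4)

try:
    ResourceConfig(gpu_count=2)  # no gpu type: rejected
except ValueError as err:
    print(err)
```

Failing fast here is much cheaper than discovering the mismatch after a cloud job has been scheduled.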
Docker Configuration
Containerize your pipelines:
```python
from flowyml.stacks.components import DockerConfig

# Pre-built image
docker_config = DockerConfig(
    image="gcr.io/my-project/ml-pipeline:v1.0",
)

# Build from a Dockerfile
docker_config = DockerConfig(
    dockerfile="./Dockerfile",
    build_context=".",
    build_args={"PYTHON_VERSION": "3.11"},
)

# Dynamic requirements
docker_config = DockerConfig(
    base_image="python:3.11-slim",
    requirements=[
        "tensorflow>=2.12.0",
        "pandas>=2.0.0",
    ],
    env_vars={
        "PYTHONUNBUFFERED": "1",
        "TF_CPP_MIN_LOG_LEVEL": "2",
    },
)
```
Stack Registry
Manage multiple stacks and switch seamlessly:
```python
from flowyml.stacks.registry import StackRegistry

# Create a registry
registry = StackRegistry()

# Register stacks
registry.register_stack(local_stack)
registry.register_stack(gcp_stack)
registry.register_stack(aws_stack)

# List available stacks
print(registry.list_stacks())
# ['local', 'gcp-prod', 'aws-prod']

# Switch stacks
registry.set_active_stack("local")     # Development
registry.set_active_stack("gcp-prod")  # Production

# Get the active stack
active = registry.get_active_stack()
```
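Under the hood, a registry like this needs little more than a dict plus a pointer to the active entry. A toy sketch of the idea (not the real StackRegistry; the real one registers stack objects that carry their own names):

```python
class ToyStackRegistry:
    """Toy registry: a dict of stacks plus the name of the active one."""

    def __init__(self):
        self._stacks = {}
        self._active = None

    def register_stack(self, name, stack):
        self._stacks[name] = stack

    def list_stacks(self):
        return sorted(self._stacks)

    def set_active_stack(self, name):
        # Fail loudly on typos instead of silently running on the wrong stack.
        if name not in self._stacks:
            raise KeyError(f"unknown stack: {name}")
        self._active = name

    def get_active_stack(self):
        return self._stacks[self._active]

registry = ToyStackRegistry()
registry.register_stack("local", {"executor": "LocalExecutor"})
registry.register_stack("gcp-prod", {"executor": "VertexAIExecutor"})
registry.set_active_stack("gcp-prod")
print(registry.get_active_stack()["executor"])
```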
Stack Hydration from YAML (New)
The StackConfig.to_stack() method lets you hydrate a YAML-defined stack configuration into a live, fully-wired Stack object. This bridges the gap between declarative YAML configuration and runtime stack assembly.
How It Works
1. Define stacks in flowyml.yaml with component configs
2. StackManager parses the YAML into StackConfig objects
3. StackConfig.to_stack() resolves each component via the ComponentRegistry
4. The result is a fully wired Stack, ready for pipeline execution
YAML Configuration
```yaml
# flowyml.yaml
stacks:
  local:
    orchestrator: { type: local }
    artifact_store: { type: local, path: "./artifacts" }

  gcp-prod:
    orchestrator: { type: vertex_ai, project: my-gcp-project }
    artifact_store: { type: gcs, bucket: ml-artifacts }
    model_registry: { type: vertex_model_registry }
    model_deployer: { type: vertex_endpoints }
    experiment_tracker: { type: mlflow }
    artifact_routing:
      Model: { store: gcs, register: true, deploy: true }
      Dataset: { store: gcs, path: "{run_id}/data/{step_name}" }
      Metrics: { log_to_tracker: true }

  aws-staging:
    orchestrator: { type: sagemaker, region: us-east-1 }
    artifact_store: { type: s3, bucket: staging-ml }
    model_registry: { type: sagemaker_model_registry }

active_stack: local
```
Hydrating a Stack
```python
from flowyml.plugins.config import PluginConfig
from flowyml.plugins.stack_config import StackManager

# Load from YAML
config = PluginConfig("flowyml.yaml")
manager = StackManager(config)

# Hydrate the local stack
local_stack = manager.get_stack("local").to_stack()
assert local_stack.orchestrator    # LocalOrchestrator
assert local_stack.artifact_store  # LocalArtifactStore

# Hydrate the GCP stack
gcp_stack = manager.get_stack("gcp-prod").to_stack()
assert gcp_stack.orchestrator      # VertexAIOrchestrator
```
Artifact Routing
The hydrated stack carries artifact routing rules:
```python
live = manager.get_stack("gcp-prod").to_stack()

# Routing config is attached to the live stack
print(live._artifact_routing.rules)  # {"Model": ArtifactRoutingRule(...), ...}
print(live._model_registry_config)   # {"type": "vertex_model_registry"}
```
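The "{run_id}/data/{step_name}" path in the Dataset rule looks like a standard placeholder template; filling it is plain string formatting. A minimal illustration (the placeholder names come from the YAML above, the values are made up):

```python
# Fill an artifact-routing path template with run context.
template = "{run_id}/data/{step_name}"
path = template.format(run_id="run-42", step_name="preprocess")
print(path)  # run-42/data/preprocess
```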
Stack Switching
Use the context manager for temporary stack switching:
```python
from flowyml.plugins.config import PluginConfig
from flowyml.plugins.stack_config import StackManager

config = PluginConfig("flowyml.yaml")
manager = StackManager(config)

with manager.use_stack("gcp-prod"):
    pipeline.run()  # Uses the GCP stack
# Reverts to the previous active stack

# Or set via an environment variable:
# FLOWYML_STACK=gcp-prod flowyml run pipeline.py
```
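The revert-on-exit behavior is exactly what a try/finally context manager provides. A standalone sketch of the pattern, where ACTIVE stands in for whatever global state the real manager keeps:

```python
from contextlib import contextmanager

# Stand-in for the manager's active-stack state.
ACTIVE = {"stack": "local"}

@contextmanager
def use_stack(name):
    previous = ACTIVE["stack"]
    ACTIVE["stack"] = name
    try:
        yield
    finally:
        # Restore the previous stack even if the body raises.
        ACTIVE["stack"] = previous

with use_stack("gcp-prod"):
    assert ACTIVE["stack"] == "gcp-prod"  # active inside the block
assert ACTIVE["stack"] == "local"         # reverted afterwards
```

The finally clause is what makes the switch safe: a failing pipeline run cannot leave the wrong stack active.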
Using Stacks with Pipelines
Method 1: Direct Assignment
```python
from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack

stack = GCPStack(...)
pipeline = Pipeline("my_pipeline", stack=stack)
```
Method 2: Global Registry
```python
from flowyml import Pipeline
from flowyml.stacks.registry import set_active_stack

# Set the active stack globally
set_active_stack("production")

# All new pipelines use the active stack
pipeline = Pipeline("my_pipeline")
```
Method 3: Per-Run Override
```python
pipeline = Pipeline("my_pipeline")

# Run locally
pipeline.run(stack=local_stack)

# Run on GCP
pipeline.run(stack=gcp_stack)
```
Best Practices
1. Environment-Based Configuration
```python
import os

from flowyml.stacks import LocalStack
from flowyml.stacks.gcp import GCPStack

env = os.getenv("ENVIRONMENT", "local")

if env == "production":
    stack = GCPStack(...)
elif env == "staging":
    stack = GCPStack(..., bucket_name="staging-artifacts")
else:
    stack = LocalStack()
```
2. Configuration Files
```yaml
# flowyml.yaml
stacks:
  local:
    type: local
    artifact_path: .flowyml/artifacts
  production:
    type: gcp
    project_id: ${GCP_PROJECT_ID}
    region: us-central1
    bucket_name: ${GCP_BUCKET}
```
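The ${GCP_PROJECT_ID} placeholders above suggest environment-variable substitution at load time. A minimal way to implement that (this is assumed behavior for illustration, not flowyml's actual loader):

```python
import os
import re

# Match ${VAR_NAME} placeholders.
_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    # Substitute from the environment; leave unset variables untouched
    # so a missing value is visible rather than silently blanked.
    return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["GCP_PROJECT_ID"] = "my-gcp-project"
print(expand_env("${GCP_PROJECT_ID}"))  # my-gcp-project
print(expand_env("${UNSET_VAR}"))       # ${UNSET_VAR}
```

Leaving unset placeholders intact makes misconfiguration easy to spot in logs and validation output.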
3. Validation
```python
# Always validate before production
stack.validate()

# Check the configuration
print(stack.to_dict())
```
4. Cost Optimization
```python
# Use preemptible instances for fault-tolerant workloads
ResourceConfig(
    machine_type="n1-standard-4",
    preemptible=True,  # Up to ~80% cost savings
)

# Right-size resources
ResourceConfig(
    cpu="2",  # Start small
    memory="8Gi",
    autoscaling=True,  # Scale as needed
)
```
Advanced Patterns
Multi-Cloud Setup
```python
# GCP for training
train_stack = GCPStack(name="gcp-train", ...)

# AWS for inference
inference_stack = AWSStack(name="aws-inference", ...)

# Different pipelines, different clouds
training_pipeline.run(stack=train_stack)
inference_pipeline.run(stack=inference_stack)
```
Hybrid Execution
```python
# Some steps local, some on the cloud
@step(stack=local_stack)
def preprocess():
    ...

@step(stack=gcp_stack, resources=gpu_config)
def train():
    ...

@step(stack=local_stack)
def evaluate():
    ...
```
Troubleshooting
Common Issues
1. Authentication Errors

```bash
gcloud auth login
gcloud auth application-default login
```

2. Permission Denied
   - Check service account roles
   - Verify IAM permissions

3. Resource Quota
   - Request a quota increase
   - Use smaller machine types

4. Image Not Found
   - Push the image to the registry
   - Check the image URI format
Next Steps