Google Cloud Platform (GCP) ☁️

Scale your pipelines from local prototypes to production workloads on Google Cloud.

What you'll learn

How to run flowyml pipelines on Vertex AI and store data in GCS. Develop locally, then flip a switch to run on a 100-GPU cluster.

Why Use GCP with flowyml?

Local limitations: - Memory: "OOM Error" on large datasets - Compute: Training takes days on a CPU - Storage: Hard drive full of model checkpoints

GCP advantages: - Infinite Scale: Spin up as many machines as you need - Managed Services: Vertex AI handles the infrastructure - Unified Data: Store everything in GCS, accessible from anywhere

☁️ GCS Artifact Store

Store your pipeline artifacts (datasets, models) in Google Cloud Storage. This makes them accessible to your team and production systems.

Configuration

# Register a stack that uses GCS
flowyml stack register gcp-prod \
    --artifact-store gs://my-bucket/flowyml-artifacts \
    --metadata-store sqlite:///flowyml.db

🚀 Vertex AI Execution

Run your pipeline steps as Vertex AI Custom Jobs. flowyml handles the Dockerization and submission automatically.

Recommended: Stack-Based Execution (Automatic Orchestrator)

The best practice is to use a GCP stack, which automatically provides the orchestrator, executor, artifact store, and all other components. This is the core concept of stacks - they encapsulate all infrastructure configuration.

from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack

# Create GCP stack with Vertex AI orchestrator
gcp_stack = GCPStack(
    name="production",
    project_id="my-gcp-project",
    region="us-central1",
    bucket_name="my-artifacts-bucket",
    service_account="my-sa@my-project.iam.gserviceaccount.com"
)

# Create pipeline with stack
pipeline = Pipeline("training_pipeline", stack=gcp_stack)
# ... add steps ...

# Run pipeline - automatically uses Vertex AI orchestrator from stack!
pipeline.run()

Or use active stack from configuration:

from flowyml import Pipeline
from flowyml.stacks.registry import set_active_stack

# Set active stack (from flowyml.yaml or programmatically)
set_active_stack("gcp-production")

# Pipeline automatically uses active stack's orchestrator
pipeline = Pipeline("training_pipeline")
pipeline.add_step(...)
pipeline.run()  # Uses Vertex AI orchestrator from active stack!

Alternative: Explicit Orchestrator Override

If you need to override the stack's orchestrator for a specific run:

from flowyml import Pipeline
from flowyml.stacks.gcp import VertexAIOrchestrator

pipeline = Pipeline("training_pipeline")
# ... add steps ...

# Override orchestrator for this run only
pipeline.run(
    orchestrator=VertexAIOrchestrator(
        project_id="my-gcp-project",
        region="us-central1",
        service_account="my-sa@my-project.iam.gserviceaccount.com"
    )
)

Stack-Based is Better

Using a stack ensures all components (orchestrator, artifact store, metadata store, container registry) work together seamlessly.

Cost Control

Vertex AI charges by the second. flowyml ensures resources are only provisioned while your steps are running.