βοΈ Google Cloud Platform Integration
Deploy FlowyML pipelines on GCP with Vertex AI, GCS artifact storage, and Cloud Run serving.
π€ Vertex AI βοΈ GCS π Cloud Run
Google Cloud Platform (GCP) βοΈ
Scale your pipelines from local prototypes to production workloads on Google Cloud.
What you'll learn
How to run flowyml pipelines on Vertex AI and store data in GCS. Develop locally, then flip a switch to run on a 100-GPU cluster.
Why Use GCP with flowyml?
Local limitations: - Memory: "OOM Error" on large datasets - Compute: Training takes days on a CPU - Storage: Hard drive full of model checkpoints
GCP advantages: - Infinite Scale: Spin up as many machines as you need - Managed Services: Vertex AI handles the infrastructure - Unified Data: Store everything in GCS, accessible from anywhere
βοΈ GCS Artifact Store
Store your pipeline artifacts (datasets, models) in Google Cloud Storage. This makes them accessible to your team and production systems.
Configuration
# Register a stack that uses GCS
flowyml stack register gcp-prod \
--artifact-store gs://my-bucket/flowyml-artifacts \
--metadata-store sqlite:///flowyml.db
π Vertex AI Execution
Run your pipeline steps as Vertex AI Custom Jobs. flowyml handles the Dockerization and submission automatically.
Recommended: Stack-Based Execution (Automatic Orchestrator)
The best practice is to use a GCP stack, which automatically provides the orchestrator, executor, artifact store, and all other components. This is the core concept of stacks - they encapsulate all infrastructure configuration.
from flowyml import Pipeline
from flowyml.stacks.gcp import GCPStack
# Create GCP stack with Vertex AI orchestrator
gcp_stack = GCPStack(
name="production",
project_id="my-gcp-project",
region="us-central1",
bucket_name="my-artifacts-bucket",
service_account="my-sa@my-project.iam.gserviceaccount.com"
)
# Create pipeline with stack
pipeline = Pipeline("training_pipeline", stack=gcp_stack)
# ... add steps ...
# Run pipeline - automatically uses Vertex AI orchestrator from stack!
pipeline.run()
Or use active stack from configuration:
from flowyml import Pipeline
from flowyml.stacks.registry import set_active_stack
# Set active stack (from flowyml.yaml or programmatically)
set_active_stack("gcp-production")
# Pipeline automatically uses active stack's orchestrator
pipeline = Pipeline("training_pipeline")
pipeline.add_step(...)
pipeline.run() # Uses Vertex AI orchestrator from active stack!
Alternative: Explicit Orchestrator Override
If you need to override the stack's orchestrator for a specific run:
from flowyml import Pipeline
from flowyml.stacks.gcp import VertexAIOrchestrator
pipeline = Pipeline("training_pipeline")
# ... add steps ...
# Override orchestrator for this run only
pipeline.run(
orchestrator=VertexAIOrchestrator(
project_id="my-gcp-project",
region="us-central1",
service_account="my-sa@my-project.iam.gserviceaccount.com"
)
)
Stack-Based is Better
Using a stack ensures all components (orchestrator, artifact store, metadata store, container registry) work together seamlessly.
Cost Control
Vertex AI charges by the second. flowyml ensures resources are only provisioned while your steps are running.
π What's Next?
βοΈ AWS Integration
Run FlowyML pipelines on AWS with SageMaker orchestration and S3 artifact storage.
βοΈ Azure Integration
Deploy on Azure with Azure ML, Blob Storage, and AKS orchestration.