Container-Aware Pipelines & Dependency Management
This guide explains how to wire Python dependencies, Docker images, and stack resources together so every pipeline step (or execution group of steps) runs inside the right environment. It also includes a full Keras example covering datasets, models, callbacks, versioning, and experiment tracking.
1. Declare Dependencies Close to Your Code
Use pyproject.toml (Poetry) to describe every library the pipeline and its steps require. Keep training-only packages in a dedicated group/extra so the runtime image installs only what it needs:
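A minimal sketch of such a `pyproject.toml` (the `training` group name matches the export tip below; package versions and the `flowyml` version constraint are illustrative, not prescribed):

```toml
[tool.poetry]
name = "fraud-pipelines"
version = "0.1.0"
description = "flowyml pipelines for fraud detection"

[tool.poetry.dependencies]
python = "^3.11"
flowyml = "^0.5"          # illustrative version constraint

# Training-only packages live in their own optional group so slim
# service images can skip them entirely.
[tool.poetry.group.training]
optional = true

[tool.poetry.group.training.dependencies]
tensorflow = "^2.16"
keras = "^3.3"
```

With this layout, `poetry install` gives you the lightweight runtime, while `poetry install --with training` pulls in the heavy ML stack only where it is needed.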
Tip:
`poetry export -f requirements.txt --with training -o build/requirements.txt` gives you a frozen list when you prefer `requirements.txt`.
2. Connect Dependencies to the Stack Configuration
Reference the Dockerfile/Poetry/requirements entrypoint inside flowyml.yaml. The orchestrator reads this section whenever it builds or validates container images for a stack.
flowyml automatically infers a Dockerfile (Dockerfile, docker/Dockerfile, .docker/Dockerfile) and Poetry usage if you omit the docker block.
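A hedged sketch of what such a `docker` block could look like — the key names below are assumptions based on the behavior described above, not a verified schema:

```yaml
# flowyml.yaml (sketch; exact keys may differ in your flowyml version)
docker:
  dockerfile: docker/training.Dockerfile   # omit to use the inferred default paths
  build_context: .
  poetry:
    groups: [training]                     # mirrors `poetry export --with training`
registry:
  uri: europe-west1-docker.pkg.dev/acme/ml-images   # placeholder registry path
```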
3. Provide Pipeline-Level Docker & Resource Overrides
For special workloads, override the stack defaults directly from code using DockerConfig and ResourceConfig. You can scope resources to entire execution groups by assigning the same execution_group to each @step.
Because the stack knows its container_registry, remote orchestrators such as Vertex AI receive the final image_uri and the aggregated resource envelope for each grouped step.
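The pattern might look like the following sketch. The import paths, constructor arguments, and decorator signatures are assumptions made for illustration, not verified flowyml API:

```python
# Sketch only: all signatures here are illustrative.
from flowyml import pipeline, step
from flowyml.config import DockerConfig, ResourceConfig

gpu_resources = ResourceConfig(cpu="8", memory="32Gi", gpu=1)

@step(execution_group="gpu-training", resources=gpu_resources)
def train(dataset):
    ...

@step(execution_group="gpu-training")  # co-located with train() on the same machine
def evaluate(model, dataset):
    ...

@step  # lightweight step, isolated in its own default container
def publish_report(metrics):
    ...

@pipeline(docker=DockerConfig(dockerfile="docker/training.Dockerfile"))
def fraud_training():
    ...
```

Because both `train` and `evaluate` share the `"gpu-training"` execution group, the orchestrator can aggregate their resource requests into one envelope and schedule them on the same GPU machine.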
4. Build and Push Runtime Images
flowyml does not own your registry credentials; push images with Docker/Buildx and let stacks reference them:
When the registry block is present in flowyml.yaml, running flowyml stack apply prod-gcp prints the exact image URI that remote jobs must use. If you need multi-arch images, plug in Buildx and push once; flowyml only cares about the resulting image reference.
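For example, a multi-arch build and push with Buildx might look like this (the registry path and tag are placeholders):

```shell
# Build for both architectures and push in one shot.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -f docker/training.Dockerfile \
  -t europe-west1-docker.pkg.dev/acme/ml-images/fraud-training:0.3.1 \
  --push .
```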
5. Full Keras Pipeline Example
Project Structure
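An illustrative layout consistent with the paths used throughout this guide:

```
.
├── pyproject.toml
├── poetry.lock
├── flowyml.yaml
├── docker/
│   └── training.Dockerfile
└── pipelines/
    └── fraud_training.py
```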
Dockerfile (docker/training.Dockerfile)
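The original Dockerfile is not reproduced here; a minimal sketch consistent with the Poetry setup above might be (base image and versions are illustrative):

```dockerfile
# docker/training.Dockerfile (sketch)
FROM python:3.11-slim

WORKDIR /app

# Install Poetry, then project dependencies including the training group.
RUN pip install --no-cache-dir poetry
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
 && poetry install --with training --no-root

COPY pipelines/ pipelines/
```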
Pipeline Code (pipelines/fraud_training.py)
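The full example is not reproduced here; the condensed sketch below shows the shape of such a pipeline, assuming the flowyml APIs named in this section (`Dataset.create`, `Model.create`, `Metrics.create`, `Experiment`, `FlowymlKerasCallback`) work roughly as described — treat every import path and signature as illustrative:

```python
# pipelines/fraud_training.py — condensed sketch; all flowyml
# signatures here are assumptions, not verified API.
import keras
from flowyml import pipeline, step
from flowyml.assets import Dataset, Metrics, Model
from flowyml.tracking import Experiment, FlowymlKerasCallback

@step(execution_group="gpu-training")
def load_data():
    features, labels = ...  # fetch and preprocess raw transactions
    return Dataset.create(name="fraud-transactions",
                          data=(features, labels), version="1.2.0")

@step(execution_group="gpu-training")  # shares the GPU machine with load_data
def train(dataset):
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    experiment = Experiment(name="fraud-training")
    features, labels = dataset.load()
    # The callback streams per-epoch metrics to the experiment tracker.
    model.fit(features, labels, epochs=10,
              callbacks=[FlowymlKerasCallback(experiment)])
    return Model.create(name="fraud-detector", model=model, version="0.3.1")

@step  # lightweight step: no GPU group needed
def evaluate(model, dataset):
    features, labels = dataset.load()
    loss, accuracy = model.load().evaluate(features, labels)
    return Metrics.create(name="fraud-eval",
                          values={"loss": loss, "accuracy": accuracy})

@pipeline
def fraud_training():
    dataset = load_data()
    model = train(dataset)
    evaluate(model, dataset)
```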
What This Example Demonstrates
- **Dependencies** come from Poetry (`--with training`) so the Docker image installs TensorFlow/Keras without bloating lightweight services.
- **Docker Integration** references a concrete Dockerfile, builds locally, and pushes to the stack registry so remote orchestrators run an identical environment.
- **Resource Groups** use `execution_group` to co-locate steps that must share the same GPU machine, while lighter steps remain isolated.
- **Assets & Versioning** (`Dataset.create`, `Model.create`, `Metrics.create`) tag every artifact with semantic versions so downstream pipelines can pin specific dataset/model combinations.
- **Experiment Tracking** leverages `Experiment` plus `FlowymlKerasCallback` to log training curves and final metrics that appear in the UI and CLI (`flowyml experiment list`).
With these pieces in place, any pipeline can describe its dependency tree, reference the correct Dockerfile or Poetry group, and ship container images to the chosen registry so stacks and resources always know which runtime to boot.