Frequently Asked Questions
Quick answers to the most common questions about FlowyML.
Getting Started
What is FlowyML and how is it different from Airflow or Prefect?
FlowyML is an artifact-centric ML pipeline framework. Unlike Airflow and Prefect, which are task-based (you wire steps together with arrows), FlowyML steps declare what data they produce and consume. FlowyML derives the execution graph from those declarations.
This means zero manual DAG wiring, automatic data lineage, and type-safe connections between steps.
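To illustrate what a type-safe connection between steps could look like, here is a minimal sketch in plain Python. It does not use FlowyML's API: the `Dataset` and `Model` classes, the two step functions, and `check_connection` are hypothetical names chosen for illustration only.

```python
from typing import get_type_hints


# Hypothetical artifact types, standing in for real artifact classes.
class Dataset:
    pass


class Model:
    pass


def load_data() -> Dataset:
    return Dataset()


def train(dataset: Dataset) -> Model:
    return Model()


def check_connection(producer, consumer, param: str) -> bool:
    """Return True if the producer's return annotation is exactly the
    type the consumer annotates for the parameter named `param`."""
    produced = get_type_hints(producer).get("return")
    expected = get_type_hints(consumer).get(param)
    return produced is not None and produced is expected


# load_data produces a Dataset, which is what train expects.
ok = check_connection(load_data, train, "dataset")
# train produces a Model, which does not fit train's `dataset` input.
mismatch = check_connection(train, train, "dataset")
```

A check of this kind can run before any step executes, so a mismatched connection is reported up front instead of failing in the middle of a run.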
What Python version does FlowyML require?
FlowyML requires Python 3.10 or higher. We recommend using the latest stable Python release for the best performance.
Can I use FlowyML with my existing MLflow or Weights & Biases setup?
Yes! FlowyML integrates natively with both MLflow and W&B through its plugin system. You can log experiments, track metrics, and manage models using your existing infrastructure.
Architecture & Design
What does 'artifact-centric' actually mean?
In FlowyML, artifacts are first-class citizens. Instead of defining execution order manually, you define:
- What each step outputs (e.g., a Model, Dataset, or Metrics)
- What each step inputs (consumes from other steps)
FlowyML automatically resolves dependencies and builds the DAG. This means you never write step_a >> step_b arrows.
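The idea of resolving execution order from declared inputs and outputs can be sketched with the standard library alone. This is not FlowyML's implementation: the `STEPS` dictionary layout and the `resolve_order` helper are assumptions made up for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical step declarations: each step lists the artifacts it
# consumes and produces. The dictionary order is deliberately shuffled.
STEPS = {
    "train": {"inputs": ["dataset"], "outputs": ["model"]},
    "evaluate": {"inputs": ["model", "dataset"], "outputs": ["metrics"]},
    "load_data": {"inputs": [], "outputs": ["dataset"]},
}


def resolve_order(steps):
    """Derive an execution order from artifact declarations alone."""
    # Map each artifact name to the step that produces it.
    producer = {}
    for name, spec in steps.items():
        for artifact in spec["outputs"]:
            producer[artifact] = name

    # A step depends on the producer of every artifact it consumes.
    # A KeyError here means an input has no producing step.
    graph = {
        name: {producer[artifact] for artifact in spec["inputs"]}
        for name, spec in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = resolve_order(STEPS)
print(order)  # ['load_data', 'train', 'evaluate']
```

No step names another step directly. The ordering falls out of matching each consumed artifact to its producer, which is the property the answer above describes.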
How does FlowyML's caching work?
FlowyML uses content-based hashing to determine if a step needs re-execution. It computes a hash from:
- The step's source code
- Input artifact content hashes
- Step configuration parameters
If the hash matches a previous run, the step is skipped and its cached results are reused. This is more reliable than file-timestamp caching, because a timestamp can change while the content stays the same.
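A content-based cache key built from the three ingredients listed above might look roughly like the following. This is a sketch under stated assumptions, not FlowyML's actual hashing scheme: `cache_key`, `run_step`, the JSON serialization, and the choice of SHA-256 are all illustrative.

```python
import hashlib
import json


def cache_key(source: str, input_hashes: dict, config: dict) -> str:
    """Combine step source, input artifact hashes, and configuration
    into a single deterministic digest."""
    payload = json.dumps(
        {"source": source, "inputs": input_hashes, "config": config},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(fn, source, input_hashes, config, cache):
    """Run `fn` unless an identical invocation is already cached.

    Returns (result, was_cache_hit).
    """
    key = cache_key(source, input_hashes, config)
    if key in cache:
        return cache[key], True
    result = fn(**config)
    cache[key] = result
    return result, False


# Hypothetical step: doubles its `factor` parameter.
def scale(factor):
    return factor * 2


cache = {}
first = run_step(scale, "def scale(factor): ...", {"dataset": "abc123"}, {"factor": 2}, cache)
second = run_step(scale, "def scale(factor): ...", {"dataset": "abc123"}, {"factor": 2}, cache)
```

Here `first` is computed and `second` is served from the cache. Changing the source string, any input hash, or any configuration value produces a different key and forces re-execution.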
What's the relationship between FlowyML and FlowyML Notebook?
FlowyML is the pipeline framework: it runs production ML workflows.
FlowyML Notebook is a companion reactive notebook environment (replacing Jupyter) designed for ML experimentation. Notebooks can be promoted to FlowyML pipelines with one click.
They work together but are independent packages.
Deployment & Production
How do I deploy FlowyML to production?
FlowyML supports three deployment tiers:
- Local: Default. Run with python pipeline.py
- Docker Compose: Containerized with docker-compose up
- Cloud: GCP Vertex AI, AWS SageMaker, or Azure ML
Switch between environments with a single config change.
Does FlowyML support GPU workloads?
Yes. Steps can declare resource requirements including GPU:
```python
# `step` and `Model` are assumed to be imported from the flowyml package.
@step(outputs=["model"], resources={"gpu": 1, "memory": "16Gi"})
def train_model(dataset: list) -> Model:
    # GPU-accelerated training
    ...
```
When using cloud orchestrators (Vertex AI, SageMaker), GPU resources are automatically provisioned.
Can I use FlowyML in CI/CD pipelines?
Absolutely. FlowyML is designed for CI/CD integration:
- Evaluation gates: Use EvalAssert to fail builds when model quality degrades
- Dry-run mode: Validate pipeline structure without execution
- Scheduling: Set up recurring pipeline runs with cron expressions
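As a rough picture of how an evaluation gate can fail a build, the sketch below compares metrics against minimum thresholds and produces a process exit code. It does not show `EvalAssert` or any FlowyML API: `check_gates` and `gate_exit_code` are hypothetical helpers written for this example.

```python
def check_gates(metrics: dict, minimums: dict) -> list:
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value} is below minimum {minimum}")
    return failures


def gate_exit_code(metrics: dict, minimums: dict) -> int:
    """Translate gate results into an exit code (0 = pass, 1 = fail)."""
    failures = check_gates(metrics, minimums)
    for message in failures:
        print(f"GATE FAILED {message}")
    return 1 if failures else 0


# Hypothetical metrics from a candidate model and the thresholds it must meet.
passing = gate_exit_code({"accuracy": 0.91, "f1": 0.88}, {"accuracy": 0.85, "f1": 0.80})
failing = gate_exit_code({"accuracy": 0.79}, {"accuracy": 0.85, "f1": 0.80})
```

In a CI job, passing the returned code to `sys.exit` is what makes the build stop: most CI systems treat any non-zero exit status as a failed step.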
Data & Storage
How do I handle large datasets?
FlowyML handles large datasets through:
- Streaming materializers: Process data in chunks without loading everything into memory
- Content-hash caching: Large datasets are only transferred once; subsequent runs use cached versions
- Cloud artifact stores: Store datasets in GCS, S3, or Azure Blob Storage
- Map tasks: Distribute processing across parallel workers
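The first two points, chunked processing and content hashing, can be combined in a small sketch. This illustrates the general technique and is not FlowyML's materializer interface: `iter_chunks`, `content_hash`, and the default chunk size are assumptions for this example.

```python
import hashlib
import io


def iter_chunks(stream, chunk_size=1024 * 1024):
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def content_hash(stream, chunk_size=1024 * 1024):
    """Hash a stream incrementally, so memory use is bounded by chunk_size
    rather than by the total size of the data."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(stream, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


# An in-memory stream stands in for a large file or remote object.
data = b"x" * 10_000
streamed = content_hash(io.BytesIO(data), chunk_size=4096)
```

The streamed digest equals the digest of the whole payload hashed at once, so the same content always maps to the same cache entry regardless of the chunk size.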
What storage backends does FlowyML support?
| Backend | Type | Use Case |
|---|---|---|
| Local filesystem | Artifact Store | Development |
| Google Cloud Storage | Artifact Store | Production (GCP) |
| Amazon S3 | Artifact Store | Production (AWS) |
| Azure Blob Storage | Artifact Store | Production (Azure) |
| SQLite | Metadata Store | Development |
| PostgreSQL | Metadata Store | Production |
| MLflow | Experiment Tracker | Experiment logging |
| W&B | Experiment Tracker | Experiment logging |
Open Source
Is FlowyML open source?
Yes! FlowyML is fully open source under a permissive license. You can find the source code, contribute, and report issues on GitHub.
Still have questions?
Check the Glossary for terminology, or explore the Getting Started guide for a hands-on introduction.
What's Next?
Getting Started
Build your first pipeline in 5 minutes with the quick start tutorial.
Glossary
Look up FlowyML-specific terms and concepts with linked references.
Why FlowyML?
Detailed comparison with Airflow, Prefect, ZenML, and Metaflow.