❓ FAQ

❓ Frequently Asked Questions

Quick answers to the most common questions about FlowyML.

Getting Started

What is FlowyML and how is it different from Airflow or Prefect?

FlowyML is an artifact-centric ML pipeline framework. Airflow and Prefect are task-based: you wire steps together with arrows. In FlowyML, each step declares what data it produces and consumes, and the execution graph is built automatically from those declarations.

This means zero manual DAG wiring, automatic data lineage, and type-safe connections between steps.

Full comparison →

What Python version does FlowyML require?

FlowyML requires Python 3.10 or higher. We recommend using the latest stable Python release for the best performance.

python --version  # Must be 3.10+
pip install flowyml

Installation guide →

Can I use FlowyML with my existing MLflow or Weights & Biases setup?

Yes! FlowyML integrates natively with both MLflow and W&B through its plugin system. You can log experiments, track metrics, and manage models using your existing infrastructure.

# Example: MLflow integration
from flowyml.plugins import MLflowTracker

# `pipeline` is assumed to be a pipeline object you have already defined.
pipeline.with_plugin(MLflowTracker(tracking_uri="http://localhost:5000"))

MLflow integration → · W&B integration →

Architecture & Design

What does 'artifact-centric' actually mean?

In FlowyML, artifacts are first-class citizens. Instead of defining execution order manually, you define:

What each step produces (its outputs, e.g., a Model, Dataset, or Metrics)
What each step consumes (its inputs, which are produced by other steps)

FlowyML automatically resolves dependencies and builds the DAG. This means you never write step_a >> step_b arrows.
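
To make the idea concrete, here is a minimal sketch of how an execution order can be derived from declared inputs and outputs alone. It uses only the Python standard library and is not FlowyML's implementation; the `STEPS` table and the `resolve_order` helper are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical step declarations: name -> (consumed artifacts, produced artifacts).
# The steps are deliberately listed out of order.
STEPS = {
    "train": ({"dataset"}, {"model"}),
    "load_data": (set(), {"dataset"}),
    "evaluate": ({"model", "dataset"}, {"metrics"}),
}


def resolve_order(steps):
    """Derive an execution order purely from declared inputs and outputs."""
    # Map each artifact to the single step that produces it.
    producer = {}
    for name, (_, outputs) in steps.items():
        for artifact in outputs:
            if artifact in producer:
                raise ValueError(f"{artifact!r} is produced by more than one step")
            producer[artifact] = name

    # A step depends on the producers of everything it consumes.
    graph = {}
    for name, (inputs, _) in steps.items():
        missing = inputs - set(producer)
        if missing:
            raise ValueError(f"{name!r} consumes unknown artifacts: {sorted(missing)}")
        graph[name] = {producer[artifact] for artifact in inputs}

    # TopologicalSorter raises CycleError if the declarations form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = resolve_order(STEPS)
print(order)  # ['load_data', 'train', 'evaluate']
```

No ordering is written anywhere in `STEPS`; it follows entirely from which step produces the artifact another step consumes.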

Artifact-centric philosophy →

How does FlowyML's caching work?

FlowyML uses content-based hashing to determine if a step needs re-execution. It computes a hash from:

The step's source code
Input artifact content hashes
Step configuration parameters

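If the hash matches a previous run, the step is skipped and its cached results are reused. This is more reliable than file-timestamp caching, because a timestamp can change when the content has not, and content can change without the timestamp reflecting it.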
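
The sketch below illustrates the general technique of building a cache key from those three ingredients. It is an assumption-laden illustration, not FlowyML's actual hashing scheme: the choice of SHA-256, the JSON encoding, and the helper names `content_hash` and `step_cache_key` are all hypothetical.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Hash raw artifact bytes so identical content always yields the same digest."""
    return hashlib.sha256(data).hexdigest()


def step_cache_key(source_code: str, input_hashes: dict, config: dict) -> str:
    """Combine step source, input artifact hashes, and config into one digest."""
    payload = {"source": source_code, "inputs": input_hashes, "config": config}
    # sort_keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


source = "def train(dataset, lr): ..."
inputs = {"dataset": content_hash(b"a,b\n1,2\n")}

key_1 = step_cache_key(source, inputs, {"lr": 0.01})
key_2 = step_cache_key(source, inputs, {"lr": 0.01})
key_3 = step_cache_key(source, inputs, {"lr": 0.1})

print(key_1 == key_2)  # True: nothing changed, so the step could be skipped
print(key_1 == key_3)  # False: the config changed, so the step must re-run
```

Changing the source string or any input hash alters the key in the same way as changing the config does.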

Caching guide →

What's the relationship between FlowyML and FlowyML Notebook?

FlowyML is the pipeline framework — it runs production ML workflows.

FlowyML Notebook is a companion reactive notebook environment, intended as a replacement for Jupyter and designed for ML experimentation. Notebooks can be promoted to FlowyML pipelines with one click.

They work together but are independent packages:

pip install flowyml           # The pipeline framework
pip install flowyml-notebook   # The reactive notebook

FlowyML Notebook →

Deployment & Production

How do I deploy FlowyML to production?

FlowyML supports three deployment tiers:

Local — Default. Run with python pipeline.py
Docker Compose — Containerized with docker-compose up
Cloud — GCP Vertex AI, AWS SageMaker, or Azure ML

Switch between environments with a single config change:

export FLOWYML_STACK=production
python pipeline.py  # Now runs on cloud infrastructure
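
As a rough illustration of the pattern (not of FlowyML's internals), the following sketch shows how a program can pick a named configuration from an environment variable. Only the `FLOWYML_STACK` variable name comes from the text above; the `STACKS` table, its keys and values, and the `select_stack` helper are hypothetical.

```python
import os

# Hypothetical stack definitions, for illustration only.
STACKS = {
    "local": {"orchestrator": "local", "artifact_store": "./artifacts"},
    "production": {
        "orchestrator": "cloud",
        "artifact_store": "gs://example-bucket/artifacts",
    },
}


def select_stack(environ=os.environ, default="local"):
    """Return (name, settings) for the stack named in FLOWYML_STACK."""
    name = environ.get("FLOWYML_STACK", default)
    if name not in STACKS:
        raise KeyError(f"unknown stack {name!r}; expected one of {sorted(STACKS)}")
    return name, STACKS[name]


# Passing a plain dict instead of os.environ keeps the example deterministic.
name, settings = select_stack({"FLOWYML_STACK": "production"})
print(name, settings["orchestrator"])
```

The pipeline code itself stays unchanged; only the value of the environment variable decides which settings are used.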

Deployment guide → · Production guide →

Does FlowyML support GPU workloads?

Yes. Steps can declare resource requirements including GPU:

# Assumes `step` and `Model` have already been imported from the flowyml package.
@step(outputs=["model"], resources={"gpu": 1, "memory": "16Gi"})
def train_model(dataset: list) -> Model:
    # GPU-accelerated training
    ...

When using cloud orchestrators (Vertex AI, SageMaker), GPU resources are automatically provisioned.

Resource requirements →

Can I use FlowyML in CI/CD pipelines?

Absolutely. FlowyML is designed for CI/CD integration:

Evaluation gates: Use EvalAssert to fail builds when model quality degrades
Dry-run mode: Validate pipeline structure without execution
Scheduling: Set up recurring pipeline runs with cron expressions

# GitHub Actions example
- name: Run pipeline
  run: flowyml run --stack ci --dry-run
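
The idea behind an evaluation gate can be sketched in plain Python: compare metrics against minimum thresholds and return a non-zero exit code when any check fails, which causes a CI job to fail. This is a generic illustration and does not use or describe the `EvalAssert` API; the `evaluation_gate` and `gate_exit_code` helpers and the metric values are hypothetical.

```python
import sys


def evaluation_gate(metrics: dict, thresholds: dict) -> list:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} is below the minimum {minimum:.3f}")
    return failures


def gate_exit_code(metrics: dict, thresholds: dict) -> int:
    """Report failures on stderr and return a process exit code (0 = pass)."""
    failures = evaluation_gate(metrics, thresholds)
    for failure in failures:
        print(f"GATE FAILED {failure}", file=sys.stderr)
    return 1 if failures else 0


# Illustrative values: accuracy passes, f1 does not.
exit_code = gate_exit_code(
    {"accuracy": 0.91, "f1": 0.78},
    {"accuracy": 0.90, "f1": 0.80},
)
print(exit_code)  # 1
```

A CI script would pass this value to `sys.exit`, so the build step fails whenever a metric regresses below its threshold.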

CI/CD evaluation →

Data & Storage

How do I handle large datasets?

FlowyML handles large datasets through:

Streaming materializers — Process data in chunks without loading everything into memory
Content-hash caching — Large datasets are only transferred once; subsequent runs use cached versions
Cloud artifact stores — Store datasets in GCS, S3, or Azure Blob Storage
Map tasks — Distribute processing across parallel workers
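
The first item, chunked processing, can be illustrated without any framework. The sketch below computes a mean over an iterable while holding only one chunk in memory at a time. It is a generic standard-library example, not FlowyML's materializer API; `iter_chunks` and `streaming_mean` are hypothetical helper names.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows without materializing the input."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(rows, chunk_size=1000):
    """Compute the mean by accumulating per-chunk sums and counts."""
    total = 0
    count = 0
    for chunk in iter_chunks(rows, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else 0.0


# range() is lazy, so the full sequence is never held in memory at once.
mean = streaming_mean(range(1, 11), chunk_size=3)
print(mean)  # 5.5
```

The same accumulate-per-chunk shape works for any statistic that can be combined from partial results, such as counts, sums, minima, and maxima.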

Map tasks → · Materializers →

What storage backends does FlowyML support?

Backend	Type	Use Case
Local filesystem	Artifact Store	Development
Google Cloud Storage	Artifact Store	Production (GCP)
Amazon S3	Artifact Store	Production (AWS)
Azure Blob Storage	Artifact Store	Production (Azure)
SQLite	Metadata Store	Development
PostgreSQL	Metadata Store	Production
MLflow	Experiment Tracker	Experiment logging
W&B	Experiment Tracker	Experiment logging

Plugins overview →

Open Source

Is FlowyML open source?

Yes! FlowyML is fully open source under a permissive license. You can find the source code, contribute, and report issues on GitHub.

GitHub → · Contributing →

Still have questions?

Check the Glossary for terminology, or explore the Getting Started guide for a hands-on introduction.

🚀 What's Next?

🚀 Getting Started

Build your first pipeline in 5 minutes with the quick start tutorial.

Start Building →

📖 Glossary

Look up FlowyML-specific terms and concepts with linked references.

Browse Glossary →

🤔 Why FlowyML?

Detailed comparison with Airflow, Prefect, ZenML, and Metaflow.

See Comparison →