Projects & Multi-Tenancy π’
Organize pipelines, runs, and artifacts into isolated projects for multi-tenant deployments.
Overview βΉοΈ
The Project and ProjectManager classes provide:
- Isolation: Each project has its own metadata store and artifact storage
- Organization: Group related pipelines and runs together
- Multi-tenancy: Support multiple teams/clients in one deployment
- Resource management: Track pipelines, runs, and artifacts per project
Quick Start π
| from flowyml import Project
# Create a project
project = Project("recommendation_system")
# Create pipelines within the project
pipeline = project.create_pipeline("training_v1")
# Add steps and run
pipeline.add_step(load_data)
pipeline.add_step(train_model)
result = pipeline.run()
# Get project statistics
stats = project.get_stats()
print(f"Total runs: {stats['total_runs']}")
print(f"Total artifacts: {stats['total_artifacts']}")
|
Project Management ποΈ
Creating Projects ποΈ
| from flowyml import ProjectManager
manager = ProjectManager()
# Create a new project
project = manager.create_project(
"ml_platform",
description="Main ML platform for product recommendations"
)
# List all projects
projects = manager.list_projects()
for proj in projects:
print(f"- {proj['name']}: {proj['description']}")
# Get existing project
project = manager.get_project("ml_platform")
|
Project Structure π
Each project has its own directory structure:
| projects/
βββ my_project/
βββ project.json # Project metadata
βββ runs/ # Pipeline run data
βββ artifacts/ # Stored artifacts
β βββ cache/ # Artifact cache
βββ metadata.db # SQLite metadata store
|
Working with Projects π οΈ
Creating Pipelines ποΈ
You can create pipelines in projects in two ways:
Method 1: Using project.create_pipeline()
| project = Project("analytics")
# Create multiple pipelines
etl_pipeline = project.create_pipeline("daily_etl")
reporting_pipeline = project.create_pipeline("weekly_reports")
ml_pipeline = project.create_pipeline("model_training")
# Each pipeline uses the project's metadata store
etl_pipeline.add_step(extract_data)
etl_pipeline.run()
|
Method 2: Using project_name parameter (Recommended)
| from flowyml import Pipeline, step
# Automatically creates/attaches to project if it doesn't exist
pipeline = Pipeline("daily_etl", project_name="analytics")
@step(outputs=["data"])
def extract_data():
return fetch_data()
pipeline.add_step(extract_data)
pipeline.run()
|
The project_name parameter automatically:
- Creates the project if it doesn't exist
- Attaches the pipeline to the project
- Uses the project's metadata store and runs directory
Querying Project Data π
| # List all runs in the project
runs = project.list_runs()
for run in runs:
print(f"{run['pipeline_name']}: {run['status']}")
# Filter runs by pipeline
training_runs = project.list_runs(pipeline_name="model_training")
# Get artifacts
artifacts = project.get_artifacts()
for artifact in artifacts:
print(f"{artifact['name']}: {artifact['type']}")
# Filter by artifact type
models = project.get_artifacts(artifact_type="model")
|
Project Statistics π
| stats = project.get_stats()
print(f"""
Project: {project.name}
Created: {stats['created_at']}
Pipelines: {stats['total_pipelines']}
Runs: {stats['total_runs']}
Artifacts: {stats['total_artifacts']}
""")
|
Multi-Tenant Architecture ποΈ
FlowyML is built for scale. You can isolate different teams, clients, or environments into dedicated projects.
How Isolation Works π‘οΈ
Every project has its own isolated resources:
| Project A (Team Growth) Project B (Team Finance)
βββ Metadata (SQL) βββ Metadata (SQL)
βββ Artifacts (S3 Bucket A) βββ Artifacts (S3 Bucket B)
βββ Runs (isolated folder) βββ Runs (isolated folder)
|
Key Benefits:
- π Security: Team Growth cannot see Team Finance's data or models.
- β‘ Performance: Metadata queries are scoped to the project, keeping them fast.
- π Accounting: Easily track cloud storage costs per project/team.
| # Setup for multiple clients
clients = ["acme_corp", "tech_startup", "enterprise_inc"]
for client in clients:
# Each client gets their own project
project = manager.create_project(
client,
description=f"ML pipelines for {client}"
)
# Create client-specific pipelines
pipeline = project.create_pipeline("recommendation_engine")
pipeline.add_step(load_client_data) # Client-specific data
pipeline.add_step(train_model)
# Run in isolation
result = pipeline.run()
|
Resource Tracking πΈ
| def get_client_usage(client_name):
project = manager.get_project(client_name)
stats = project.get_stats()
return {
"client": client_name,
"pipelines": stats['total_pipelines'],
"runs": stats['total_runs'],
"artifacts": stats['total_artifacts'],
"storage_usage_mb": project.get_storage_usage()
}
# Generate usage report
for client in clients:
usage = get_client_usage(client)
print(f"{client}: {usage['runs']} runs, {usage['storage_usage_mb']}MB")
|
Best Practices π‘
1. Project Naming π·οΈ
| # Use descriptive, hierarchical names
project = Project("company_product_ml")
# Or organize by team/domain
project = Project("data_team_recommendations")
|
2. Pipeline Organization π
| project = Project("sales_analytics")
# Group related pipelines
project.create_pipeline("etl_daily")
project.create_pipeline("etl_weekly")
project.create_pipeline("reporting_dashboard")
project.create_pipeline("forecasting_model")
|
3. Data Hygiene & Cleanup π§Ή
| # Export project before cleanup
project.export_metadata("backup.json")
# List pipelines for review
pipelines = project.get_pipelines()
# Remove if needed (use with caution!)
# manager.delete_project("old_project", confirm=True)
|
Integration Examples π
With Versioning π
| from flowyml import VersionedPipeline, Pipeline, context
ctx = context(learning_rate=0.001, epochs=10)
# Method 1: Using project_name parameter (Recommended)
# Automatically creates/attaches to project and creates VersionedPipeline
pipeline = Pipeline("training", context=ctx, version="v1.0.0", project_name="ml_prod")
pipeline.add_step(train)
pipeline.save_version()
pipeline.run()
# Method 2: Using VersionedPipeline directly with project_name
versioned = VersionedPipeline("training", context=ctx, version="v1.0.1", project_name="ml_prod")
versioned.add_step(train)
versioned.save_version()
versioned.run()
# Method 3: Using project.create_pipeline() then manually setting up versioning
project = Project("ml_prod")
pipeline = project.create_pipeline("training")
versioned = VersionedPipeline("training", context=ctx, version="v1.0.2")
versioned.runs_dir = project.runs_dir
versioned.metadata_store = project.metadata_store
versioned.add_step(train)
versioned.save_version()
versioned.run()
|
With Scheduling β°
| from flowyml import PipelineScheduler
project = Project("automated_ml")
scheduler = PipelineScheduler()
def run_project_pipeline():
pipeline = project.create_pipeline("daily_training")
pipeline.add_step(train_model)
return pipeline.run()
scheduler.schedule_daily(
name=f"{project.name}_daily_run",
pipeline_func=run_project_pipeline,
hour=2
)
|
API Reference π
Project ποΈ
Constructor:
| Project(
name: str,
description: str = "",
projects_dir: str = ".flowyml/projects"
)
|
Methods:
- create_pipeline(name: str, **kwargs) -> Pipeline - Create pipeline in project
- get_pipelines() -> List[str] - List all pipeline names
- list_runs(pipeline_name: Optional[str] = None, limit: int = 100) -> List[Dict]
- get_artifacts(artifact_type: Optional[str] = None, limit: int = 100) -> List[Dict]
- get_stats() -> Dict - Get project statistics
- export_metadata(path: str) - Export project metadata
ProjectManager π οΈ
Methods:
- create_project(name: str, description: str = "") -> Project - Create new project
- get_project(name: str) -> Optional[Project] - Get existing project
- list_projects() -> List[Dict] - List all projects
- delete_project(name: str, confirm: bool = False) - Delete project
FAQ β
Q: Can I move a pipeline from one project to another?
A: Currently, pipelines are tied to their project's metadata store. You would need to export/import the pipeline definition manually.
Q: How do I backup a project?
A: Use project.export_metadata() and copy the entire project directory from .flowyml/projects/{project_name}/.
Q: What happens when I delete a project?
A: All pipelines, runs, and artifacts associated with the project are removed. Always export metadata first!
Q: Can projects share artifacts?
A: No, projects are fully isolated by design. This ensures multi-tenant security and resource tracking.