# 🗺️ Map Tasks
## What you'll learn

How to distribute work across collections with automatic parallelism, per-item retries, and partial failure tolerance, inspired by Flyte's array-node handler.
Map tasks let you write a function for a single item and have FlowyML automatically execute it over an entire collection in parallel with configurable concurrency, retry behaviour, and failure tolerance.
## Why Map Tasks?

| Without Map Tasks | With `@map_task` |
|---|---|
| Manual `for` loops | Automatic parallelism |
| No retry per item | Per-item retries with exponential backoff |
| One failure kills everything | Partial failure tolerance |
| No progress visibility | Result inspection with success ratios |
## Basic Usage

When `process_record` receives a list of records at runtime, FlowyML:

- Distributes items across a thread pool (up to `concurrency` workers)
- Retries each failed item up to `retries` times with exponential backoff
- Returns a `MapTaskResult` containing successes, failures, and error details
- Raises `RuntimeError` only if the success ratio falls below `min_success_ratio`
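The original code example for this section is not reproduced here. As a sketch of the semantics just described, here is a minimal stand-in decorator built on the standard library's `ThreadPoolExecutor`. This is not FlowyML's implementation, and the `process_record` body and record fields are invented for illustration; only the option names mirror the Configuration Reference.

```python
# Minimal sketch of the described semantics, NOT FlowyML's actual code:
# map a single-item function over a list with a thread pool, per-item
# retries with exponential backoff, and a success-ratio gate.
import functools
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class MapTaskResult:
    results: list = field(default_factory=list)   # None for failed items
    successes: int = 0
    failures: int = 0
    errors: dict = field(default_factory=dict)    # index -> error message

    @property
    def total(self) -> int:
        return self.successes + self.failures

    @property
    def success_ratio(self) -> float:
        return self.successes / self.total if self.total else 1.0


def map_task(concurrency=4, retries=0, retry_delay=1.0, min_success_ratio=1.0):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(items):
            def run_one(item):
                for attempt in range(retries + 1):
                    try:
                        return fn(item)
                    except Exception:
                        if attempt == retries:
                            raise
                        time.sleep(retry_delay * 2 ** attempt)  # 1x, 2x, 4x...

            result = MapTaskResult()
            with ThreadPoolExecutor(max_workers=concurrency) as pool:
                futures = [pool.submit(run_one, item) for item in items]
            for i, fut in enumerate(futures):
                try:
                    result.results.append(fut.result())
                    result.successes += 1
                except Exception as exc:
                    result.results.append(None)
                    result.failures += 1
                    result.errors[i] = str(exc)
            if result.success_ratio < min_success_ratio:
                raise RuntimeError(
                    f"success ratio {result.success_ratio:.2f} "
                    f"below {min_success_ratio}"
                )
            return result
        return wrapper
    return decorator


@map_task(concurrency=4, retries=2, retry_delay=0.01, min_success_ratio=0.8)
def process_record(record: dict) -> dict:
    # Hypothetical per-item logic: transform one record.
    return {"id": record["id"], "value": record["value"] * 2}


outcome = process_record([{"id": i, "value": i} for i in range(10)])
print(outcome.successes, outcome.success_ratio)
```

Calling the decorated function with a list returns the aggregate result object rather than a plain list, which is what makes per-item error inspection possible.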
## Configuration Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `concurrency` | `int` | `4` | Maximum parallel workers |
| `retries` | `int` | `0` | Per-item retry count |
| `retry_delay` | `float` | `1.0` | Base delay between retries (seconds, exponential) |
| `min_success_ratio` | `float` | `1.0` | Minimum ratio of items that must succeed |
| `fail_fast` | `bool` | `False` | Cancel remaining items on first failure |
| `timeout_per_item` | `int \| None` | `None` | Optional per-item timeout (seconds) |
| `name` | `str \| None` | Function name | Custom step name |
## Real-World Examples
### Batch Data Processing
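The original FlowyML example for this section is not shown. As an illustrative stand-in, the same batch-processing pattern can be sketched with the standard library alone: process rows in parallel and record per-item errors instead of letting one bad row abort the batch. All names here (`clean_row`, the row fields) are invented for the sketch.

```python
# Stand-in sketch: parallel batch cleaning with per-item error capture.
from concurrent.futures import ThreadPoolExecutor


def clean_row(row: dict) -> dict:
    # Illustrative per-row transformation.
    if row["amount"] < 0:
        raise ValueError(f"negative amount in row {row['id']}")
    return {**row, "amount": round(row["amount"], 2)}


rows = [{"id": i, "amount": i * 1.005} for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:  # like concurrency=4
    futures = [pool.submit(clean_row, r) for r in rows]

cleaned, errors = [], {}
for i, fut in enumerate(futures):
    try:
        cleaned.append(fut.result())
    except Exception as exc:  # record the failure, keep going
        cleaned.append(None)
        errors[i] = str(exc)

print(len([c for c in cleaned if c is not None]), "rows cleaned")
```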
### Image Processing with Failure Tolerance
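The original example for this section is likewise not reproduced. A hypothetical sketch of the failure-tolerance idea: generate thumbnails in parallel and accept the batch as long as at least 75% of items succeed, mirroring `min_success_ratio=0.75`. The `thumbnail` function and file names are invented; real code would do actual image decoding.

```python
# Stand-in sketch: tolerate a bounded fraction of per-item failures.
from concurrent.futures import ThreadPoolExecutor


def thumbnail(path: str) -> str:
    # Illustrative stand-in for real image decoding: "corrupt" files fail.
    if "corrupt" in path:
        raise IOError(f"cannot decode {path}")
    return path.replace(".jpg", "_thumb.jpg")


paths = ["a.jpg", "b.jpg", "corrupt1.jpg", "c.jpg"]
min_success_ratio = 0.75  # tolerate up to 25% failures

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(thumbnail, p) for p in paths]

results, failures = [], 0
for fut in futures:
    try:
        results.append(fut.result())
    except Exception:
        results.append(None)
        failures += 1

ratio = (len(paths) - failures) / len(paths)
if ratio < min_success_ratio:
    raise RuntimeError(f"only {ratio:.0%} of thumbnails succeeded")
print(f"{ratio:.0%} of thumbnails generated")
```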
### Fail-Fast Mode
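The original example is not shown here either. As a sketch of what `fail_fast=True` implies, here is the underlying stdlib pattern: stop consuming results on the first failure and cancel anything still queued. The `validate` function and the failing value are invented for illustration.

```python
# Stand-in sketch: abort on the first per-item failure (fail-fast).
from concurrent.futures import ThreadPoolExecutor, as_completed


def validate(n: int) -> int:
    if n == 13:
        raise ValueError("unlucky item")
    return n * n


items = list(range(20))
results = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {pool.submit(validate, n): n for n in items}
    try:
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    except ValueError:
        for f in futures:  # fail fast: cancel everything still queued
            f.cancel()

print(len(results), "items validated before the failure")
```

Note that already-running items cannot be interrupted; `cancel()` only prevents queued items from starting, which matches the usual thread-pool semantics.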
## Result Inspection
Every map task returns a `MapTaskResult` dataclass:
### MapTaskResult API
| Attribute | Type | Description |
|---|---|---|
| `results` | `list[Any]` | All results (`None` for failed items) |
| `successes` | `int` | Count of successful items |
| `failures` | `int` | Count of failed items |
| `errors` | `dict[int, str]` | Index → error message for failures |
| `total` | `int` | Total items processed |
| `success_ratio` | `float` | Property: `successes / total` |
| `successful_results` | `list[Any]` | Property: only non-`None` results |
## How It Works Under the Hood

```mermaid
graph LR
    A["Input Collection<br/>[item₁, item₂, ..., itemₙ]"] --> B["ThreadPoolExecutor<br/>(concurrency workers)"]
    B --> C1["Worker 1<br/>item₁ → result₁"]
    B --> C2["Worker 2<br/>item₂ → result₂"]
    B --> C3["Worker N<br/>itemₙ → resultₙ"]
    C1 --> D["MapTaskResult"]
    C2 --> D
    C3 --> D
    D --> E{"success_ratio ≥<br/>min_success_ratio?"}
    E -- Yes --> F["✅ Return results"]
    E -- No --> G["❌ RuntimeError"]
```
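The final gate in the diagram can be sketched as a single check (an illustrative sketch, not FlowyML's code; the function name is invented):

```python
def check_success_ratio(successes: int, total: int, min_success_ratio: float) -> float:
    """Raise if too few items succeeded, mirroring the diagram's final gate."""
    ratio = successes / total if total else 1.0
    if ratio < min_success_ratio:
        raise RuntimeError(
            f"map task succeeded on {ratio:.0%} of items, "
            f"below min_success_ratio={min_success_ratio}"
        )
    return ratio
```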
## Best Practices

### Choosing `concurrency`

- **I/O-bound tasks** (API calls, file reads): use higher concurrency (8–32)
- **CPU-bound tasks** (data transformations): match CPU core count (4–8)
- **GPU tasks**: keep at 1–2 to avoid memory pressure
### Retry strategy

Set `retries=2` with `retry_delay=1.0` for network calls. The backoff is exponential: the delay doubles on each attempt (1s → 2s → 4s → …).
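Assuming the doubling schedule described above, the delay before retry attempt *k* is `retry_delay * 2**k`. A one-line helper makes the schedule explicit (the function name is invented for illustration):

```python
def backoff_delays(retries: int, retry_delay: float = 1.0) -> list[float]:
    # Delay before each retry doubles: retry_delay, 2x, 4x, ...
    return [retry_delay * (2 ** attempt) for attempt in range(retries)]


print(backoff_delays(3))  # with retry_delay=1.0: 1s, 2s, 4s
```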
### Thread safety

Map tasks use `ThreadPoolExecutor`. Ensure your function is thread-safe: avoid shared mutable state.
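An illustrative sketch of the thread-safety advice: prefer returning per-item values over mutating shared state, and guard any unavoidable shared state with a lock. All names here are invented for the example.

```python
# Sketch: thread-safe mapped function under ThreadPoolExecutor.
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
lock = threading.Lock()


def safe_process(item: int) -> int:
    global counter
    with lock:          # guard the shared counter against concurrent updates
        counter += 1
    return item * 2     # per-item result: no shared state needed


with ThreadPoolExecutor(max_workers=8) as pool:
    doubled = list(pool.map(safe_process, range(100)))

print(counter, doubled[:3])
```

Without the lock, `counter += 1` is a read-modify-write that two threads can interleave, silently losing increments; the returned values need no lock at all.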