🛡️ Error Handling & Retries
What you'll learn
How to make pipelines resilient to network blips, API timeouts, and transient errors. In distributed systems, failure is inevitable, so your pipeline should handle it.
Build self-healing pipelines that recover from failures automatically using retries, circuit breakers, fallbacks, and failure handlers.
Why Robustness Matters 🛡️
| Without Error Handling | With FlowyML Resilience |
|---|---|
| One network timeout kills the whole job | Transient errors retried automatically |
| Waking up at 3 AM to click "retry" | Self-healing pipelines |
| Cascading failures spread across services | Circuit breakers stop the cascade |
| Partial failures leave data inconsistent | Fallbacks provide safe defaults |
Decision Guide ⚖️
| Pattern | Use When | Example |
|---|---|---|
| Retry | Transient errors: network blips, rate limits | API timeout, 503 error |
| Circuit Breaker | System outages: service is down hard | Database down, repeated 500s |
| Fallback | Critical path: must continue even if step fails | Use cached data if live API fails |
| Failure Handler | Alerting: notify team when things break | Slack ping on critical step failure |
Retries with Exponential Backoff
Automatically retry failed steps with configurable backoff strategies.
Retry Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_attempts` | `int` | `3` | Total attempts including the first |
| `backoff` | `BackoffStrategy` | `ExponentialBackoff()` | Delay strategy between retries |
| `on` | `list[type]` | All exceptions | Exception types to retry on |
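The parameters above can be illustrated with a small, framework-free retry decorator. This is a sketch of the pattern only, not FlowyML's implementation: the `base_delay` and `sleep` arguments are assumptions added for illustration, and `flaky_fetch` is a hypothetical step.

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.1, on=(Exception,), sleep=time.sleep):
    """Illustrative retry decorator (hypothetical, not FlowyML's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except tuple(on):
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff: base_delay, then 2x, 4x, ...
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=3, base_delay=0.01, on=[ConnectionError])
def flaky_fetch():
    """Hypothetical step: fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network blip")
    return "payload"


result = flaky_fetch()
```

Note that `max_attempts` counts the first call, so `max_attempts=3` means at most two retries. Exceptions not listed in `on` propagate immediately.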
Backoff Strategies
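This section does not list which strategies FlowyML ships beyond `ExponentialBackoff()`, so the sketch below only shows how common delay schedules are typically computed. All function names and parameters here are hypothetical.

```python
import random


def constant_delay(attempt, base=1.0):
    """Same wait before every retry."""
    return base


def linear_delay(attempt, base=1.0):
    """Wait grows by `base` seconds per attempt: 1, 2, 3, ..."""
    return base * attempt


def exponential_delay(attempt, base=1.0, factor=2.0, cap=60.0):
    """Wait multiplies by `factor` per attempt, capped at `cap` seconds."""
    return min(cap, base * factor ** (attempt - 1))


def jittered_delay(attempt, base=1.0, factor=2.0, cap=60.0, rng=random):
    """Random wait between 0 and the exponential delay ("full jitter")."""
    return rng.uniform(0, exponential_delay(attempt, base, factor, cap))


# Delay before retries 1 through 5 with the exponential schedule
schedule = [exponential_delay(n) for n in range(1, 6)]
# schedule == [1.0, 2.0, 4.0, 8.0, 16.0]
```

Jitter spreads retries out in time, which helps when many steps fail at once and would otherwise all retry at the same instant.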
Circuit Breakers
Prevent cascading failures by "opening the circuit" when a service is down, so calls fail fast instead of waiting for timeouts.
Circuit Breaker States
stateDiagram-v2
[*] --> Closed
Closed --> Open : failure_threshold exceeded
Open --> HalfOpen : timeout elapsed
HalfOpen --> Closed : success
HalfOpen --> Open : failure
| State | Behavior |
|---|---|
| Closed | Requests pass through normally |
| Open | Requests fail immediately (no attempt) |
| Half-Open | One test request allowed; success → Closed, failure → Open |
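The three states can be sketched as a small state machine. This is a minimal illustration, not FlowyML's circuit breaker: the class, its `clock` argument, and the choice to open once failures reach (rather than exceed) `failure_threshold` are all assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker with closed, open, and half-open states."""

    def __init__(self, failure_threshold=3, timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "half_open"  # timeout elapsed: allow one test request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed test request, or too many failures, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(failure_threshold=2, timeout=10.0, clock=lambda: now[0])


def down():
    raise ConnectionError("service unavailable")


for _ in range(2):
    try:
        breaker.call(down)
    except ConnectionError:
        pass
state_after_failures = breaker.state

now[0] = 11.0  # timeout elapsed: the next call is the half-open test request
recovered = breaker.call(lambda: "ok")
```

After two failures the breaker is open. Once the fake clock passes the timeout, the successful test request closes it again.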
🛡️ Fallbacks
Define a fallback function to execute when a step fails, ensuring the pipeline can continue with safe defaults.
When to use fallbacks
Fallbacks are ideal for non-critical data sources where stale data is better than no data. Don't use fallbacks for steps where correctness is required.
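As a rough sketch of the idea, a fallback can be expressed as a decorator that calls a second function when the first one raises. The `with_fallback` helper, the step names, and the cached values below are hypothetical placeholders, not FlowyML's API.

```python
import functools


def with_fallback(fallback):
    """Hypothetical decorator: call `fallback` with the same args on failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                return fallback(*args, **kwargs)
        return wrapper
    return decorator


# Placeholder cache contents, for illustration only
CACHED_FEATURES = {"user_42": {"segment": "default"}}


def load_cached_features(user_id):
    """Stale but safe default."""
    return CACHED_FEATURES.get(user_id, {})


@with_fallback(load_cached_features)
def load_live_features(user_id):
    """Hypothetical step whose live source is currently timing out."""
    raise TimeoutError("live API timed out")


features = load_live_features("user_42")
```

Here the pipeline continues with the cached value. Catching every `Exception` is a simplification; in practice you would likely narrow it to the errors the fallback is meant to cover.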
🚨 Failure Handlers
Configure actions to take when a step fails, even if the pipeline continues via fallback:
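A minimal way to picture this is a runner that invokes a callback on failure before deciding whether to fall back or re-raise. `run_step`, `notify_team`, and the in-memory `alerts` list are hypothetical stand-ins; a real handler might post to Slack or a paging service.

```python
def run_step(step, on_failure=None, fallback=None):
    """Hypothetical runner: notify on failure, then fall back or re-raise."""
    try:
        return step()
    except Exception as exc:
        if on_failure is not None:
            on_failure(step.__name__, exc)  # the handler always fires first
        if fallback is not None:
            return fallback()
        raise


alerts = []  # stands in for a notification channel


def notify_team(step_name, exc):
    alerts.append(f"{step_name} failed: {exc}")


def load_live_data():
    raise TimeoutError("upstream timed out")


# The pipeline continues with an empty result, but the team is still alerted.
data = run_step(load_live_data, on_failure=notify_team, fallback=lambda: [])
```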
Combining Patterns 🧩
Use retries, circuit breakers, and fallbacks together for maximum resilience:
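One possible way to layer the three patterns is sketched below: retry with backoff while the circuit is closed, stop retrying as soon as it opens, and return a fallback when nothing else works. `SimpleBreaker` and `resilient_call` are hypothetical helpers written for this example, not FlowyML's API.

```python
import time


class SimpleBreaker:
    """Tiny breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold=3, timeout=30.0, clock=time.monotonic):
        self.threshold, self.timeout, self.clock = threshold, timeout, clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """True when closed, or when open long enough to allow a test call."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.timeout

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


def resilient_call(func, breaker, fallback, max_attempts=3,
                   base_delay=0.1, sleep=time.sleep):
    """Retry `func` behind `breaker`; use `fallback` if every path fails."""
    for attempt in range(1, max_attempts + 1):
        if not breaker.allow():
            break  # circuit open: skip straight to the fallback
        try:
            result = func()
        except Exception:
            breaker.record(ok=False)
            if attempt < max_attempts and breaker.allow():
                sleep(base_delay * 2 ** (attempt - 1))
        else:
            breaker.record(ok=True)
            return result
    return fallback()


attempts = []


def always_down():
    attempts.append(1)
    raise ConnectionError("service down")


breaker = SimpleBreaker(threshold=2, timeout=60.0)
no_sleep = lambda seconds: None  # skip real waiting in the example
first = resilient_call(always_down, breaker, lambda: "cached", sleep=no_sleep)
second = resilient_call(always_down, breaker, lambda: "cached", sleep=no_sleep)
```

The first call tries the service twice, trips the breaker, and falls back. The second call never touches the service, because the circuit is still open.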
Best Practices 💡
Start simple
Add `retry(max_attempts=3)` to any step that calls an external API. With the default exponential backoff, this single addition lets the step recover from most transient failures on its own.
Use circuit breakers for shared services
If multiple pipeline steps call the same service, a circuit breaker prevents all of them from hammering a failing service.
Don't retry non-idempotent operations
Only retry operations that are safe to repeat. Don't retry a POST that creates a database record, or you'll get duplicates.
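The duplicate problem, and one common way around it, can be shown with an in-memory stand-in for the service. `RecordStore` and its idempotency-key method are hypothetical and only illustrate the idea; whether a real service supports such keys depends on that service.

```python
import uuid


class RecordStore:
    """In-memory stand-in for a service that creates records."""

    def __init__(self):
        self.records = []
        self._seen = {}  # idempotency key -> record id

    def create(self, payload):
        """Non-idempotent: every call appends a new record."""
        self.records.append(payload)
        return len(self.records) - 1

    def create_once(self, key, payload):
        """Idempotent: repeating the same key returns the original record."""
        if key not in self._seen:
            self._seen[key] = self.create(payload)
        return self._seen[key]


# Retrying a plain create after a lost response duplicates the record.
store = RecordStore()
store.create({"order": "A"})
store.create({"order": "A"})  # simulated retry
duplicates = len(store.records)

# With a key generated once per logical operation, the retry is harmless.
safe = RecordStore()
key = str(uuid.uuid4())
first_id = safe.create_once(key, {"order": "A"})
retry_id = safe.create_once(key, {"order": "A"})  # simulated retry
```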