When to Trigger Fallback
- Timeouts: request exceeds user-SLO budget (for example, 2.5s for interactive chat).
- 5xx errors: provider-side errors exceed threshold for rolling window.
- 429/rate limit pressure: queue depth or throttle responses increase beyond retry budget.
- Degraded quality signal: output validation fails deterministic checks for safety or structure.
Use debounced thresholds (e.g., 3 failures in 60s) to avoid thrashing and unnecessary provider switches.
