AI Checker Hub
Provider Reliability Comparison
Compare provider reliability using transparent definitions for uptime, latency, error rate, cost per 1M tokens, and effective rate limits. Filter results by provider, model family, and region.
Visualizations
Uptime (%)
Latency p95 (ms)
How To Choose A Provider Using This Page
Match your decision to workload type.
- Interactive products: prioritize p95 latency and timeout risk in user-facing regions.
- Batch pipelines: prioritize sustained throughput and cost stability under load.
- Critical production: maintain a two-provider strategy with staged failover and circuit breakers.
What Uptime Means Here
Uptime is calculated as successful checks divided by total checks over the selected window. It is a baseline health signal, not a guarantee that every model family or region performs equally at all times.
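As a concrete sketch of that calculation (the check counts below are hypothetical, not taken from this page):

```python
def uptime_pct(successful_checks: int, total_checks: int) -> float:
    """Uptime as successful checks divided by total checks, in percent."""
    if total_checks == 0:
        return 0.0  # no data in the window; treat as unknown rather than 100%
    return 100.0 * successful_checks / total_checks

# Hypothetical 30-day window at one check per 15 minutes (2880 checks):
# 9 failed checks still leaves uptime above 99.6%.
window_uptime = uptime_pct(2871, 2880)
```

Note that a handful of failed checks barely moves the percentage, which is why uptime alone cannot tell you whether a specific model family or region was degraded.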
Why p95 Latency Matters More Than Averages
Averages can hide user pain during degradation. p95 better reflects tail behavior during incident windows and is generally more useful for routing thresholds and SLO protection.
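A small example of how an average can look healthy while p95 exposes the tail. This uses the nearest-rank percentile method, and the sample values are purely illustrative:

```python
import math

def p95(samples: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest rank
    return ordered[rank - 1]

# 90 fast responses and 10 slow ones during a degradation window:
latencies = [100.0] * 90 + [3000.0] * 10
mean = sum(latencies) / len(latencies)  # 390 ms -- looks tolerable
tail = p95(latencies)                   # 3000 ms -- what 1 in 10 users saw
```

Routing on the mean here would miss a degradation that p95 surfaces immediately.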
FAQ
Why can providers look similar globally but differ in the EU?
Regional routing, edge capacity, and policy differences can create major local variance despite similar global aggregates.
Does operational status mean performance is good?
Not always. A service can be operational while p95 latency and timeout risk are still elevated.
How often is this data updated?
This comparison view is updated from monitor snapshots and reflects rolling windows, not single-request outcomes.
Do different models from the same provider have different reliability?
Yes. Model family behavior can differ significantly, especially during capacity pressure periods.
How should I set failover thresholds using p95?
Use thresholds tied to user impact and trigger failover after consecutive breaches rather than one-time spikes.
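One way to implement "consecutive breaches, not one-time spikes" is a small streak counter. The threshold and streak length below are placeholders to tune against your own SLO:

```python
class BreachTracker:
    """Signals failover only after N consecutive p95 breaches, so a single
    noisy check never flips traffic on its own."""

    def __init__(self, threshold_ms: float, consecutive: int = 3):
        self.threshold_ms = threshold_ms
        self.consecutive = consecutive
        self.streak = 0

    def observe(self, p95_ms: float) -> bool:
        """Record one check; return True when failover should trigger."""
        self.streak = self.streak + 1 if p95_ms > self.threshold_ms else 0
        return self.streak >= self.consecutive
```

A clean check resets the streak, so brief spikes decay instead of accumulating toward a trigger.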
What retry policy is safest during incidents?
Use bounded retries with jitter, short retry budgets, and a circuit breaker to avoid retry storms.
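A minimal sketch of that policy, combining a bounded retry budget, capped exponential backoff with full jitter, and a simple open/half-open circuit breaker. All names and limits here are illustrative, not a prescribed implementation:

```python
import random
import time

class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are being shed."""

class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial
    call again after `reset_after` seconds (half-open)."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, max_attempts: int = 3, base_delay: float = 0.2, cap: float = 2.0):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpen("shedding load; breaker is open")
            self.opened_at = None  # half-open: permit one trial call
        for attempt in range(max_attempts):  # bounded retry budget
            try:
                result = fn()
                self.failures = 0
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()
                    raise CircuitOpen("too many consecutive failures")
                if attempt == max_attempts - 1:
                    raise
                # full jitter: sleep a random slice of the capped backoff window
                time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```

The jitter spreads retries out in time, which is what prevents many clients from retrying in lockstep and amplifying an incident into a retry storm.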
How To Read the Metrics Without Misleading Yourself
A single "best provider" rarely exists. Reliability depends on workload type, geography, and failure tolerance. Use the table to eliminate weak options first, then test finalists with your own traffic patterns.
Metric Priorities by Use Case
- Realtime chat and assistants: optimize for p95 latency and timeout stability.
- Batch generation jobs: optimize for sustained rate limits and cost predictability.
- Embeddings/search pipelines: prioritize steady throughput and low 5xx variance.
- Mission-critical operations: prioritize multi-provider resilience over marginal cost wins.
When to Trust 30-Day vs 90-Day Views
Use 30-day windows for current routing choices and capacity posture. Use 90-day windows to detect recurring structural risk, seasonality, or regional instability that short windows may hide.
Practical Routing Threshold Examples
Thresholds should be tied to user impact and not copied from generic templates. The examples below are starting points you can tune with your own SLO targets.
- Latency breach: if p95 exceeds target by 25% for 3 consecutive checks, shift 20% traffic to backup.
- Error breach: if 5xx exceeds 1.5% for 2 windows, cap retries and trigger selective failover.
- Rate-limit pressure: if the 429 rate climbs above baseline, queue non-critical workloads before switching providers.
- Recovery: return traffic gradually after 2 clean windows to avoid oscillation.
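The latency rule above can be expressed as data plus a small evaluator. The numbers mirror the example thresholds and should be retuned against your own SLO targets:

```python
LATENCY_POLICY = {
    "p95_over_target_pct": 25,  # breach = p95 more than 25% above target
    "consecutive_checks": 3,    # require 3 breaching checks in a row
    "shift_pct": 20,            # then move 20% of traffic to backup
}

def latency_shift(p95_history: list[float], target_ms: float,
                  policy: dict = LATENCY_POLICY) -> int:
    """Return the percent of traffic to shift to backup, or 0 if no action."""
    limit = target_ms * (1 + policy["p95_over_target_pct"] / 100)
    n = policy["consecutive_checks"]
    recent = p95_history[-n:]
    if len(recent) == n and all(sample > limit for sample in recent):
        return policy["shift_pct"]
    return 0
```

Keeping the thresholds in a config dict rather than hard-coded makes them easy to tune per region or per workload without touching routing logic.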
Combine these with your incident archive and provider-specific pages to avoid overreacting to short-lived spikes.
Common Comparison Mistakes to Avoid
- Using one metric as truth: uptime alone misses slow responses and degraded quality windows.
- Ignoring region fit: a strong global score can still underperform in your primary customer region.
- Copying generic thresholds: route decisions should match your SLO and workload behavior.
- Over-rotating on short spikes: use consecutive checks and confidence windows before switching traffic.
- No cost guardrails: failover plans should include budget caps to prevent surprise spend.
The best teams treat this page as a decision support layer. They combine monitor data, business goals, and customer impact to choose routing strategies that are both resilient and cost-aware.
For best results, review this page together with your own app metrics at least weekly, then update thresholds gradually instead of making abrupt policy shifts after one noisy day.