AI Checker Hub
Provider Reliability Comparison
Compare provider reliability using transparent definitions for uptime, latency, error rate, cost per 1M tokens, and effective rate limits. Filter results by provider, model family, and region.
Visualizations
Uptime (%)
Latency p95 (ms)
How To Choose A Provider Using This Page
Match your decision to workload type.
- Interactive products: prioritize p95 latency and timeout risk in user-facing regions.
- Batch pipelines: prioritize sustained throughput and cost stability under load.
- Critical production: maintain a two-provider strategy with staged failover and circuit breakers.
What Uptime Means Here
Uptime is calculated as successful checks divided by total checks over the selected window. It is a baseline health signal, not a guarantee that every model family or region performs equally at all times.
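As a concrete sketch of that calculation (the check counts below are hypothetical, not taken from this page):

```python
def uptime_pct(successful_checks: int, total_checks: int) -> float:
    """Uptime as successful checks divided by total checks, in percent."""
    if total_checks == 0:
        return 0.0  # no data in the window; treat as unknown rather than 100%
    return 100.0 * successful_checks / total_checks

# Hypothetical 30-day window at one check per 15 minutes (2880 checks):
# 9 failed checks still leaves uptime above 99.6%.
window_uptime = uptime_pct(2871, 2880)
```

Note that a handful of failed checks barely moves the percentage, which is why uptime alone cannot tell you whether a specific model family or region was degraded.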
Why p95 Latency Matters More Than Averages
Averages can hide user pain during degradation. p95 better reflects tail behavior during incident windows and is generally more useful for routing thresholds and SLO protection.
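A small example of how an average can look healthy while p95 exposes the tail. This uses the nearest-rank percentile method, and the sample values are purely illustrative:

```python
import math

def p95(samples: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest rank
    return ordered[rank - 1]

# 90 fast responses and 10 slow ones during a degradation window:
latencies = [100.0] * 90 + [3000.0] * 10
mean = sum(latencies) / len(latencies)  # 390 ms -- looks tolerable
tail = p95(latencies)                   # 3000 ms -- what 1 in 10 users saw
```

Routing on the mean here would miss a degradation that p95 surfaces immediately.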
FAQ
Why can providers look similar globally but differ in the EU?
Regional routing, edge capacity, and policy differences can create major local variance despite similar global aggregates.
Does operational status mean performance is good?
Not always. A service can be operational while p95 latency and timeout risk are still elevated.
How often is this data updated?
This comparison view is updated from monitor snapshots and reflects rolling windows, not single-request outcomes.
Do different models from the same provider have different reliability?
Yes. Model family behavior can differ significantly, especially during capacity pressure periods.
How should I set failover thresholds using p95?
Use thresholds tied to user impact and trigger failover after consecutive breaches rather than one-time spikes.
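One way to implement "consecutive breaches, not one-time spikes" is a small streak counter. The threshold and streak length below are placeholders to tune against your own SLO:

```python
class BreachTracker:
    """Signals failover only after N consecutive p95 breaches, so a single
    noisy check never flips traffic on its own."""

    def __init__(self, threshold_ms: float, consecutive: int = 3):
        self.threshold_ms = threshold_ms
        self.consecutive = consecutive
        self.streak = 0

    def observe(self, p95_ms: float) -> bool:
        """Record one check; return True when failover should trigger."""
        self.streak = self.streak + 1 if p95_ms > self.threshold_ms else 0
        return self.streak >= self.consecutive
```

A clean check resets the streak, so brief spikes decay instead of accumulating toward a trigger.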
What retry policy is safest during incidents?
Use bounded retries with jitter, short retry budgets, and a circuit breaker to avoid retry storms.
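A minimal sketch of that policy, combining a bounded retry budget, capped exponential backoff with full jitter, and a simple open/half-open circuit breaker. All names and limits here are illustrative, not a prescribed implementation:

```python
import random
import time

class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are being shed."""

class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial
    call again after `reset_after` seconds (half-open)."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, max_attempts: int = 3, base_delay: float = 0.2, cap: float = 2.0):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpen("shedding load; breaker is open")
            self.opened_at = None  # half-open: permit one trial call
        for attempt in range(max_attempts):  # bounded retry budget
            try:
                result = fn()
                self.failures = 0
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()
                    raise CircuitOpen("too many consecutive failures")
                if attempt == max_attempts - 1:
                    raise
                # full jitter: sleep a random slice of the capped backoff window
                time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```

The jitter spreads retries out in time, which is what prevents many clients from retrying in lockstep and amplifying an incident into a retry storm.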
How To Read the Metrics Without Misleading Yourself
A single "best provider" rarely exists. Reliability depends on workload type, geography, and failure tolerance. Use the table to eliminate weak options first, then test finalists with your own traffic patterns.
Metric Priorities by Use Case
- Realtime chat and assistants: optimize for p95 latency and timeout stability.
- Batch generation jobs: optimize for sustained rate limits and cost predictability.
- Embeddings/search pipelines: prioritize steady throughput and low 5xx variance.
- Mission-critical operations: prioritize multi-provider resilience over marginal cost wins.
When to Trust 30-Day vs 90-Day Views
Use 30-day windows for current routing choices and capacity posture. Use 90-day windows to detect recurring structural risk, seasonality, or regional instability that short windows may hide.
Practical Routing Threshold Examples
Thresholds should be tied to user impact and not copied from generic templates. The examples below are starting points you can tune with your own SLO targets.
- Latency breach: if p95 exceeds target by 25% for 3 consecutive checks, shift 20% traffic to backup.
- Error breach: if 5xx exceeds 1.5% for 2 windows, cap retries and trigger selective failover.
- Rate-limit pressure: if the 429 rate climbs above baseline, queue non-critical workloads before switching providers.
- Recovery: return traffic gradually after 2 clean windows to avoid oscillation.
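The latency rule above can be expressed as data plus a small evaluator. The numbers mirror the example thresholds and should be retuned against your own SLO targets:

```python
LATENCY_POLICY = {
    "p95_over_target_pct": 25,  # breach = p95 more than 25% above target
    "consecutive_checks": 3,    # require 3 breaching checks in a row
    "shift_pct": 20,            # then move 20% of traffic to backup
}

def latency_shift(p95_history: list[float], target_ms: float,
                  policy: dict = LATENCY_POLICY) -> int:
    """Return the percent of traffic to shift to backup, or 0 if no action."""
    limit = target_ms * (1 + policy["p95_over_target_pct"] / 100)
    n = policy["consecutive_checks"]
    recent = p95_history[-n:]
    if len(recent) == n and all(sample > limit for sample in recent):
        return policy["shift_pct"]
    return 0
```

Keeping the thresholds in a config dict rather than hard-coded makes them easy to tune per region or per workload without touching routing logic.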
Combine these with your incident archive and provider-specific pages to avoid overreacting to short-lived spikes.
Common Comparison Mistakes to Avoid
- Using one metric as truth: uptime alone misses slow responses and degraded quality windows.
- Ignoring region fit: a strong global score can still underperform in your primary customer region.
- Copying generic thresholds: route decisions should match your SLO and workload behavior.
- Over-rotating on short spikes: use consecutive checks and confidence windows before switching traffic.
- No cost guardrails: failover plans should include budget caps to prevent surprise spend.
The best teams treat this page as a decision support layer. They combine monitor data, business goals, and customer impact to choose routing strategies that are both resilient and cost-aware.
For best results, review this page together with your own app metrics at least weekly, then update thresholds gradually instead of making abrupt policy shifts after one noisy day.