AI Checker Hub

Status History and Uptime Archive

An archive of significant reliability events and rolling monthly uptime snapshots. Use this page for post-incident review, cross-provider trend comparison, and setting fallback-policy thresholds.

Notable Incident Archive

Date       | Provider Scope  | Duration | Summary
2026-02-18 | Multi-provider  | 1h 55m   | 5xx burst and timeout amplification during a high-load period.
2026-01-28 | 2 providers     | 48m      | Regional rate-limit spikes in EU inference endpoints.
2025-12-11 | Single provider | 2h 12m   | Inference queue saturation and elevated latency.
2025-11-03 | Single provider | 37m      | Authentication edge instability for API key validation.

Monthly Uptime Snapshot

Provider   | Last 30d | Last 90d | Trend
OpenAI     | 99.86%   | 99.71%   | Stable
Anthropic  | 99.79%   | 99.63%   | Stable
Google AI  | 99.41%   | 99.18%   | Improving
Mistral    | 99.74%   | 99.52%   | Stable
Cohere     | 99.57%   | 99.34%   | Stable
Perplexity | 98.92%   | 98.47%   | Volatile
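To make these percentages concrete, it helps to translate them into a downtime budget. The minimal Python sketch below (not part of this page's data pipeline) converts an uptime percentage into allowed downtime minutes for a window; the function name and default window length are illustrative.

```python
def downtime_minutes(uptime_pct: float, window_days: int = 30) -> float:
    """Convert an uptime percentage into allowed downtime minutes for a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)

# Example: 99.86% over 30 days leaves roughly 60 minutes of downtime.
print(f"{downtime_minutes(99.86):.0f} min")  # ~60
# 98.92% over 30 days is closer to 467 minutes, i.e. nearly 8 hours.
print(f"{downtime_minutes(98.92):.0f} min")  # ~467
```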

How To Use Incident History To Tune Alerts

Historical incidents are most useful when they change operational thresholds. If recurring windows show p95 latency growth before 5xx spikes, alert on tail-latency and timeout growth instead of waiting for hard failures. Earlier detection reduces user-visible impact and shortens mitigation time.
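As an illustration of that shift, here is a sketch of an early-warning check. It assumes you already collect per-interval p95 latency and timeout rates; the IntervalStats shape, growth factor, and timeout ceiling are placeholder assumptions to tune against your own archive.

```python
from dataclasses import dataclass

@dataclass
class IntervalStats:
    p95_ms: float        # 95th-percentile latency for the interval
    timeout_rate: float  # fraction of requests that timed out

def early_warning(history: list[IntervalStats],
                  p95_growth: float = 1.5,
                  timeout_ceiling: float = 0.02) -> bool:
    """Alert on tail-latency and timeout growth before hard 5xx failures.

    Compares the latest interval's p95 against the mean of the prior
    intervals; both thresholds are placeholders to tune per provider.
    """
    if len(history) < 4:  # need a baseline before judging the latest point
        return False
    baseline = sum(s.p95_ms for s in history[:-1]) / (len(history) - 1)
    latest = history[-1]
    return latest.p95_ms > baseline * p95_growth or latest.timeout_rate > timeout_ceiling
```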

Archive data also helps calibrate escalation rules. Short, self-healing bursts should trigger local mitigation only, while errors sustained across multiple intervals should trigger incident mode and fallback routing with explicit traffic caps.
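One way to encode that split is a small classifier over recent per-interval error rates. The sketch below is illustrative rather than a reference implementation; the 5% burst threshold and three-interval sustain window are assumptions.

```python
def escalation_level(error_rates: list[float],
                     burst_threshold: float = 0.05,
                     sustained_intervals: int = 3) -> str:
    """Map recent per-interval error rates to an escalation level.

    A single bad interval triggers local mitigation; several consecutive
    bad intervals trigger incident mode with capped fallback routing.
    Thresholds here are illustrative, not recommended defaults.
    """
    bad = [rate > burst_threshold for rate in error_rates]
    if not bad or not bad[-1]:
        return "normal"
    if len(bad) >= sustained_intervals and all(bad[-sustained_intervals:]):
        return "incident: enable fallback routing with traffic cap"
    return "local mitigation: short burst, monitor for persistence"

print(escalation_level([0.01, 0.06]))              # local mitigation
print(escalation_level([0.06, 0.07, 0.08]))        # incident mode
```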

Common Incident Patterns

Across the archive above, the recurring patterns are 5xx bursts with timeout amplification under load, regional rate-limit spikes, inference queue saturation with elevated latency, and authentication edge instability. Tracking these as separate categories makes trend shifts easier to spot.

Detailed Incident Writeups

For deeper operational analysis, review the full incident pages: Incident Analysis, OpenAI Status, Anthropic Status, Gemini Status.

How to Convert Archive Data Into Better Alert Policies

Archive review should directly improve your monitoring configuration. If the same event type appears repeatedly, your alert thresholds or escalation logic likely need adjustment. The goal is not to collect history, but to reduce future incident impact and response time.
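A lightweight way to surface that signal is to tag archived incidents with a category and count repeats. The sketch below assumes a hypothetical tagging scheme; the category labels paraphrase the summaries in the archive table above.

```python
from collections import Counter

# Hypothetical archive entries: (date, category) pairs tagged during review.
archive = [
    ("2026-02-18", "timeout-amplification"),
    ("2026-01-28", "rate-limit"),
    ("2025-12-11", "queue-saturation"),
    ("2025-11-03", "auth-instability"),
]

def recurring_patterns(entries, min_count: int = 2) -> list[str]:
    """Return incident categories seen at least min_count times; each is a
    signal that the matching alert threshold or runbook needs revisiting."""
    counts = Counter(category for _, category in entries)
    return [cat for cat, n in counts.items() if n >= min_count]

print(recurring_patterns(archive))  # [] until a category repeats
```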

Monthly Policy Update Checklist

Each month: review new incidents for repeated event types, confirm alert thresholds still fire on tail-latency and timeout growth before hard failures, verify escalation rules and fallback traffic caps against observed burst and sustained-error behavior, and update runbooks where mitigation steps changed.

Pattern-to-Action Mapping

p95 growth preceding 5xx spikes maps to earlier tail-latency and timeout alerting. Short, self-healing bursts map to local mitigation. Sustained multi-interval errors map to incident mode and fallback routing with explicit traffic caps.

FAQ

How far back should I keep incident history?

At least 90 days for routing policy decisions and 12 months for annual reliability planning.

Should historical uptime replace live monitoring?

No. Archive trends guide policy; live telemetry drives immediate response.

How often should I review this archive?

Weekly for active operations teams and monthly for threshold and runbook updates.

Can one incident justify changing provider strategy?

Usually no. Look for repeated patterns and cross-region impact before making major strategy changes.

What if official provider status disagrees with this archive?

Use both signals, then prioritize your own production impact and user-facing telemetry.

Archive Review Framework for Quarterly Planning

Quarterly reviews are where archive data creates long-term value. Instead of reading incidents one by one, group them by impact type, affected region, and recovery speed. This reveals whether your platform risk is shifting toward latency, authentication, or hard outages.
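A simple grouping pass makes those shifts visible. The records and field names below are hypothetical stand-ins for however your archive is normalized.

```python
from collections import defaultdict

# Hypothetical normalized incident records for a quarterly review.
incidents = [
    {"impact": "latency", "region": "eu", "recovery_min": 48},
    {"impact": "hard-outage", "region": "multi", "recovery_min": 115},
    {"impact": "auth", "region": "global", "recovery_min": 37},
]

def group_by(records, key: str):
    """Bucket incidents by one dimension so quarter-over-quarter shifts
    (e.g. toward latency or auth failures) become visible."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)

for impact, recs in group_by(incidents, "impact").items():
    mean_recovery = sum(r["recovery_min"] for r in recs) / len(recs)
    print(f"{impact}: {len(recs)} incidents, mean recovery {mean_recovery:.0f} min")
```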

Quarterly Questions to Answer

Is platform risk shifting toward latency, authentication, or hard outages? Which regions are affected repeatedly? Are recovery times improving or degrading quarter over quarter?

Converting answers into explicit roadmap items helps prevent repeated incident patterns and improves operational maturity over time.

How to Prioritize Reliability Investments

Historical status data can guide where engineering time produces the biggest reliability gains. Start by estimating which incident pattern caused the highest user impact and operational cost. Then prioritize one investment at a time: better alerting, safer retries, stronger fallback coverage, or improved auth controls.

This method keeps roadmap choices evidence-based and reduces the chance of chasing low-impact reliability work.

Use the same scoring model each quarter so trend comparisons remain consistent and actionable.
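For example, a scoring model can be as small as one formula applied identically each quarter. The weights, fields, and pattern names below are assumptions to adapt, not a recommended standard.

```python
def investment_score(incident_count: int,
                     mean_user_impact: float,    # e.g. affected-request fraction, 0..1
                     mean_duration_min: float) -> float:
    """Score one incident pattern for prioritization; higher = fix first.
    Reusing the same formula each quarter keeps trends comparable."""
    return incident_count * mean_user_impact * mean_duration_min

# Hypothetical pattern inputs drawn from a quarter's archive review.
patterns = {
    "timeout-amplification": investment_score(3, 0.40, 90),
    "auth-instability":      investment_score(1, 0.10, 37),
}
for name, score in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```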