Operational
Endpoints are reachable at the network level and return the expected auth responses for protected endpoints.
This dashboard publishes live provider status snapshots, response latency, and rolling uptime trends. It is designed for teams that need a quick, public signal before routing workloads to a model provider.
Latest known status from our monitoring pipeline.
Quick symbol-based map of the major model platforms covered in this reliability hub.
Endpoints are reachable at the network level and return the expected auth responses for protected endpoints.
Service is reachable but shows signs of trouble, such as throttling or unusual latency, that can affect production throughput.
Requests fail due to transport errors, repeated server failures, or endpoint unavailability during health checks.
No current check data is available yet, usually right after a fresh deployment or before the first scheduled run.
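The four status definitions above can be read as a simple decision order. The sketch below is one possible interpretation, not the dashboard's actual classifier: the names `Status`, `CheckResult`, `classify`, and `SLOW_LATENCY_MS` are hypothetical, the latency cutoff is an arbitrary placeholder, HTTP 429 is assumed to be the throttling signal, and a single 5xx response is treated as a failure even though the definition speaks of repeated server failures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    DOWN = "down"
    UNKNOWN = "unknown"


@dataclass
class CheckResult:
    """One health-check observation (hypothetical shape)."""
    http_status: Optional[int]   # None when the request failed at transport level
    latency_ms: Optional[float]  # None when no response was received


# Assumed placeholder; the real cutoff used by the monitor is not stated.
SLOW_LATENCY_MS = 5000.0


def classify(result: Optional[CheckResult]) -> Status:
    """Map a single check result to one of the four status labels."""
    if result is None:
        # No check has run yet, e.g. right after a fresh deployment.
        return Status.UNKNOWN
    if result.http_status is None:
        # Transport error: no HTTP response at all.
        return Status.DOWN
    if result.http_status >= 500:
        # Simplification: one server error counts as a failure here.
        return Status.DOWN
    if result.http_status == 429:
        # Reachable, but throttled.
        return Status.DEGRADED
    if result.latency_ms is not None and result.latency_ms > SLOW_LATENCY_MS:
        # Reachable, but unusually slow.
        return Status.DEGRADED
    # Includes 401/403: an auth rejection from a protected endpoint
    # still shows the endpoint is reachable and behaving as expected.
    return Status.OPERATIONAL
```

For example, `classify(CheckResult(http_status=401, latency_ms=120.0))` yields `Status.OPERATIONAL` under these assumptions, because an unauthenticated probe that gets the expected auth rejection still confirms reachability.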
Reliability varies by region, request shape, and provider-side policy controls. These numbers are intended as an external baseline, not a replacement for internal synthetic tests in your own deployment region. For full details on endpoint selection, status classification, and data caveats, review the Methodology page.
These pages are written for production troubleshooting and are updated as monitoring behavior changes. If you are seeing API failures now, start with the issue-specific guides below before changing routing policies.
Diagnose rate-limit failures and implement a safe retry/backoff strategy without causing retry storms.
Troubleshoot slow responses, tune timeout budgets, and harden fallback paths for latency spikes.
Independent Anthropic uptime and latency snapshot with actionable guidance on interpreting reliability.
Independent Gemini uptime, latency trend, and recent incident windows.
These pages are the newest additions to the site and are designed to capture current search demand around outages, platform changes, and operational planning.
Independent uptime and latency context for Mistral API operations.
Independent Cohere status monitoring with incident windows and trend context.
Fast outage diagnosis for Mistral API with live monitor-backed signals.
Fast outage diagnosis for Cohere API with immediate mitigation guidance.
Current guidance on TTL choice, pricing multipliers, and cache breakpoint mistakes.
When long-running Responses API jobs should move to async background execution.
Timeline, impact, root cause, mitigation, and preventive actions from recent disruption windows.
Compare uptime, latency, error rate, cost, and effective limits with filterable 30/90-day views.
Production strategy guide with routing pseudocode, checklists, and failure-mode handling.
Historical incident summaries and rolling uptime trends for planning and postmortems.
Filterable 24h/7d/30d reliability view with provider-specific incident windows and trend context.
Estimate monthly spend, failed-request exposure, and fallback overhead before incidents happen.
Analyze text with transparent AI-likelihood indicators and confidence scoring.
Troubleshoot common API errors with root causes, severity, and concrete remediation steps.
Step-by-step production guide for rate-limit pressure and quota-driven error spikes.
Root-cause and mitigation patterns for timeout-heavy AI API workloads.
Independent OpenAI uptime, latency, incidents, endpoint status, and region-level health signals.
Independent Anthropic status, uptime trend, and practical operations notes.
Fast outage diagnosis for Anthropic API with live monitor signals.
Fast outage diagnosis with live checks and practical next-step actions for engineering teams.