AI Checker Hub

Monitoring Methodology

This page explains how status signals are produced so readers can judge fitness for their own workloads.

Check Cadence and Scope

Status Classification

Some provider endpoints require authentication by design. For this public monitor, authentication responses are treated as a reachability signal rather than full workload health validation.
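This treatment can be sketched as a small classifier. The function and status labels below are hypothetical illustrations, not the monitor's actual taxonomy; the point is that authentication responses (401/403) prove the endpoint answered without proving the workload is healthy.

```python
from typing import Optional


def classify_probe(http_status: Optional[int], timed_out: bool) -> str:
    """Sketch: map one probe result to a coarse status label."""
    if timed_out or http_status is None:
        return "unreachable"
    # 401/403 mean the endpoint is up but requires credentials:
    # a reachability signal, not full workload health validation.
    if http_status in (401, 403):
        return "reachable-auth-required"
    if 200 <= http_status < 300:
        return "operational"
    if http_status >= 500:
        return "degraded"
    return "operational"  # other 4xx: the endpoint answered
```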

Uptime Calculation

24-hour uptime is computed as the ratio of operational samples to total samples over the trailing 24 hours for each provider.

This is a coarse public signal. It does not represent every region, model route, account tier, or quota state.
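The calculation above amounts to a simple ratio over the sample window. A minimal sketch, assuming samples are stored as status strings:

```python
def uptime_24h(samples: list) -> float:
    """Percentage of operational samples over the trailing window."""
    if not samples:
        return 0.0  # no data: report 0 rather than a misleading 100
    ok = sum(1 for s in samples if s == "operational")
    return round(100.0 * ok / len(samples), 2)
```

Note the empty-window case is a deliberate choice: missing data reads as "no evidence of uptime", not as perfect uptime.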

Known Limitations

How We Build Composite Status

Public status is not taken from a single raw probe. We combine provider-level check outcomes, endpoint-level telemetry, and recent error patterns to produce a state that users can act on quickly. The intent is to reduce both false calm ("everything is fine") and false panic ("global outage") during noisy periods.
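The combination logic can be sketched as below. The severity ordering, error-rate threshold, and function names are illustrative assumptions; the real weighting is more nuanced, but the two guards mirror the stated intent of damping both false calm and false panic.

```python
# Hypothetical severity ordering; the monitor's real scale may differ.
SEVERITY = {"operational": 0, "degraded": 1, "outage": 2}


def composite_status(endpoint_states: list, recent_error_rate: float) -> str:
    """Combine endpoint-level outcomes with recent error patterns."""
    if not endpoint_states:
        return "unknown"
    worst = max(endpoint_states, key=lambda s: SEVERITY.get(s, 0))
    down = sum(1 for s in endpoint_states if s != "operational")
    # False-panic guard: one failing endpoint among healthy peers is
    # reported as degraded, not a global outage.
    if worst == "outage" and down < len(endpoint_states):
        worst = "degraded"
    # False-calm guard: sustained elevated errors escalate an
    # apparently operational provider to degraded (5% is an assumption).
    if worst == "operational" and recent_error_rate > 0.05:
        worst = "degraded"
    return worst
```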

Latency Treatment and Why p95 Matters

We publish p50 and p95 to represent both typical and tail user experience. p50 can remain stable while p95 climbs sharply under congestion, which is often when users perceive an outage even if some requests still pass.

For incident interpretation, p95 movement is weighted more heavily than p50 drift. This mirrors production behavior in interactive applications where tail latency drives timeout rates, retry storms, and user drop-off.
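As a concrete illustration of why both quantiles are published, here is a nearest-rank percentile sketch (the monitor's actual aggregation method is not specified here and may differ):

```python
import math


def percentile(latencies_ms: list, p: float) -> float:
    """Nearest-rank percentile; assumes a non-empty sample."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

With 90 requests at 100 ms and 10 congested requests at 900 ms, p50 is still 100 ms while p95 jumps to 900 ms: the median looks calm exactly when the tail is driving timeouts.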

Incident Lifecycle Rules

Incident entries are derived from transitions in check state over time. An incident starts when monitored status leaves operational mode and ends once operational status is sustained for a set number of consecutive check windows. Active incidents are shown before resolved incidents to prioritize operational response.
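The transition rules can be sketched as a scan over the state sequence. The function name, the three-window recovery default, and the `-1` sentinel for active incidents are all illustrative assumptions:

```python
def derive_incidents(states: list, recovery_windows: int = 3) -> list:
    """Derive (start, end) index pairs from a sequence of check states.

    An incident opens at the first non-operational sample and closes once
    `recovery_windows` consecutive operational samples follow; the end
    index is the first sample of that stable run. An end of -1 marks a
    still-active incident.
    """
    incidents = []
    start = None
    ok_streak = 0
    for i, state in enumerate(states):
        if state != "operational":
            if start is None:
                start = i  # incident opens
            ok_streak = 0
        elif start is not None:
            ok_streak += 1
            if ok_streak >= recovery_windows:
                incidents.append((start, i - recovery_windows + 1))
                start, ok_streak = None, 0
    if start is not None:
        incidents.append((start, -1))  # still active
    return incidents
```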

Data Quality Safeguards

We use defensive checks to avoid showing broken or misleading output when live data is temporarily unavailable. Pages fall back to cached snapshots with clear labeling so users know they are not viewing the latest stream.

We also clamp or discard malformed and missing values, validate payload shapes at runtime, and keep client-side rendering lightweight to reduce visual instability and loading regressions.
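The fallback-and-validate pattern can be sketched as follows. The function, the expected payload shape, and the stale flag are hypothetical; the point is that any parse or shape defect routes the reader to the labeled cached snapshot instead of broken output.

```python
import json


def load_status(raw: str, cached: dict) -> tuple:
    """Parse a live payload, validating its shape at runtime; fall back
    to the cached snapshot (flagged stale) on any defect."""
    try:
        data = json.loads(raw)
        # Hypothetical shape check: a dict with a list of providers.
        if not isinstance(data, dict) or not isinstance(
            data.get("providers"), list
        ):
            raise ValueError("unexpected payload shape")
        return data, False  # fresh data
    except ValueError:  # includes json.JSONDecodeError
        return cached, True  # stale: label clearly in the UI
```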