What 429 Actually Means
A 429 response usually means your request stream exceeded one of several policy limits: requests per minute, tokens per minute, concurrent in-flight calls, or account-tier quotas. Many teams only monitor request count, but token burst and concurrency are often the hidden bottlenecks in LLM workloads.
The key operational takeaway is that 429 is not always an outage signal. It can be fully local to your account usage pattern, workload spikes, or request batching logic. Treat 429 as capacity pressure first, and only escalate to incident mode after checking broader error patterns and independent monitor data.
