API Throttling Detected (429)

What this means

When API Gateway returns a 429 Too Many Requests response, it means your API has hit one of several rate limiting boundaries. API Gateway enforces throttling at multiple layers: the account-level limit (10,000 requests per second by default across all APIs in a region), the stage-level default throttle, per-method throttle overrides, and usage plan limits tied to API keys. A 429 response tells the caller to slow down, but it also means legitimate traffic is being dropped right now.

Throttling in API Gateway uses a token bucket algorithm. Each throttle setting has two components: a steady-state rate (requests per second) and a burst capacity (the maximum number of concurrent requests the bucket can hold). When a burst of traffic arrives, it drains the bucket; if the bucket empties before the steady-state rate can refill it, subsequent requests get throttled. This means you can see 429s even when your average request rate is well below the throttle limit -- a sudden spike is enough to trigger it.
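
The effect described above can be reproduced with a small simulation. This is a minimal sketch of a generic token bucket, not API Gateway's actual implementation; the class name, the rate of 100 and the burst of 200 are illustrative assumptions. It sends the same 300 requests twice: once in a single instant and once spread evenly over 10 seconds.

```python
class TokenBucket:
    """Generic token bucket: refills `rate` tokens per second, capped at `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # the bucket starts full
        self.last = 0.0       # timestamp of the previous request, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, never above `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False          # this request would receive a 429


def count_throttled(arrival_times, rate, burst):
    """Return how many of the given arrivals are rejected."""
    bucket = TokenBucket(rate, burst)
    return sum(1 for t in arrival_times if not bucket.allow(t))


# 300 requests over a 10-second window is an average of 30 RPS in both cases,
# well below the illustrative steady-state rate of 100 RPS.
spike = [0.0] * 300                      # all 300 arrive at the same instant
even = [i / 30 for i in range(300)]      # one request every 1/30 s

throttled_spike = count_throttled(spike, rate=100, burst=200)
throttled_even = count_throttled(even, rate=100, burst=200)
print(throttled_spike, throttled_even)   # 100 0
```

The spike drains the 200-token bucket and the remaining 100 requests are rejected, while the evenly spaced traffic is never throttled despite having the same average rate.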

Throttling hits synchronous API consumers hardest. For web frontends and mobile apps, a 429 translates directly into a failed user action. For internal microservice-to-microservice calls, throttling can cascade: if Service A is throttled calling Service B's API, Service A's own responses slow down or fail, which in turn degrades Service C, which depends on Service A. A single throttle boundary can degrade the entire request chain.

Detection criteria

HIGH

More than 5% of total requests are throttled (HTTP 429).

ELEVATED

Any requests are throttled (> 0 HTTP 429 responses detected).
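
The two thresholds above can be expressed as a small function. This is a sketch of the stated criteria only; the function name and the "NONE" label for the no-throttling case are hypothetical and not part of the finding definitions.

```python
def classify_throttling(total_requests: int, throttled_requests: int) -> str:
    """Map request counts to the severity levels listed above."""
    if total_requests <= 0 or throttled_requests <= 0:
        return "NONE"  # hypothetical label: no finding is raised
    # Integer comparison avoids floating-point edge cases at exactly 5%.
    if throttled_requests * 100 > total_requests * 5:
        return "HIGH"
    return "ELEVATED"


print(classify_throttling(6000, 438))  # HIGH (7.3% throttled)
print(classify_throttling(6000, 300))  # ELEVATED (exactly 5% is not "more than 5%")
print(classify_throttling(6000, 1))    # ELEVATED
```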

Example finding

HIGH API Throttling Detected (429)

7.3% of API Gateway requests were throttled (438 of 6,000 requests returned HTTP 429). Throttling was concentrated in the 14:00-14:15 UTC window, suggesting a traffic burst that exceeded burst capacity.

Recommendation: Increase API Gateway usage plan throttle/burst limits. Implement client-side exponential backoff. Consider distributing traffic more evenly across time windows.

How to fix

Identify which throttle layer is triggering

API Gateway has multiple throttle layers, and you need to know which one is causing the 429s. Check your usage plan limits first -- if you are using API keys, the usage plan's rate and burst settings are often the tightest constraint. Then check stage-level throttle settings (in the Stage Editor under "Default Method Throttling") and any per-method overrides. Finally, check the account-level limit in Service Quotas. To find out which endpoints are throttled, note that API Gateway's CloudWatch metrics do not break out 429s on their own: the 4XXError metric counts all client errors together. Enable access logging with the status code, resource path, and HTTP method in the log format, then filter for status 429 and group by resource and method.
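
A per-endpoint breakdown of 429s can be computed from access logs. The sketch below assumes access logging is enabled with a JSON log format that maps $context.status, $context.httpMethod and $context.resourcePath to the keys `status`, `httpMethod` and `resourcePath`; those key names are an assumption about your log format, and the sample lines are made up for illustration.

```python
import json
from collections import Counter


def count_429_by_endpoint(log_lines):
    """Count throttled requests per (method, resource path) from JSON access logs."""
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        # The status may be logged as a string or a number, so compare as text.
        if str(entry.get("status")) == "429":
            counts[(entry.get("httpMethod"), entry.get("resourcePath"))] += 1
    return counts


# Illustrative sample lines, not real log data.
sample_logs = [
    '{"status": "429", "httpMethod": "GET", "resourcePath": "/orders"}',
    '{"status": "200", "httpMethod": "GET", "resourcePath": "/orders"}',
    '{"status": "429", "httpMethod": "GET", "resourcePath": "/orders"}',
    '{"status": 429, "httpMethod": "POST", "resourcePath": "/orders/{id}/refund"}',
    "not json",
]

by_endpoint = count_429_by_endpoint(sample_logs)
for (method, path), n in by_endpoint.most_common():
    print(method, path, n)
```

If the 429s cluster on one method, a per-method override or a usage plan tied to that route is the likely cause; if they are spread across every endpoint, look at the stage default or the account-level limit.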

Increase usage plan and stage throttle limits

If the throttle limit is too low for your traffic, increase it. For usage plans, update the rate limit (requests per second) and burst limit in the API Gateway console or via the CLI:

aws apigateway update-usage-plan --usage-plan-id <id> --patch-operations op=replace,path=/throttle/rateLimit,value=1000

For stage-level defaults, update them in the stage settings. If you are hitting the account-level 10,000 RPS limit, request a quota increase through the AWS Service Quotas console.
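
The usage plan update can also be scripted with boto3, which makes it easier to change the rate and burst limits together. This is a sketch under assumptions: the helper names are hypothetical, the limit values are placeholders, and boto3 is imported lazily so the patch-building function can be used and tested without AWS credentials.

```python
def throttle_patch_operations(rate_limit: int, burst_limit: int) -> list:
    """Build patch operations that replace a usage plan's rate and burst limits."""
    return [
        # API Gateway patch operations take their values as strings.
        {"op": "replace", "path": "/throttle/rateLimit", "value": str(rate_limit)},
        {"op": "replace", "path": "/throttle/burstLimit", "value": str(burst_limit)},
    ]


def apply_usage_plan_throttle(usage_plan_id: str, rate_limit: int, burst_limit: int):
    """Apply the new limits to a usage plan. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the rest of the module works without it

    client = boto3.client("apigateway")
    return client.update_usage_plan(
        usagePlanId=usage_plan_id,
        patchOperations=throttle_patch_operations(rate_limit, burst_limit),
    )


# Placeholder values; choose limits based on your own observed traffic.
print(throttle_patch_operations(1000, 2000))
```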

Implement client-side exponential backoff

Every API client should handle 429 responses gracefully. Implement exponential backoff with jitter: wait 1 second on the first retry, then 2 seconds, then 4 seconds, adding a random jitter of 0-1 seconds to each wait. Most AWS SDKs include this behavior automatically, but custom HTTP clients need explicit retry logic. Also respect the Retry-After header if your API returns one.
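
For a custom HTTP client, the retry schedule above might look like the following. It is a minimal sketch, not a drop-in library: `ThrottledError` is a hypothetical exception that your own HTTP layer would raise on a 429, and `retry_after` is assumed to be the Retry-After value already parsed into seconds (the header can also carry an HTTP date, which is not handled here).

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised by the caller's HTTP layer on a 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server sent Retry-After


def backoff_delay(attempt, base=1.0, max_jitter=1.0, rng=random.random):
    """Return 1 s, 2 s, 4 s, ... for attempt 0, 1, 2, ... plus 0-1 s of jitter."""
    return base * (2 ** attempt) + rng() * max_jitter


def call_with_backoff(call, max_retries=3, sleep=time.sleep, rng=random.random):
    """Run `call`, retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ThrottledError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the throttle to the caller
            delay = backoff_delay(attempt, rng=rng)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # never retry sooner than asked
            sleep(delay)


# Demo with a fake call that is throttled twice, then succeeds.
# `sleep` and `rng` are injected so the demo runs instantly and deterministically.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise ThrottledError()
    return "ok"


waits = []
result = call_with_backoff(flaky, sleep=waits.append, rng=lambda: 0.5)
print(result, waits)  # ok [1.5, 2.5]
```

The jitter matters when many clients are throttled at the same moment: without it they all retry on the same schedule and recreate the original spike.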

Smooth traffic spikes with queuing

If throttling happens during predictable traffic bursts (batch jobs, cron-triggered syncs, marketing campaign launches), decouple the burst from the API call. Place requests into an SQS queue and process them at a controlled rate using a Lambda consumer with reserved concurrency. This converts a spike of 10,000 simultaneous requests into a smooth 500 RPS stream that stays well within throttle limits.
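
The trade-off in that example is latency: the queue protects the API, but the last request in the burst waits until the backlog has drained. The sketch below is a simplified model of a paced consumer, not SQS or Lambda code; it assumes the consumer dispatches a fixed number of requests per one-second tick.

```python
from collections import deque


def drain_schedule(queued: int, rate_per_second: int) -> list:
    """Return how many queued requests are dispatched in each one-second tick."""
    pending = deque(range(queued))
    ticks = []
    while pending:
        batch = min(rate_per_second, len(pending))
        for _ in range(batch):
            pending.popleft()  # stand-in for making one API call
        ticks.append(batch)
    return ticks


schedule = drain_schedule(10_000, 500)
print(len(schedule), max(schedule))  # 20 500
```

Under this model a burst of 10,000 requests never exceeds 500 RPS and takes 20 seconds to clear, so the pattern fits work that can complete asynchronously rather than calls a user is waiting on.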

Use caching to reduce request volume

Enable API Gateway caching for GET endpoints that return data which does not change on every request. A cache with a 60-second TTL can reduce backend invocations by 90% or more for high-traffic read endpoints, directly reducing the request volume that counts against your throttle limits. Configure cache key parameters carefully so that personalized responses are keyed per caller and one user's cached response is never served to another.
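
Whether a 60-second TTL reaches a 90% reduction depends on how often each cache key is requested. The sketch below uses an idealized model that is an assumption, not a measurement: steady traffic per cache key, and exactly one backend call per key per TTL window.

```python
def backend_reduction(requests_per_second_per_key: float, ttl_seconds: float) -> float:
    """Estimate the fraction of backend calls avoided for a single cache key."""
    requests_per_window = requests_per_second_per_key * ttl_seconds
    if requests_per_window <= 1:
        return 0.0  # at most one request per TTL window: every request is a miss
    return 1 - 1 / requests_per_window


# A hot key at 5 RPS, a key hit once every 6 s, and a key hit once every 30 s.
for rps in (5, 1 / 6, 1 / 30):
    print(round(backend_reduction(rps, 60), 4))
```

Under this model a key needs at least 10 requests per 60-second window (about one every 6 seconds) to reach a 90% reduction, so endpoints with many distinct cache keys and little traffic per key will see much less benefit.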

Elevated Client Error Rate (4xx) — throttling contributes to the overall 4xx error rate
Degraded API Endpoint — throttled endpoints appear as degraded in per-endpoint analysis
