High API Response Latency
P99 response latency causing client timeouts.
What this means
High API response latency means the total time from when API Gateway receives a request to when it sends back the response is consistently slow for a significant share of requests. The responseLatency field in API Gateway access logs measures this end-to-end duration. It includes API Gateway overhead (authentication, request validation, VTL mapping), integration latency (the time the backend takes to respond), and response mapping. When P99 latency exceeds several seconds, the slowest 1% of requests see painful delays. In practice, if your P99 is 10 seconds, requests at the P90 and P95 percentiles are often seeing multi-second waits as well.
The relationship between latency and user experience is nonlinear. Response times above 1 second break the user's flow of thought, and above 10 seconds most users abandon the action entirely. For APIs that back interactive UIs, high latency directly translates to lower conversion rates, higher bounce rates, and increased support tickets. For machine-to-machine APIs, high latency causes request queuing in the caller, which can exhaust connection pools and trigger cascading failures across your service mesh.
It is critical to distinguish between API Gateway overhead latency and integration latency. API Gateway itself typically adds only 10-30 milliseconds of overhead for REST APIs (somewhat less for HTTP APIs). If your total latency is 8 seconds and your integration latency is 7.9 seconds, the problem is entirely in your backend -- optimizing API Gateway configuration will not help. The access log fields responseLatency and integrationLatency let you decompose where time is being spent.
Detection criteria
P99 response latency exceeds 10 seconds.
P99 response latency exceeds 3 seconds.
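The two thresholds above can be checked against a batch of latency samples. The following is a minimal sketch, assuming latencies are already extracted as seconds and using the nearest-rank percentile definition. The function names and the severity labels "critical" and "warning" are assumptions made for illustration; they are not taken from the detection rules themselves.

```python
import math

# Thresholds from the detection criteria; the severity labels attached
# to them below are assumptions for this sketch.
CRITICAL_P99_SECONDS = 10.0
WARNING_P99_SECONDS = 3.0


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def classify_p99(latencies_seconds):
    """Return (p99, label) where label is 'critical', 'warning', or 'ok'."""
    p99 = percentile(latencies_seconds, 99)
    if p99 > CRITICAL_P99_SECONDS:
        return p99, "critical"
    if p99 > WARNING_P99_SECONDS:
        return p99, "warning"
    return p99, "ok"
```

Nearest-rank is only one of several percentile definitions. Interpolating methods give slightly different values on small samples, so the exact P99 reported by a given tool may differ.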
Example finding
P99 response latency is 12.4 seconds across 8,500 requests. Median latency is 1.2s, P95 is 6.8s. The latency distribution shows a bimodal pattern suggesting Lambda cold starts are contributing to the tail.
Recommendation: Profile integration target. Check database bottlenecks, payload sizes, Lambda cold starts. Enable API Gateway caching for read-heavy endpoints.
How to fix
Decompose latency: API Gateway vs. integration
Compare responseLatency and integrationLatency in your access logs. If they are nearly identical, the bottleneck is your backend. If there is a large gap, look at API Gateway overhead: Lambda authorizer latency (can add 1-2 seconds if the authorizer itself is cold-starting), request/response VTL mapping template complexity, or request validation on large payloads. Enable authorizer caching (TTL of 300 seconds is a good starting point) to eliminate repeated cold authorizer invocations.
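The comparison described above can be scripted. This is a minimal sketch, assuming the access log format is JSON with keys named responseLatency and integrationLatency holding milliseconds. The function names and the 100 ms gap threshold are hypothetical choices for illustration, and a missing integration value is assumed to appear as "-".

```python
import json


def gateway_overhead_ms(log_line):
    """Return (response_ms, integration_ms, overhead_ms) for one JSON access log line."""
    record = json.loads(log_line)
    response_ms = int(record["responseLatency"])
    raw_integration = record.get("integrationLatency", "-")
    # Assumption: "-" (or a missing key) means no integration was invoked,
    # for example when an authorizer rejected the request. Treated as 0 ms.
    integration_ms = 0 if raw_integration in ("-", "", None) else int(raw_integration)
    return response_ms, integration_ms, response_ms - integration_ms


def requests_with_large_gap(log_lines, gap_threshold_ms=100):
    """List requests where time spent outside the integration exceeds the threshold."""
    results = []
    for line in log_lines:
        response_ms, integration_ms, overhead_ms = gateway_overhead_ms(line)
        if overhead_ms > gap_threshold_ms:
            results.append((response_ms, integration_ms, overhead_ms))
    return results
```

Requests returned by requests_with_large_gap are the ones worth checking for authorizer latency or mapping template cost. Everything else is bound by the backend.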
Eliminate Lambda cold starts
If your latency distribution is bimodal -- most requests fast but a tail of very slow ones -- cold starts are the likely cause. Enable Lambda Provisioned Concurrency to keep a minimum number of execution environments warm. For less critical endpoints, reduce the cold start penalty by minimizing deployment package size, choosing a lighter runtime (Node.js and Python typically cold-start faster than Java or .NET), and initializing database connections outside the handler function so they are reused across invocations. See our 504 timeout guide for detailed cold start mitigation strategies.
Enable API Gateway response caching
For GET endpoints that return data which changes infrequently, enable API Gateway caching at the stage level. A 0.5 GB cache with a 60-second TTL can cut latency for cache hits from seconds to single-digit milliseconds. P99 improves only if the slow requests are ones the cache can actually serve. Configure cache key parameters to include only the query string parameters and headers that actually affect the response. Be careful with authenticated endpoints: unless the cache key includes the authorization context, a response cached for one user can be served to another.
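The effect of cache key selection can be illustrated with a small model. This sketch does not reproduce how API Gateway builds its keys internally; cache_key and its parameters are hypothetical, and header names are matched case-sensitively for simplicity.

```python
def cache_key(path, query, headers, key_query_params, key_headers):
    """Build a cache key from only the parameters declared as affecting the response."""
    query_part = tuple((name, query.get(name)) for name in sorted(key_query_params))
    header_part = tuple((name, headers.get(name)) for name in sorted(key_headers))
    return (path, query_part, header_part)


# Two users request the same page; only the tracking parameter differs.
request_a = ("/orders", {"page": "1", "utm_source": "mail"}, {"Authorization": "user-a"})
request_b = ("/orders", {"page": "1", "utm_source": "ad"}, {"Authorization": "user-b"})

# Key on "page" only: both users map to the same entry, so user B
# would receive user A's cached orders.
shared_a = cache_key(*request_a, key_query_params=["page"], key_headers=[])
shared_b = cache_key(*request_b, key_query_params=["page"], key_headers=[])

# Adding the Authorization header to the key separates the entries.
scoped_a = cache_key(*request_a, key_query_params=["page"], key_headers=["Authorization"])
scoped_b = cache_key(*request_b, key_query_params=["page"], key_headers=["Authorization"])
```

Leaving utm_source out of the key keeps the hit ratio high, because it does not change the response. Including the authorization context lowers the hit ratio but prevents cross-user leaks.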
Optimize backend processing
Profile your Lambda function or backend service to find where time is spent. Common culprits include: unindexed database queries that do full table scans, multiple sequential API calls to downstream services that could be parallelized, large response payloads that could be paginated or compressed, and, for VPC-based Lambdas, DNS resolution and NAT Gateway hops (use VPC endpoints for AWS services to avoid routing through the NAT Gateway). Add structured timing logs to your Lambda: log the duration of each external call so you can identify the slowest dependency.
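Structured timing logs can be added with a small context manager. This is a minimal sketch using only the standard library; timed, timings, slowest_dependency, and the JSON field names are hypothetical choices, not a required log format.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("timing")
timings = []  # kept in memory so the slowest call can be picked out afterwards


@contextmanager
def timed(name):
    """Measure the wrapped block and emit one JSON log line with its duration."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped call raises, so failures are timed too.
        duration_ms = (time.perf_counter() - start) * 1000
        record = {"dependency": name, "duration_ms": round(duration_ms, 2)}
        timings.append(record)
        logger.info(json.dumps(record))


def slowest_dependency():
    """Return the name of the slowest recorded call, or None if nothing was timed."""
    if not timings:
        return None
    return max(timings, key=lambda record: record["duration_ms"])["dependency"]
```

Wrap each external call in its own block, for example `with timed("orders-db"):` around the query, so every dependency produces one log line that can be aggregated later.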
Consider switching to HTTP API
If you are using REST API (v1) and do not need VTL mapping templates, request validation, or API caching, consider migrating to HTTP API (v2). HTTP APIs have lower latency overhead (typically 5-10ms vs. 15-30ms for REST APIs) and lower cost. The reduced overhead will not fix a slow backend, but it tightens the latency distribution and removes API Gateway as a contributing factor to tail latency.
Related patterns
- Integration Timeout Errors — when latency exceeds the timeout, requests fail entirely
- Slow Backend Integration — high integration latency as a component of total latency
Related guide: Troubleshooting API Gateway 504 Timeouts