Slow API Endpoint
Specific endpoint with P99 latency exceeding 5 seconds.
What this means
A slow API endpoint is a specific resource-method combination in your API Gateway where the P99 response latency exceeds 5 seconds across more than 10 requests. While API-wide latency metrics might look acceptable, individual endpoints can be dramatically slower than the average without surfacing in aggregate dashboards. A data export endpoint taking 8 seconds at P99 gets averaged out by hundreds of fast health-check and list endpoints, hiding the poor experience for users who depend on that specific route.
The per-endpoint view is essential because different endpoints have different performance profiles. A GET /api/status endpoint that returns a static health check should respond in under 50 milliseconds. A POST /api/search endpoint that queries an Elasticsearch cluster might naturally take 200-500ms. A POST /api/reports/generate endpoint that aggregates data from multiple sources could legitimately need 2-3 seconds. The 5-second P99 threshold is deliberately high: it leaves headroom for inherently slow batch-style operations where a few seconds is expected, and flags only endpoints whose latency has escalated beyond what is acceptable for any interactive use case.
Slow endpoints often indicate a problem elsewhere in the stack. An endpoint that was consistently fast and is now slow typically points to a change in either the workload (larger payloads, more data in the database, more concurrent users) or the infrastructure (a degraded downstream dependency, a VPC networking change, or a Lambda configuration update that increased cold start frequency). Addressing slow endpoints before they become timeouts is significantly less disruptive than responding to an outage.
Detection criteria
A specific endpoint has P99 response latency exceeding 5 seconds with more than 10 requests.
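The criteria above can be sketched in a few lines of Python. This is an illustration only, not the detector's actual implementation: the record shape (method, path, latency in milliseconds), the function names, and the nearest-rank percentile method are all assumptions made for the example.

```python
from collections import defaultdict

P99_THRESHOLD_MS = 5000  # "exceeding 5 seconds"
MIN_REQUESTS = 10        # "more than 10 requests"


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list (assumed method)."""
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.99 * n), giving a 1-based rank.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def find_slow_endpoints(records):
    """records: iterable of (method, path, latency_ms) tuples (hypothetical shape)."""
    by_endpoint = defaultdict(list)
    for method, path, latency_ms in records:
        by_endpoint[(method, path)].append(latency_ms)

    slow = {}
    for endpoint, latencies in by_endpoint.items():
        # Both conditions must hold: enough traffic and a high P99.
        if len(latencies) > MIN_REQUESTS and p99(latencies) > P99_THRESHOLD_MS:
            slow[endpoint] = p99(latencies)
    return slow
```

Grouping by method and path before computing the percentile is what keeps a slow route from being hidden by hundreds of fast health-check requests.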
Example finding
Endpoint GET /api/analytics/dashboard has P99 response latency of 7.2 seconds across 156 requests. Median latency is 1.8s, P95 is 4.1s. Integration latency accounts for 96% of total latency.
Recommendation: Profile the endpoint for slow database queries or blocking calls. The high ratio of integration latency to total latency indicates the backend Lambda or HTTP integration is the bottleneck, not API Gateway overhead.
How to fix
Trace the slow endpoint with X-Ray
Enable X-Ray tracing for the API Gateway stage and active tracing on the downstream Lambda function. Filter X-Ray traces by the specific URL path to see a timeline of where time is spent for this endpoint. X-Ray will show you initialization time (cold starts), handler execution time, and, once the function's outbound calls are instrumented with the X-Ray SDK, the latency of each downstream call (DynamoDB GetItem, SQS SendMessage, HTTP requests to other services). The longest leaf subsegment in the trace is your optimization target. For endpoints that call multiple services, X-Ray often reveals that calls are sequential when they could be parallel, or that one specific dependency is consistently slow.
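If you export a trace's segment document as JSON, finding the longest subsegment can be automated. The sketch below assumes the document has already been parsed into a Python dict and that every subsegment is complete, so it carries the `name`, `start_time`, `end_time`, and optional `subsegments` fields of the X-Ray segment document format. The function names are hypothetical.

```python
def leaf_subsegments(segment):
    """Yield (name, duration_seconds) for every subsegment with no children."""
    for sub in segment.get("subsegments", []):
        if sub.get("subsegments"):
            # A parent's duration includes its children, so descend instead.
            yield from leaf_subsegments(sub)
        else:
            yield sub["name"], sub["end_time"] - sub["start_time"]


def slowest_leaf(segment):
    """Return the (name, duration) pair of the longest leaf subsegment."""
    return max(leaf_subsegments(segment), key=lambda item: item[1])
```

Only leaf subsegments are compared because a parent subsegment is always at least as long as the calls nested inside it, which would otherwise mask the real bottleneck.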
Optimize the database query for this endpoint
Slow endpoints are frequently backed by expensive database queries. For DynamoDB, check if the endpoint is performing a Scan operation instead of a Query -- scans read the entire table and scale linearly with table size. Add a Global Secondary Index (GSI) that supports the endpoint's access pattern. For RDS/Aurora, run EXPLAIN ANALYZE on the queries this endpoint executes and look for sequential scans, missing indexes, or inefficient joins. A single missing index can turn a 10ms query into a 5-second one as the dataset grows.
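As a rough illustration of replacing a Scan with a Query against a GSI, the helper below builds the request parameters for the DynamoDB Query operation. The helper itself, along with the table, index, and attribute names in the usage note, is hypothetical. The sketch handles string partition keys only and assumes the result is passed to a boto3 DynamoDB client as `client.query(**kwargs)`.

```python
def build_gsi_query(table_name, index_name, key_name, key_value, limit=50):
    """Build keyword arguments for a DynamoDB Query against a GSI.

    Assumes a string partition key; other key types need a different
    type descriptor than "S".
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Placeholders avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": key_name},
        "ExpressionAttributeValues": {":pk": {"S": key_value}},
        # Bound the number of items evaluated per request.
        "Limit": limit,
    }
```

For example, `build_gsi_query("Orders", "customer-index", "customer_id", "c-123")` targets only the items that share that partition key in the index, so the read cost depends on the size of the matching item set rather than the size of the table.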
Reduce response payload size
Large response payloads increase both backend processing time (serialization) and network transfer time. If the endpoint returns a large JSON document, consider implementing pagination so each response contains a bounded number of items. Add field selection (sparse fieldsets) so clients can request only the fields they need. Enable response compression: for REST APIs, turn on content encoding by setting a minimum compression size on the API (minimumCompressionSize) and ensure the client sends Accept-Encoding: gzip. HTTP APIs have no built-in compression, so the Lambda function itself should compress the response body and set the Content-Encoding: gzip header.
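For the HTTP API case, compression inside the Lambda function might look like the sketch below. It assumes the payload format version 2.0 event shape, where header names arrive lowercased, and uses a deliberately naive Accept-Encoding check that ignores quality values such as `gzip;q=0`. The function name is hypothetical.

```python
import base64
import gzip
import json


def compressed_response(event, payload, status_code=200):
    """Return a Lambda proxy response, gzip-compressed when the client allows it."""
    body = json.dumps(payload)
    headers = {"Content-Type": "application/json"}

    # Assumption: payload format 2.0, which lowercases header names.
    accept = (event.get("headers") or {}).get("accept-encoding", "")
    if "gzip" not in accept.lower():
        return {"statusCode": status_code, "headers": headers, "body": body}

    headers["Content-Encoding"] = "gzip"
    compressed = gzip.compress(body.encode("utf-8"))
    return {
        "statusCode": status_code,
        "headers": headers,
        # Binary bodies must be base64-encoded and flagged as such.
        "isBase64Encoded": True,
        "body": base64.b64encode(compressed).decode("ascii"),
    }
```

Compression helps most on large, repetitive JSON. For very small bodies the gzip overhead can outweigh the saving, so a size check before compressing is a reasonable addition.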
Add endpoint-specific caching
If this endpoint serves data that does not change on every request, enable API Gateway caching specifically for this method. Stage caching is available on REST APIs, not HTTP APIs, and requires a cache cluster to be provisioned on the stage. In the stage settings, override the default cache settings for this resource to enable caching with an appropriate TTL. For example, a dashboard analytics endpoint that aggregates data over the last hour only needs to refresh every 60 seconds. Configure cache key parameters to include the query strings and headers that differentiate responses (like date range or user ID) while excluding parameters that do not affect the output. Each cache hit skips the backend round-trip entirely.
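The per-method override can also be applied programmatically. The sketch below only builds the patch operations. It assumes they are then passed to a boto3 API Gateway client as `update_stage(restApiId=..., stageName=..., patchOperations=ops)` for a REST API whose stage already has a cache cluster enabled. The helper name is hypothetical, and the sketch does not handle a literal `~` in the resource path.

```python
def method_cache_patch_ops(resource_path, http_method, ttl_seconds):
    """Build patch operations that enable caching for one method on a stage."""
    # "/" inside the resource path is escaped as "~1" in the patch path.
    key = resource_path.replace("/", "~1") + "/" + http_method.upper()
    return [
        {"op": "replace", "path": f"/{key}/caching/enabled", "value": "true"},
        {
            "op": "replace",
            "path": f"/{key}/caching/ttlInSeconds",
            # Patch operation values are strings, even for numbers.
            "value": str(ttl_seconds),
        },
    ]
```

For the dashboard example, `method_cache_patch_ops("/api/analytics/dashboard", "GET", 60)` produces two operations scoped to that single method, leaving the cache settings of every other route on the stage unchanged.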
Pre-compute expensive results
For endpoints that aggregate or transform large datasets on every request, move the heavy computation out of the request path. Use an EventBridge scheduled rule or DynamoDB Streams trigger to pre-compute the result and store it in a fast-access store (DynamoDB single-item read or S3 with CloudFront). The API endpoint then becomes a simple read operation that returns the pre-computed result in milliseconds. This pattern works well for analytics dashboards, leaderboards, aggregate statistics, and report summaries -- any data that many users request but that changes on a schedule rather than per-request. See our 504 timeout guide for pre-computation architecture patterns.
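The split between a scheduled refresh and a cheap read path can be sketched as follows. An in-memory dict stands in for the fast-access store (a single DynamoDB item or an S3 object in practice), and all function names are hypothetical. In a real deployment the two functions would run as separate Lambda functions sharing that store.

```python
import json
import time

# Stand-in for a fast store such as a single DynamoDB item or an S3 object.
SNAPSHOT_STORE = {}


def refresh_snapshot(key, compute, now=time.time):
    """Scheduled job: run the expensive computation and store the result."""
    SNAPSHOT_STORE[key] = {"computed_at": now(), "data": compute()}


def handle_request(key):
    """Request path: a single read, with no aggregation."""
    snapshot = SNAPSHOT_STORE.get(key)
    if snapshot is None:
        # No snapshot yet: report that rather than computing inline.
        return {"statusCode": 503, "body": json.dumps({"error": "snapshot not ready"})}
    return {"statusCode": 200, "body": json.dumps(snapshot)}
```

Returning `computed_at` alongside the data lets clients show how fresh the result is, which matters once responses are no longer computed per request.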
Related patterns
- High API Response Latency — API-wide latency pattern that aggregates across all endpoints
- Degraded API Endpoint — slow endpoints often have elevated error rates as well
Related guide: Troubleshooting API Gateway 504 Timeouts