Slow Backend Integration
P99 integration latency contributing significantly to total response time.
What this means
Slow backend integration means the time your backend spends handling requests, measured as integrationLatency in API Gateway access logs, is elevated at the tail of the distribution. Integration latency covers the interval from when API Gateway relays the request to the integration target until it receives the backend's response. It excludes API Gateway's own processing (authorization, request validation, mapping), so it isolates the backend's contribution to total latency.
A P99 integration latency above 2.5 seconds is a warning signal. If the median request completes in 200-400 milliseconds, the slowest 1% are taking at least 6x longer. These slow requests usually fall into two categories: Lambda cold starts (where a new execution environment is initialized, adding roughly 500ms-5s depending on runtime and package size) and genuine processing bottlenecks (slow database queries, downstream service calls, or CPU-intensive computation). Distinguishing between the two matters because the fixes are completely different.
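One way to separate the two categories, sketched in Python under stated assumptions: Lambda writes a REPORT line to CloudWatch Logs for every invocation, and invocations that triggered a cold start carry an extra Init Duration field. The helper below assumes you have already exported those REPORT lines as plain strings; the function names and the 2,500 ms threshold (borrowed from the detection criterion) are illustrative. Lambda's duration is measured inside the function, so it approximates, rather than equals, the integrationLatency API Gateway records.

```python
import re

# Handler duration only: the lookbehinds skip "Billed Duration" and "Init Duration".
DURATION_RE = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")
# Present only on REPORT lines for invocations that triggered a cold start.
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def split_cold_warm(report_lines):
    """Split REPORT lines into cold and warm invocation latencies (ms).

    For cold starts, init time is added to handler duration as a rough
    proxy for the delay the caller experienced.
    """
    cold, warm = [], []
    for line in report_lines:
        if not line.startswith("REPORT"):
            continue
        duration = DURATION_RE.search(line)
        if duration is None:
            continue
        init = INIT_RE.search(line)
        if init:
            cold.append(float(init.group(1)) + float(duration.group(1)))
        else:
            warm.append(float(duration.group(1)))
    return cold, warm


def cold_share_of_slow(report_lines, slow_ms=2500.0):
    """Fraction of slow invocations (above slow_ms) that were cold starts."""
    cold, warm = split_cold_warm(report_lines)
    slow_cold = sum(1 for ms in cold if ms > slow_ms)
    slow_warm = sum(1 for ms in warm if ms > slow_ms)
    total_slow = slow_cold + slow_warm
    return slow_cold / total_slow if total_slow else 0.0


sample = [
    "REPORT RequestId: a\tDuration: 120.5 ms\tBilled Duration: 121 ms\tMemory Size: 512 MB\tMax Memory Used: 90 MB",
    "REPORT RequestId: b\tDuration: 300.0 ms\tBilled Duration: 300 ms\tMemory Size: 512 MB\tMax Memory Used: 95 MB\tInit Duration: 2400.0 ms",
    "REPORT RequestId: c\tDuration: 3100.0 ms\tBilled Duration: 3100 ms\tMemory Size: 512 MB\tMax Memory Used: 99 MB",
    "START RequestId: d Version: $LATEST",
]
print(cold_share_of_slow(sample))  # 0.5: one slow cold start, one slow warm invocation
```

A share close to 1.0 points at cold-start work; a share close to 0.0 points at processing bottlenecks in warm invocations.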
Unlike API Gateway's own overhead, integration latency is largely within your control. API Gateway overhead is infrastructure that AWS manages, but integration latency reflects your code, your database, and your architecture decisions. High integration latency that is not yet causing timeouts is an early warning: your backend is drifting toward its timeout boundary, and extra load or a slight degradation in a downstream dependency could tip those slow requests into actual timeouts. Addressing it proactively prevents escalation to integration timeout errors.
Detection criteria
P99 integration latency exceeds 2.5 seconds.
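The criterion can be checked offline with a short Python sketch. It assumes stage access logging is configured in JSON format with a key named integrationLatency mapped to $context.integrationLatency; the key name is whatever you chose in your log format, not a fixed field name. Lines with a missing or "-" value (for example, requests that never reached the integration) are skipped, and the percentile uses the nearest-rank method, so tools that interpolate may report slightly different numbers.

```python
import json
import math


def nearest_rank_percentile(values, pct):
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def integration_latencies(log_lines, field="integrationLatency"):
    """Yield integration latency (ms) from JSON access-log lines."""
    for line in log_lines:
        if not line.strip():
            continue
        raw = json.loads(line).get(field)
        if raw in (None, "", "-"):
            continue  # no integration latency recorded for this request
        yield float(raw)


def slow_integration_finding(log_lines, threshold_ms=2500.0):
    """Apply the detection criterion: flag when P99 exceeds threshold_ms."""
    samples = list(integration_latencies(log_lines))
    p99 = nearest_rank_percentile(samples, 99)
    return {"requests": len(samples), "p99_ms": p99, "flagged": p99 > threshold_ms}
```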
Example finding
P99 integration latency is 3.8 seconds across 12,400 requests. Median integration latency is 320ms, P95 is 1.9s. Integration latency accounts for 92% of total response latency, indicating the backend is the primary bottleneck.
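The finding does not state how the 92% figure is computed. One plausible definition, assumed here rather than taken from the tool, divides summed integration latency by summed response latency across requests that logged both. The sketch again assumes JSON access-log keys named integrationLatency and responseLatency, mapped to $context.integrationLatency and $context.responseLatency.

```python
import json


def integration_share(log_lines):
    """Fraction of summed response latency spent waiting on the integration.

    Lines missing either value (or logging "-") are skipped.
    """
    integration_total = 0.0
    response_total = 0.0
    for line in log_lines:
        record = json.loads(line)
        integ = record.get("integrationLatency")
        resp = record.get("responseLatency")
        if integ in (None, "", "-") or resp in (None, "", "-"):
            continue
        integration_total += float(integ)
        response_total += float(resp)
    return integration_total / response_total if response_total else 0.0
```

A share near 1.0 means API Gateway's own overhead is a small slice and the backend is where the time goes.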
Recommendation: Profile integration target. Look for slow DB queries, downstream calls, Lambda cold starts. Consider connection pooling and caching for database-heavy workloads.
How to fix
Profile your Lambda function with X-Ray
Enable AWS X-Ray tracing on both your API Gateway stage and your Lambda function. X-Ray breaks each invocation into subsegments: initialization (present on cold starts), invocation (handler execution), and, once your code is instrumented with the X-Ray SDK, external calls (DynamoDB, S3, HTTP). This quickly reveals whether slow invocations are cold starts (large initialization subsegment) or slow business logic (large invocation subsegment with slow downstream calls). Enable active tracing on the function with aws lambda update-function-configuration --function-name <function-name> --tracing-config Mode=Active, turn on tracing for the API Gateway stage, and add the X-Ray SDK to instrument downstream calls.
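X-Ray is the right tool for production traces. For a quick local breakdown before tracing is wired up, a hand-rolled timer that records named phases gives a rough version of the same picture. This is manual instrumentation swapped in for illustration, not the X-Ray SDK, and PhaseTimer is a hypothetical helper.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Collects wall-clock durations (ms) for named phases of one invocation.

    A crude local stand-in for X-Ray's subsegment breakdown; it sends nothing to X-Ray.
    """

    def __init__(self):
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Accumulate, so a phase entered repeatedly (e.g. in a loop) sums up.
            self.phases[name] = self.phases.get(name, 0.0) + elapsed_ms

    def slowest(self):
        """Return (name, ms) for the phase that consumed the most time."""
        return max(self.phases.items(), key=lambda item: item[1])


timer = PhaseTimer()
with timer.phase("db_query"):
    time.sleep(0.05)  # stand-in for a database call
with timer.phase("render"):
    time.sleep(0.01)  # stand-in for response building
print(timer.slowest())
```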
Optimize database access patterns
Database queries are the most common source of high integration latency. Check for missing indexes on frequently queried columns, N+1 query patterns where a loop issues one query per item instead of a single batch query, and full table scans on large tables. For DynamoDB, check that your partition key distributes traffic evenly and that you are not running scans where queries would suffice. For RDS, use RDS Proxy to pool connections and cut connection setup overhead, which can add 50-200ms to each invocation when a VPC-based Lambda opens a new connection every time.
Reduce Lambda cold start impact
If X-Ray shows that slow requests correlate with cold starts, focus on reducing initialization time. Move heavyweight imports and SDK client initialization outside the handler function so they run once per execution environment rather than on every invocation. Minimize your deployment package: remove unused dependencies and consider tree-shaking or bundling with esbuild. Lambda layers are useful for sharing libraries across functions, but their contents are still loaded into the execution environment, so they do not shorten cold starts on their own. For Java and .NET runtimes, SnapStart or tiered compilation settings can cut cold start time substantially, by 50-90% in favorable cases. For critical low-latency endpoints, configure Provisioned Concurrency to keep environments pre-warmed.
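The init-outside-the-handler advice, sketched as a Python Lambda handler with the standard handler(event, context) signature. _create_expensive_client is a hypothetical placeholder for whatever is costly to build, such as an SDK client or a database connection; the INIT_COUNT counter exists only to show that module-level code runs once per execution environment.

```python
import time

INIT_COUNT = 0


def _create_expensive_client():
    """Hypothetical stand-in for building an SDK client or a DB connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.01)  # pretend setup cost
    return {"connected_at": time.time()}


# Module scope: runs once when the execution environment initializes (the cold start),
# then is reused by every warm invocation served by this environment.
CLIENT = _create_expensive_client()


def handler(event, context):
    # No client construction here, so warm invocations skip the setup cost.
    return {"statusCode": 200, "body": f"client since {CLIENT['connected_at']:.0f}"}
```

The same pattern is how a Lambda reuses a database connection across warm invocations instead of reconnecting on every request.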
Parallelize downstream calls
If your Lambda makes multiple independent external calls (fetching user data from DynamoDB, checking permissions with another service, loading configuration from Parameter Store), run them in parallel instead of sequentially. In Node.js, use Promise.all(). In Python, use asyncio.gather() or concurrent.futures.ThreadPoolExecutor. Three sequential calls at 400ms each take 1.2 seconds; run in parallel, they complete in roughly 400ms, bounded by the slowest call. When independent calls dominate handler time, this is often one of the highest-impact optimizations available.
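A minimal Python sketch of the asyncio.gather() approach. asyncio.sleep stands in for three independent I/O calls, and the 0.1 s delay is illustrative. Real boto3 calls are blocking, so with boto3 the concurrent.futures.ThreadPoolExecutor route is usually the more direct fit.

```python
import asyncio
import time


async def fake_call(name, delay_s):
    """Hypothetical stand-in for an independent I/O call (DynamoDB, HTTP, Parameter Store)."""
    await asyncio.sleep(delay_s)
    return name


async def sequential(delay_s):
    # Each await finishes before the next call starts: total is about 3 * delay_s.
    return [await fake_call(n, delay_s) for n in ("user", "permissions", "config")]


async def parallel(delay_s):
    # gather runs the coroutines concurrently and returns results in argument order:
    # total is about the slowest single call.
    return await asyncio.gather(
        fake_call("user", delay_s),
        fake_call("permissions", delay_s),
        fake_call("config", delay_s),
    )


def timed(make_coro, delay_s):
    start = time.perf_counter()
    result = asyncio.run(make_coro(delay_s))
    return result, time.perf_counter() - start


seq_result, seq_s = timed(sequential, 0.1)
par_result, par_s = timed(parallel, 0.1)
print(f"sequential: {seq_s:.2f}s, parallel: {par_s:.2f}s")
```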
Add application-level caching
For data that does not change on every request (user profiles, configuration, product catalogs), cache it in memory within the Lambda execution environment or in an external cache such as ElastiCache. A global variable in Lambda persists across warm invocations, making it a cheap in-process cache for small datasets; keep in mind that each execution environment holds its own copy and starts empty after a cold start. For larger or shared datasets, use a shared cache such as ElastiCache for Redis. Even a 30-second TTL on frequently accessed data can noticeably reduce P99 integration latency by removing database round trips from the hot path.
Related patterns
- High API Response Latency — slow integration drives up total response latency
- Integration Timeout Errors — unchecked slow integration eventually causes timeouts
Find out if your backend integration is the bottleneck — try smplogs with a CloudWatch export.