API Gateway 504 Timeout - Fix the 29-Second Integration Limit

How to debug the 29-second integration timeout, optimize slow backends, and prevent 504 Gateway Timeout responses.

What causes API Gateway 504 errors?

A 504 Gateway Timeout from API Gateway means one thing: your backend integration did not return a response before the integration timeout expired. API Gateway gave up waiting and returned a 504 to the caller on behalf of the backend that never answered in time.

This is fundamentally different from a 502 Bad Gateway. A 502 means the backend did respond, but the response was malformed or the integration itself failed (for example, a Lambda that returned invalid JSON or threw an unhandled exception). A 504 means the backend was simply too slow - or never responded at all.

The timeout limits vary by API type:

REST API: Integration timeout configurable from 50 ms to 29 seconds by default. Since June 2024, AWS accepts quota increase requests above 29 seconds for Regional and private REST APIs, possibly in exchange for a lower account-level throttle quota. Edge-optimized REST APIs stay capped at 29 seconds.
HTTP API: Maximum integration timeout of 30 seconds. Slightly higher than REST API, and a hard limit.
WebSocket API: Route integration timeout of 29 seconds. Same default ceiling as REST API.

A critical detail that catches many teams off guard: when API Gateway times out and returns a 504, the backend Lambda function may still be running. Lambda has its own timeout (up to 15 minutes), so the function can continue executing well after API Gateway has already told the client the request failed. This can lead to duplicated work, orphaned database writes, or state corruption if not handled carefully.

If your Lambda timeout is set higher than the API Gateway integration timeout (a common configuration once teams raise it from Lambda's 3-second default), the function will happily keep running, finish its work, and return a response that nobody is listening for anymore. The invocation still counts, the compute is still billed, and any side effects still happen - but the caller already received a 504 error.

Identifying 504 errors in CloudWatch logs

To confirm that you are dealing with integration timeouts (and not a different kind of 504, like a network issue), you need to check two sets of logs: the API Gateway access logs and the Lambda function logs.

In API Gateway access logs, the definitive signal is integrationLatency reported as -1. This special sentinel value means API Gateway gave up waiting before the integration responded:

{
  "requestId": "a1b2c3d4-...",
  "ip": "203.0.113.42",
  "httpMethod": "POST",
  "resourcePath": "/api/process",
  "status": "504",
  "responseLatency": "29023",
  "integrationLatency": "-1"
}

Sometimes integrationLatency will show a value very close to 29000 ms instead of -1. This typically means the integration responded just barely at the limit, and API Gateway may or may not have accepted the response depending on exact timing. Either way, it indicates the backend is dangerously close to the timeout ceiling.

{
  "requestId": "e5f6g7h8-...",
  "httpMethod": "GET",
  "resourcePath": "/api/reports/generate",
  "status": "504",
  "responseLatency": "29006",
  "integrationLatency": "28998"
}
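
To triage a larger log file, the two signals above can be checked programmatically. The following is a minimal sketch, assuming newline-delimited JSON entries with the same field names as the samples; the helper names and the 1,000 ms "near limit" margin are illustrative choices, not part of any AWS tooling.

```javascript
// Hypothetical triage helpers for API Gateway access logs (field names follow the samples above).
const INTEGRATION_LIMIT_MS = 29000;
const NEAR_LIMIT_MARGIN_MS = 1000; // assumed threshold for "dangerously close"

// Classify one parsed access-log entry as 'timeout', 'near-limit', or 'ok'.
function classifyEntry(entry) {
  const latency = Number(entry.integrationLatency);
  if (latency === -1 || String(entry.status) === '504') return 'timeout';
  if (latency >= INTEGRATION_LIMIT_MS - NEAR_LIMIT_MARGIN_MS) return 'near-limit';
  return 'ok';
}

// Count each class across a newline-delimited JSON log, skipping blank lines.
function summarize(logText) {
  const counts = { timeout: 0, 'near-limit': 0, ok: 0 };
  for (const line of logText.split('\n')) {
    if (!line.trim()) continue;
    counts[classifyEntry(JSON.parse(line))] += 1;
  }
  return counts;
}
```

Entries classified as near-limit are worth reviewing even when they returned a 2xx status, since they show which routes are running out of headroom.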

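Now check the corresponding Lambda logs. Search for the matching request ID or look at invocations that started around the same timestamp. Lambda generates its own request ID, so matching by ID works only if your access log format also records the integration's request ID ($context.integration.requestId) or your function logs the API Gateway request ID it receives in the event. You will often find that the Lambda function completed successfully - it just finished too late: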

START RequestId: a1b2c3d4-...
2026-03-06T10:15:03.221Z  INFO  Starting report generation...
2026-03-06T10:15:18.442Z  INFO  Query returned 48,201 rows
2026-03-06T10:15:31.887Z  INFO  Report generated successfully    ← logged at 10:15:31, but API GW had already returned the 504
END RequestId: a1b2c3d4-...
REPORT RequestId: a1b2c3d4-...  Duration: 28672.33 ms  Billed Duration: 28673 ms  Memory Size: 1024 MB  Max Memory Used: 412 MB

This is the hallmark pattern: the Lambda log shows a successful completion, but the duration was close to or exceeded 29 seconds. The client already received a 504 before the function finished. Always cross-reference both log sources when debugging 504s - looking at only one side gives an incomplete picture.

Understanding the 29-second limit

The 29-second integration timeout for REST APIs is one of the most frequently misunderstood limits in AWS. Teams discover it in production, assume it can simply be raised, and then learn that raising it is restricted and comes at a cost.

The timeout is configurable downward - you can set it to anything from 50 milliseconds to 29 seconds. For most of the time since API Gateway launched in 2015, 29 seconds was an absolute ceiling. Since June 2024, AWS accepts quota increase requests above 29 seconds for Regional and private REST APIs, but the increase may require lowering your account-level throttle quota, and it does not apply to edge-optimized REST APIs. Treat 29 seconds as the working ceiling unless you have explicitly obtained an increase.

What many developers miss is that the 29-second budget includes everything between API Gateway sending the request to the integration and receiving the response back. For a Lambda integration, that means:

Cold start time - If the function needs a new execution environment, the init duration eats into the 29-second budget. A Java function with a 5-second cold start has only 24 seconds left for actual work.
Execution time - The time your handler code takes to run, including all downstream calls (database queries, HTTP requests to other services, file processing).
Response transfer - The time to serialize and transfer the response payload back to API Gateway. For large responses (Lambda caps synchronous responses at 6 MB, below the 10 MB REST API payload limit), this can take noticeable time.

This means a function that averages 20 seconds of execution time might work fine on warm invocations but return 504s on cold starts, when init time plus a slower-than-average run pushes the total past the 29-second wall. The result is intermittent 504s that only appear under certain traffic patterns.
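
The budget arithmetic is simple enough to write down. This is a small sketch, assuming the default 29-second REST API timeout; the function and parameter names are hypothetical.

```javascript
// Assumes the default 29-second REST API integration timeout.
const INTEGRATION_TIMEOUT_MS = 29000;

// Time left for handler code once init and response transfer are subtracted (all values in ms).
function remainingBudgetMs({ coldStartMs = 0, transferMs = 0 } = {}) {
  return INTEGRATION_TIMEOUT_MS - coldStartMs - transferMs;
}

// True when the three components together reach or exceed the integration timeout.
function wouldTimeOut({ coldStartMs = 0, executionMs, transferMs = 0 }) {
  return coldStartMs + executionMs + transferMs >= INTEGRATION_TIMEOUT_MS;
}

// A 5-second cold start leaves 24 seconds for actual work.
console.log(remainingBudgetMs({ coldStartMs: 5000 })); // 24000
```

Plugging in your own P99 execution time and worst observed init duration shows whether cold invocations can cross the limit.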

A best practice is to set your Lambda timeout shorter than the API Gateway integration timeout. For example, if your integration timeout is 29 seconds, set the Lambda timeout to 25 seconds. This way, if the function is going to exceed the time budget, Lambda terminates it and returns an error that API Gateway can relay as a proper 500 or 502 response. The client gets a meaningful error instead of a generic 504, and your Lambda logs will show a Task timed out after 25.00 seconds message that is easy to alert on.

How smplogs detects integration timeouts

When you analyze an API Gateway log file with smplogs, the WASM engine parses every access log entry, identifies status codes, and examines integration latency values. Requests with integrationLatency = -1 or status 504 are flagged as integration timeouts and produce a finding like this:

HIGH 8

Integration Timeout Errors

8 requests timed out waiting for the integration backend (integrationLatency = -1).

-> Increase the integration timeout setting (max 29s for REST API). Optimize the backend for faster response.

smplogs also produces a "High API Response Latency" finding when your P99 response latency is trending dangerously close to the timeout limit, even if no 504s have occurred yet. This serves as an early warning - if your P99 is at 25 seconds, you are one slow database query away from hitting the wall:

ELEVATED

High API Response Latency

P99 response latency is 24,812 ms, approaching the 29-second integration timeout limit.

-> Profile the slowest endpoints. Consider async processing for operations exceeding 10 seconds.

The root cause analysis correlates 504 errors with specific endpoints and time windows, so you can see whether timeouts cluster on particular routes (e.g., a report generation endpoint) or are spread across the API uniformly (suggesting a systemic backend issue like database connection exhaustion).
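
To get a feel for this kind of per-route correlation on your own data, a rough equivalent can be written in a few lines. This is an illustrative sketch, not how smplogs is implemented; it assumes entries shaped like the access log samples earlier and uses a nearest-rank percentile.

```javascript
// Nearest-rank percentile over an array of numbers (p between 0 and 100).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[rank - 1];
}

// Group parsed access-log entries by route; report 504 counts and P99 response latency.
function summarizeByRoute(entries) {
  const routes = new Map();
  for (const entry of entries) {
    const route = `${entry.httpMethod} ${entry.resourcePath}`;
    const stats = routes.get(route) ?? { timeouts: 0, latencies: [] };
    if (String(entry.status) === '504') stats.timeouts += 1;
    stats.latencies.push(Number(entry.responseLatency));
    routes.set(route, stats);
  }
  return [...routes]
    .map(([route, s]) => ({ route, timeouts: s.timeouts, p99Ms: percentile(s.latencies, 99) }))
    .sort((a, b) => b.timeouts - a.timeouts); // worst routes first
}
```

If one route dominates the top of the list, the problem is likely local to that endpoint; an even spread points toward a shared dependency.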

Step-by-step fixes

Fix 1: Profile and optimize Lambda execution time

Before adding infrastructure complexity, find out what is actually slow. Enable AWS X-Ray tracing on both API Gateway and the Lambda function. X-Ray will break down execution time into segments: initialization, handler code, and downstream calls (DynamoDB, S3, HTTP). In almost every case, the bottleneck is a downstream call - a slow SQL query, an external API that takes 15 seconds to respond, or a loop that makes hundreds of sequential DynamoDB requests instead of using BatchGetItem.

Focus on the single slowest segment. Optimizing a 20-second database query down to 2 seconds gives you 18 seconds of headroom, which is far more impactful than shaving 100ms off cold start time. Common wins include adding database indexes, replacing N+1 query patterns with batch operations, and caching frequently-read data in Lambda's /tmp storage or an external cache like ElastiCache.
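
As an example of replacing sequential reads with batch operations, the sketch below splits a key list into chunks of 100 (the BatchGetItem per-request maximum) and issues the chunks in parallel. The batchGet callback is a hypothetical wrapper around your DynamoDB client; a production version would also retry any UnprocessedKeys the service returns.

```javascript
const MAX_BATCH_GET_ITEMS = 100; // BatchGetItem accepts at most 100 keys per request

// Split an array into consecutive slices of at most `size` elements.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Fetch all keys with a handful of parallel batch calls instead of one call per key.
// `batchGet(keys)` is a hypothetical wrapper that resolves to an array of items.
async function fetchAll(keys, batchGet) {
  const batches = chunk(keys, MAX_BATCH_GET_ITEMS);
  const results = await Promise.all(batches.map((batch) => batchGet(batch)));
  return results.flat();
}
```

For 250 keys this makes three requests rather than 250 sequential ones, which is usually where the time goes.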

Fix 2: Move long operations to an async pattern

If the operation genuinely requires more than 15-20 seconds (report generation, large file processing, machine learning inference), it should not be synchronous. Have the API Gateway endpoint accept the request, push a message to SQS or start a Step Functions execution, and immediately return a 202 Accepted response with a job ID. The client then polls a separate GET endpoint to check job status. This pattern eliminates 504s entirely for long-running work because the initial API call completes in milliseconds.

Fix 3: Reduce cold start impact

Cold starts eat into the 29-second budget. A function with a 6-second cold start (common for Java with large dependency trees) only has 23 seconds for execution on cold invocations. Two approaches:

Provisioned concurrency keeps pre-initialized containers warm. Set it to your baseline traffic level so cold starts only happen during unexpected spikes. This costs money (you pay for the provisioned containers whether they are used or not), but it guarantees consistent latency for the provisioned capacity.

Package optimization is free. Use a bundler to tree-shake unused code. For Node.js, switching from a 60 MB node_modules directory to a 3 MB esbuild bundle can cut cold start from 800ms to 150ms. For Java, use GraalVM native image or Lambda SnapStart. For Python, avoid importing heavyweight libraries at module level if they are only needed in certain code paths.

Fix 4: Use HTTP API instead of REST API

If you are using REST API and can tolerate the feature differences, switching to HTTP API gives you a 30-second timeout ceiling instead of 29 seconds. HTTP API also adds less overhead per request (roughly 5-10 ms less internal latency), so slightly more of the timeout budget goes to your backend code, and it costs less (up to 70% cheaper). The extra second is a small margin, not a fix for a backend that routinely runs close to the limit. The tradeoff is that HTTP APIs lack some REST API features like request validation, WAF integration, and usage plans.

Fix 5: Implement Lambda response streaming

Lambda response streaming (available for Node.js managed runtime and custom runtimes) allows your function to send response data incrementally as it becomes available, rather than buffering the entire response. When used with a Lambda function URL (not API Gateway directly), the first byte can reach the client in seconds even if the full response takes 30+ seconds to generate. If your 504 problem is caused by large response payloads or operations that produce output incrementally (like streaming database query results), response streaming can be a solution - though note that it requires using Lambda function URLs or CloudFront, not API Gateway REST/HTTP API directly.

Fix 6: Add API Gateway caching

For read-heavy endpoints where the same request often produces the same response, enable API Gateway caching. Cached responses are returned directly from API Gateway without invoking the backend at all, so there is zero risk of timeout. REST API supports caching natively with configurable TTL from 0 to 3600 seconds. Cache capacity ranges from 0.5 GB to 237 GB. This is most effective for endpoints like dashboards, configuration lookups, or report views where data freshness of 30-60 seconds is acceptable. It will not help for write operations or requests with unique parameters.

When to go async

The 29-second limit is not a bug to work around - it is a design constraint that tells you something important. If your API endpoint routinely takes more than 10 seconds to respond, the synchronous request-response model is the wrong fit. Users do not want to stare at a loading spinner for 25 seconds, and any network hiccup during that window will cause a failure. Asynchronous processing is more resilient, more scalable, and provides a better user experience.

Pattern 1: Accepted + polling

The simplest async pattern. The client sends a POST request, the Lambda validates the input, writes a job record to DynamoDB with status "pending", pushes the work to an SQS queue, and returns 202 Accepted with the job ID. A separate worker Lambda picks up the SQS message, does the heavy processing, and updates the DynamoDB record to "completed" with the result. The client polls a GET /status/{jobId} endpoint until it sees the completed status.

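// Assumes AWS SDK for JavaScript v3
const crypto = require('node:crypto');
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocument } = require('@aws-sdk/lib-dynamodb');
const { SQS } = require('@aws-sdk/client-sqs');

const dynamo = DynamoDBDocument.from(new DynamoDBClient({}));
const sqs = new SQS({});
const QUEUE_URL = process.env.QUEUE_URL;  // assumed to be set in the function's environment

// API Lambda - returns immediately
exports.handler = async (event) => {
  const jobId = crypto.randomUUID();
  const params = JSON.parse(event.body || '{}');  // proxy integrations deliver the body as a string
  await dynamo.put({ TableName: 'Jobs', Item: { jobId, status: 'pending' } });
  await sqs.sendMessage({ QueueUrl: QUEUE_URL, MessageBody: JSON.stringify({ ...params, jobId }) });
  return { statusCode: 202, body: JSON.stringify({ jobId }) };
};

// Worker Lambda - processes at its own pace (up to 15 min timeout)
// heavyProcessing is a placeholder for your own long-running function
exports.worker = async (event) => {
  for (const record of event.Records) {  // an SQS batch can contain several messages
    const { jobId, ...params } = JSON.parse(record.body);
    const result = await heavyProcessing(params);  // can take minutes
    await dynamo.update({ TableName: 'Jobs', Key: { jobId },
      UpdateExpression: 'SET #s = :s, #r = :r',
      ExpressionAttributeNames: { '#s': 'status', '#r': 'result' },
      ExpressionAttributeValues: { ':s': 'completed', ':r': result }
    });
  }
};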
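
The polling half of the pattern can be sketched the same way. The status handler below takes a getJob lookup (a hypothetical function that reads the Jobs table and resolves to the record or undefined), and pollUntilDone is an illustrative client-side loop; the interval and attempt limits are assumptions to tune.

```javascript
// Handler for GET /status/{jobId}. `getJob(jobId)` is a hypothetical lookup against the Jobs table.
function makeStatusHandler(getJob) {
  return async (event) => {
    const jobId = event.pathParameters?.jobId;
    const job = jobId ? await getJob(jobId) : undefined;
    if (!job) return { statusCode: 404, body: JSON.stringify({ error: 'Unknown job' }) };
    const body = { jobId, status: job.status };
    if (job.status === 'completed') body.result = job.result;
    return { statusCode: 200, body: JSON.stringify(body) };
  };
}

// Client side: call `fetchStatus()` until the job completes or the attempts run out.
async function pollUntilDone(fetchStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const job = await fetchStatus();
    if (job.status === 'completed') return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Job did not complete in time');
}
```

Each poll is a fast key lookup, so none of these requests comes anywhere near the integration timeout.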

Pattern 2: Step Functions for complex workflows

If the long-running operation involves multiple steps with branching logic, error handling, and retries, use Step Functions instead of SQS. The API Lambda starts a Step Functions execution and returns the execution ARN. Step Functions orchestrates a chain of Lambda functions, each handling one step of the workflow. The total workflow can run for up to one year. Step Functions Express Workflows are better for high-throughput cases (up to 100,000 executions per second) with a 5-minute maximum duration.
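
A minimal sketch of the API Lambda for this pattern follows. It takes the Step Functions client as a parameter so it can be exercised without AWS access; in a deployment you would pass something like an SFN client from @aws-sdk/client-sfn. The makeStartHandler name and the source of the state machine ARN are assumptions.

```javascript
// `sfn` is assumed to expose startExecution() like the AWS SDK v3 SFN client.
function makeStartHandler(sfn, stateMachineArn) {
  return async (event) => {
    const { executionArn } = await sfn.startExecution({
      stateMachineArn,
      input: event.body ?? '{}', // Step Functions expects the input as a JSON string
    });
    // Respond right away; the workflow continues independently of this request.
    return { statusCode: 202, body: JSON.stringify({ executionArn }) };
  };
}
```

The client can keep the returned execution ARN and use it to look up the workflow's status later.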

Pattern 3: WebSocket for real-time updates

If polling feels too crude and you want the client to receive results the moment they are ready, use API Gateway WebSocket API. The client opens a WebSocket connection, sends the work request, and the backend pushes the result back when processing finishes. This eliminates both the 29-second timeout problem (the initial WebSocket message is processed quickly) and the polling overhead. The WebSocket connection itself can stay open for up to 2 hours. The trade-off is increased implementation complexity on both the client and server side - you need connection management, reconnection logic, and a way to map connection IDs to pending jobs.
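
The push step might look like the sketch below. It assumes an api object exposing postToConnection, as the ApiGatewayManagementApi client in AWS SDK v3 does, plus a hypothetical onGone callback that removes stale connection records; the message shape is illustrative.

```javascript
// Push a finished job's result to the client over its WebSocket connection.
// Returns true if delivered, false if the client had already disconnected.
async function pushResult(api, connectionId, jobId, result, onGone) {
  try {
    await api.postToConnection({
      ConnectionId: connectionId,
      Data: Buffer.from(JSON.stringify({ jobId, status: 'completed', result })),
    });
    return true;
  } catch (err) {
    if (err.name === 'GoneException') {
      // The connection no longer exists; clean up the mapping (hypothetical callback).
      await onGone(connectionId);
      return false;
    }
    throw err;
  }
}
```

Because clients can disconnect before the job finishes, it helps to store the result as well so a reconnecting client can fetch it.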

Pattern 4: SQS + fire-and-forget

For operations where the client does not need the result (sending an email, generating a PDF for later download, triggering a data pipeline), simply accept the request and put it on a queue. Return 202 immediately. The worker Lambda processes messages at its own pace. If processing fails, SQS retries automatically based on your redrive policy, and failed messages land in a dead-letter queue for investigation. This is the simplest and most reliable async pattern when you do not need to return results to the caller.
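
A worker for this pattern can report failures per message so that one bad message does not force the whole batch to be retried. The sketch below assumes the SQS event source mapping has ReportBatchItemFailures enabled; processMessage is a hypothetical function supplied by you.

```javascript
// Build an SQS-triggered worker around `processMessage` (hypothetical, supplied by the caller).
function makeWorker(processMessage) {
  return async (event) => {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        await processMessage(JSON.parse(record.body));
      } catch (err) {
        // Only this message returns to the queue; the others are deleted as processed.
        console.error('Message failed', record.messageId, err.message);
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}
```

Messages that keep failing are retried until the redrive policy's receive limit is reached and then moved to the dead-letter queue.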

Hitting the 29-second wall? Drop your API Gateway logs into smplogs to identify integration timeouts, latency outliers, and slow endpoints instantly.


