API Gateway 502 Bad Gateway - Debug Lambda Integration Errors

Why API Gateway returns 502 Bad Gateway, how to find the root cause in your CloudWatch logs, and step-by-step fixes for Lambda integration failures.

What causes API Gateway 502 errors?

HTTP 502 Bad Gateway means the gateway (API Gateway) contacted the upstream server (your Lambda function, ECS service, or HTTP backend) and received an invalid response - or no response at all. Unlike a 500, which your code deliberately returns, a 502 indicates the integration itself broke down. API Gateway could not construct a valid HTTP response from whatever your backend gave it.

For Lambda proxy integrations (the most common setup), there are three primary causes:

Malformed response object - Lambda returned JSON that doesn't match the proxy integration contract. API Gateway expects a specific shape: { statusCode, body, headers }. If statusCode is missing, body is an object instead of a string, or the function returns undefined, API Gateway cannot map it to an HTTP response and returns 502.
Lambda error or crash - An unhandled exception, out-of-memory kill, or runtime crash causes Lambda to return an error payload to API Gateway instead of a proper response. The function's CloudWatch logs will show the stack trace, but API Gateway only sees a failed integration.
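Timeout - If the function hits its own Lambda timeout, Lambda returns a "Task timed out" error payload instead of a response, and API Gateway returns 502. API Gateway also enforces its own integration timeout: 29 seconds by default for REST APIs (v1) and at most 30 seconds for HTTP APIs (v2). When that clock expires first, REST APIs report a 504 Gateway Timeout rather than a 502, but the root cause is the same. Setting the Lambda timeout to 15 minutes doesn't help - API Gateway won't wait that long.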

There are also less common causes: an incorrect Lambda ARN in the integration configuration (API Gateway can't invoke the function at all), a missing or incorrect resource-based policy on the function (API Gateway doesn't have lambda:InvokeFunction permission), or Lambda throttling returning a 429 that API Gateway surfaces as 502.

Identifying 502 errors in CloudWatch logs

To debug 502s, you need two sets of logs: the API Gateway access logs and the Lambda function logs. The access logs tell you which requests returned 502. The Lambda logs tell you why.

API Gateway access logs

If you have API Gateway access logging enabled (and you should), each request gets a log entry. A 502 looks like this:

(a1b2c3d4) Extended Request Id: FgH1iJkLmN HTTP Method: POST Resource Path: /checkout Status: 502 Integration Latency: - Response Latency: 29001

Two things to notice here. First, the status is 502. Second, the Integration Latency shows - (a dash), which means the integration never returned a valid response. When you see Integration Latency: - paired with Response Latency: 29001, that's almost certainly a timeout - the request ran for the full 29 seconds without producing a usable response.

For malformed response 502s, the access log typically shows a normal integration latency (the Lambda did return something) but the status is still 502 because the response couldn't be parsed:

(e5f6g7h8) Extended Request Id: PqR2sTuVwX HTTP Method: GET Resource Path: /users/42 Status: 502 Integration Latency: 234 Response Latency: 237

Here the integration took 234ms (Lambda ran and returned), but the response was still 502. This points to a malformed response body.
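The two signatures above can be told apart mechanically. The following is a minimal sketch, assuming log lines in exactly the format shown above; the function name classify502 and the 29000 ms cutoff are illustrative assumptions, not part of any AWS tooling.

```javascript
// Hypothetical helper: classify one access log line in the format shown above.
function classify502(line) {
  const status = /Status: (\d{3})/.exec(line);
  if (!status || status[1] !== "502") return "not-a-502";

  const integration = /Integration Latency: (\S+)/.exec(line);
  const response = /Response Latency: (\d+)/.exec(line);
  const responseMs = response ? Number(response[1]) : NaN;

  // A dash means the integration never produced a usable response.
  if (integration && integration[1] === "-") {
    // Assumed cutoff: the 29-second default REST API integration timeout.
    return responseMs >= 29000 ? "likely-timeout" : "no-integration-response";
  }

  // The backend answered, but the answer was an error payload or malformed.
  return "invalid-or-error-response";
}
```

A result of "invalid-or-error-response" covers both malformed responses and unhandled exceptions; the Lambda function logs are still needed to tell those two apart.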

Lambda function logs

The Lambda CloudWatch log group (/aws/lambda/your-function-name) reveals the actual error. Common patterns that cause 502s:

// Unhandled exception - causes 502
ERROR  Invoke Error
{
  "errorType": "TypeError",
  "errorMessage": "Cannot read property 'body' of undefined",
  "stackTrace": [
    "at handler (/var/task/index.js:12:34)",
    "at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1173:29)"
  ]
}

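This tells you the function crashed at line 12 of index.js while reading the body property of a value that was undefined: an object the handler expected was never populated (a common trigger is a GET request hitting a handler that assumed a POST payload). Lambda returns this error object to API Gateway, which can't map it to an HTTP response, so you get a 502.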

For timeout-induced 502s, the Lambda logs show:

Task timed out after 29.00 seconds

REPORT RequestId: i9j0k1l2  Duration: 29001.45 ms  Billed Duration: 29100 ms  Memory Size: 256 MB  Max Memory Used: 198 MB  Status: timeout

Note the duration hitting exactly 29 seconds. If your Lambda timeout is set to 29s (matching the API Gateway limit), you'll see this pattern. If your Lambda timeout is higher (say 60s), the Lambda itself keeps running, but API Gateway has already returned an error to the client at the 29-second mark (on REST APIs this is normally a 504 Gateway Timeout rather than a 502). You might even see the Lambda eventually succeed in its own logs, but by then the client has already received the error.

Malformed Lambda responses

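This is the single most common cause of 502 errors and the easiest to fix once you understand the contract. When you use Lambda proxy integration (the usual setup for both REST APIs and HTTP APIs), API Gateway expects Lambda to return a JSON object with a very specific shape. The rules below apply to REST APIs and to HTTP APIs using payload format version 1.0. HTTP APIs using payload format version 2.0 are more lenient: if the function returns valid JSON without a statusCode, API Gateway infers a 200 response, so some of the mistakes shown here will not produce a 502 there.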

Here is what a valid response looks like:

// Correct proxy integration response
{
  "statusCode": 200,
  "headers": {
    "Content-Type": "application/json",
    "Access-Control-Allow-Origin": "*"
  },
  "body": "{\"id\": 42, \"name\": \"Jane Doe\"}"
}

The requirements are strict:

statusCode must be an integer (not a string like "200").
body must be a string. If you want to return JSON, you must JSON.stringify() it. Returning a raw object causes 502.
headers is optional but, if present, must be an object with string values.
The function must not return undefined or null.

Here are the common mistakes that trigger 502:

// BAD: body is an object, not a string -> 502
exports.handler = async (event) => {
  return {
    statusCode: 200,
    body: { id: 42, name: "Jane Doe" }
  };
};

// BAD: missing statusCode -> 502
exports.handler = async (event) => {
  return {
    body: JSON.stringify({ message: "success" })
  };
};

// BAD: returning nothing (undefined) -> 502
exports.handler = async (event) => {
  await doSomething(event);
  // forgot to return a response
};

// BAD: returning a plain string -> 502
exports.handler = async (event) => {
  return "Hello world";
};

All of these produce a 502 because API Gateway cannot transform them into a valid HTTP response. The fix for each one is to always return the full { statusCode, body } structure with body as a serialized string. A simple helper function eliminates these errors entirely (see the step-by-step fixes below).
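The contract can also be checked in a unit test before anything is deployed. The sketch below encodes only the rules listed in this section; findProxyResponseProblems is a hypothetical name, and the check is deliberately stricter than HTTP APIs with payload format 2.0 require.

```javascript
// Hypothetical check for the proxy response contract described above.
// Returns a list of problems; an empty list means the shape looks valid.
function findProxyResponseProblems(response) {
  if (response === null || typeof response !== "object" || Array.isArray(response)) {
    return ["response must be a plain object"];
  }

  const problems = [];
  if (!Number.isInteger(response.statusCode)) {
    problems.push("statusCode must be an integer");
  }
  if (response.body !== undefined && typeof response.body !== "string") {
    problems.push("body must be a string");
  }
  if (response.headers !== undefined) {
    const h = response.headers;
    const valid =
      h !== null &&
      typeof h === "object" &&
      !Array.isArray(h) &&
      Object.values(h).every((value) => typeof value === "string");
    if (!valid) problems.push("headers must be an object with string values");
  }
  return problems;
}
```

Calling it on the return value of each handler in a test suite catches the four mistakes above without a round trip through API Gateway.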

How smplogs detects 502 errors

When you upload API Gateway access logs to smplogs, the WASM engine parses every log entry, tallies HTTP status codes, and calculates your 5xx error rate. If the rate exceeds a threshold, it produces a severity-ranked finding. For 502-heavy logs, you'll typically see a finding like this:

CRITICAL 24 High Server Error Rate (5xx)

18.2% of API requests are returning 5xx server errors (24 requests).

-> Investigate Lambda/backend integration errors. Check integration timeout settings and Lambda response format.

The severity thresholds are: CRITICAL if more than 15% of requests return 5xx, HIGH above 5%, and ELEVATED above 1%.
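To make the arithmetic concrete, here is a small sketch that applies the thresholds above to a list of status codes. It is an illustration only and is not smplogs's actual implementation; the function name and the "NONE" label are assumptions.

```javascript
// Illustrative only: compute a 5xx rate and map it to the thresholds above.
function classify5xxRate(statusCodes) {
  if (statusCodes.length === 0) return { rate: 0, severity: "NONE" };

  const errors = statusCodes.filter((code) => code >= 500 && code <= 599).length;
  const rate = (errors / statusCodes.length) * 100;

  let severity = "NONE";
  if (rate > 15) severity = "CRITICAL";
  else if (rate > 5) severity = "HIGH";
  else if (rate > 1) severity = "ELEVATED";

  return { rate, severity };
}
```

For instance, 24 server errors out of 132 requests is a rate of about 18.2%, which falls in the CRITICAL band.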

smplogs also produces per-endpoint breakdown analysis. If /checkout has a 40% error rate while /users is clean, the endpoint breakdown will flag it:

HIGH Degraded API Endpoint

POST /checkout has a 40.0% error rate (12 of 30 requests returning 5xx).

-> Check the Lambda function behind this endpoint for unhandled exceptions or timeout issues.

Additionally, if your logs show integration latency values of - or response latencies clustering around the 29-second mark, smplogs flags "Integration Timeout Errors" as a separate finding, helping you distinguish timeout-induced 502s from malformed-response 502s.

Step-by-step fixes

Fix 1: Validate your Lambda response format

Create a response helper function and use it everywhere. This eliminates the most common class of 502 errors:

// response.js - use this in every Lambda handler
function respond(statusCode, body, extraHeaders = {}) {
  return {
    statusCode,
    headers: {
      "Content-Type": "application/json",
      ...extraHeaders,
    },
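    // JSON.stringify(undefined) returns undefined, so fall back to null
    // to guarantee that body is always a string.
    body: typeof body === "string" ? body : JSON.stringify(body ?? null),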
  };
}

module.exports = { respond };

// Usage in handler
const { respond } = require("./response");

exports.handler = async (event) => {
  const user = await getUser(event.pathParameters.id);
  return respond(200, { id: user.id, name: user.name });
};

The helper ensures statusCode is always present and body is always a string. You can't accidentally return an object or forget the status code.

Fix 2: Wrap your handler in try/catch

Unhandled exceptions cause Lambda to return an error payload that API Gateway can't parse. Always catch errors and return a proper error response:

const { respond } = require("./response");

exports.handler = async (event) => {
  try {
    const body = JSON.parse(event.body || "{}");
    const result = await processOrder(body);
    return respond(200, result);
  } catch (err) {
    console.error("Handler error:", JSON.stringify({
      error: err.message,
      stack: err.stack,
      requestId: event.requestContext?.requestId,
    }));
    return respond(500, { error: "Internal server error" });
  }
};

With this pattern, even if processOrder throws, the client gets a proper 500 response (not a 502). The error details go to CloudWatch for debugging but never leak to the client.
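If repeating the try/catch in every handler feels error-prone, the same pattern can be factored into a wrapper. This is one possible sketch; withErrorHandling is a hypothetical name, and the response shape mirrors the helper from Fix 1 rather than importing it so the snippet stands alone.

```javascript
// Hypothetical wrapper: guarantees a well-formed response even if the handler throws.
function withErrorHandling(handler) {
  return async (event, context) => {
    try {
      return await handler(event, context);
    } catch (err) {
      // Full details go to CloudWatch; the client only sees a generic message.
      console.error(JSON.stringify({
        message: "Unhandled handler error",
        error: err instanceof Error ? err.message : String(err),
        stack: err instanceof Error ? err.stack : undefined,
        requestId: event?.requestContext?.requestId,
      }));
      return {
        statusCode: 500,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ error: "Internal server error" }),
      };
    }
  };
}

// Usage (in a handler file):
// exports.handler = withErrorHandling(async (event) => { /* business logic */ });
```

Note that a wrapper like this only catches thrown errors and rejected promises. It cannot intercept an out-of-memory kill or a Lambda timeout, because in those cases the runtime stops before the catch block runs.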

Fix 3: Check integration timeout settings

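API Gateway enforces its own integration timeout, independent of your Lambda timeout:

REST API (v1): 29 seconds by default. You can always lower it. For Regional and private REST APIs, AWS allows a higher value through a quota increase (possibly in exchange for a lower account-level throttle quota); edge-optimized APIs stay capped at 29 seconds.
HTTP API (v2): 30 seconds maximum. Slightly higher, but this one is a hard limit that cannot be raised.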

Check your current timeout with the AWS CLI:

# REST API - check integration timeout
aws apigateway get-integration \
  --rest-api-id abc123def \
  --resource-id xyz789 \
  --http-method POST \
  --query 'timeoutInMillis'

# Default is 29000 (29 seconds)

If your Lambda function needs more than 29 seconds to complete, treat it as an architecture problem rather than a tuning problem. Even where the limit can be raised, holding a client connection open that long is fragile - move long-running work to an asynchronous pattern instead (see the prevention section below).

Fix 4: Increase Lambda timeout (if below the API GW limit)

If your Lambda timeout is set lower than 29 seconds and it's timing out before the API Gateway limit, increase it. But be careful with the relationship between the two timeouts:

# Check current Lambda timeout
aws lambda get-function-configuration \
  --function-name my-checkout-function \
  --query 'Timeout'

# Increase to 29 seconds (matching API GW REST limit)
aws lambda update-function-configuration \
  --function-name my-checkout-function \
  --timeout 29

Important: Set the Lambda timeout slightly below the API Gateway integration timeout (e.g., 28 seconds for a 29-second API GW timeout) so the two clocks never race. Be aware, though, that a Lambda timeout terminates the invocation outright: your try/catch never runs, and API Gateway still returns a 502. To fail gracefully, the handler has to enforce its own deadline (for example by checking context.getRemainingTimeInMillis()) and return a proper error response before either timeout fires.
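One way to enforce such a deadline is to race the real work against a timer derived from context.getRemainingTimeInMillis(), which the Node.js Lambda runtime provides on the context object. The sketch below is an assumption-laden illustration: withDeadline, DeadlineError, makeDeadlineHandler, the 1-second safety margin, and the choice of a 504 status are all hypothetical, and doWork stands in for your own business logic.

```javascript
// Hypothetical error type so the handler can recognise its own deadline.
class DeadlineError extends Error {}

// Race `work` against the time Lambda reports as remaining, minus a margin
// that leaves room to build and return an error response.
function withDeadline(work, context, marginMs = 1000) {
  const budgetMs = Math.max(context.getRemainingTimeInMillis() - marginMs, 0);
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineError("deadline exceeded")), budgetMs);
  });
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// `doWork` is a placeholder for the real business logic.
function makeDeadlineHandler(doWork) {
  return async (event, context) => {
    try {
      const result = await withDeadline(doWork(event), context);
      return { statusCode: 200, body: JSON.stringify(result ?? null) };
    } catch (err) {
      const timedOut = err instanceof DeadlineError;
      return {
        statusCode: timedOut ? 504 : 500,
        body: JSON.stringify({
          error: timedOut ? "Upstream operation timed out" : "Internal server error",
        }),
      };
    }
  };
}
```

Promise.race does not cancel the losing promise, so the abandoned work may keep running until the execution environment is frozen. If the work has side effects, pair this with a cancellation mechanism such as an AbortController passed to the downstream call.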

Fix 5: Check Lambda IAM permissions

API Gateway needs lambda:InvokeFunction permission to call your Lambda. If the resource-based policy is missing or incorrect, every request returns 502. Verify with:

# Check if API Gateway has invoke permission
aws lambda get-policy --function-name my-checkout-function

# If missing, add it:
aws lambda add-permission \
  --function-name my-checkout-function \
  --statement-id apigateway-invoke \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:us-east-1:123456789012:abc123def/*/POST/checkout"

This is a common issue after manually creating integrations or when deploying infrastructure-as-code that missed the permission resource. The symptom is 100% 502 rate with zero Lambda invocations in the function's metrics.

Fix 6: Enable API Gateway access logging

If you don't have access logging enabled, you're debugging blind. Enable it so you have the status codes, latencies, and request IDs needed to correlate errors across API Gateway and Lambda:

# Create a log group for API GW access logs
aws logs create-log-group --log-group-name /aws/apigateway/my-api-access-logs

# Enable access logging on the stage
aws apigateway update-stage \
  --rest-api-id abc123def \
  --stage-name prod \
  --patch-operations op=replace,path=/accessLogSettings/destinationArn,value=arn:aws:logs:us-east-1:123456789012:log-group:/aws/apigateway/my-api-access-logs

# Recommended log format (JSON for easier parsing); set it on the stage with a
# second patch operation on path=/accessLogSettings/format:
# { "requestId":"$context.requestId", "ip":"$context.identity.sourceIp",
#   "method":"$context.httpMethod", "path":"$context.resourcePath",
#   "status":"$context.status", "integrationLatency":"$context.integrationLatency",
#   "responseLatency":"$context.responseLatency" }

Once logging is enabled, you can export the access logs and analyze them with smplogs to quickly identify error patterns across endpoints, time periods, and request types.

Prevention and best practices

Always wrap handlers in try/catch

This is non-negotiable. Every Lambda handler behind API Gateway should have a top-level try/catch that returns a valid response on any error. An unhandled exception means a 502 - a caught exception means a clean 500 with a useful error message in the response body and full details in CloudWatch.

Use a response helper function

Create a shared utility (like the respond() function above) and enforce its use via code review or linting. This eliminates the entire class of malformed-response 502s. Some teams create an ESLint rule or TypeScript type that prevents returning anything other than the correct shape.

Set up CloudWatch alarms on 5xx count

Don't wait for users to report 502s. Create a CloudWatch alarm that fires when the 5xx count exceeds a threshold:

aws cloudwatch put-metric-alarm \
  --alarm-name "api-5xx-high" \
  --metric-name 5XXError \
  --namespace AWS/ApiGateway \
  --dimensions Name=ApiName,Value=my-api \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts

This alarm fires when 10 or more 5xx errors occur in a 5-minute window. Adjust the threshold based on your traffic volume.

Use structured logging in Lambda

Log errors as JSON with the request ID, method, path, and error details. This makes it possible to correlate API Gateway access log entries (which contain the request ID) with Lambda error logs. Without structured logging, finding the Lambda error that caused a specific 502 requires manual timestamp matching across log groups.
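As a rough illustration, a small helper can build that log line from the incoming event. buildErrorLog is a hypothetical name; the field lookups cover both the REST API event shape (httpMethod, path) and the HTTP API payload format 2.0 shape (requestContext.http.method, rawPath).

```javascript
// Hypothetical helper: build one JSON log line that can be matched to an
// API Gateway access log entry via the shared request ID.
function buildErrorLog(event, err) {
  return JSON.stringify({
    level: "ERROR",
    requestId: event?.requestContext?.requestId,
    method: event?.httpMethod ?? event?.requestContext?.http?.method,
    path: event?.path ?? event?.rawPath,
    errorType: err?.name,
    errorMessage: err?.message,
    stack: err?.stack,
  });
}

// Usage inside a catch block:
// console.error(buildErrorLog(event, err));
```

The requestId written here is the same value that $context.requestId records in the access log, which is what makes the two log groups joinable.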

Test with API Gateway's test invoke

Before deploying, use the API Gateway console's "Test" button or the CLI test-invoke-method command. This shows you exactly what API Gateway receives from Lambda, including the response headers and body. If the response format is wrong, the test output tells you explicitly instead of returning a cryptic 502.

Move long-running work to async patterns

If your Lambda legitimately needs more than 29 seconds (large file processing, batch operations, ML inference), don't try to squeeze it into a synchronous API call. Instead, have the API endpoint kick off the work asynchronously (invoke Lambda async, push to SQS, or start a Step Functions execution) and return a 202 Accepted immediately. The client can poll a status endpoint or receive a webhook when processing completes.
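The accept-then-process shape might look like the sketch below. The enqueue function is a stand-in for whatever starts the background work (an SQS send, an asynchronous Lambda invoke, or a Step Functions execution); it is injected rather than implemented here, and makeAsyncHandler and the /jobs/{id} status path are hypothetical.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical factory: `enqueue` hands the job to a background worker and
// resolves once the job has been accepted (not once it has finished).
function makeAsyncHandler(enqueue) {
  return async (event) => {
    const jobId = randomUUID();
    await enqueue({ jobId, payload: event.body });

    // Respond immediately; the client polls the status URL for the result.
    return {
      statusCode: 202,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jobId, statusUrl: `/jobs/${jobId}` }),
    };
  };
}
```

Because the handler returns as soon as the job is queued, its duration no longer depends on how long the processing itself takes.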

Validate request input early

Many 502-causing crashes happen because the handler assumes the request has certain fields. Add input validation at the top of your handler: check that event.body exists before parsing, verify required path parameters, and validate content types. Return a 400 Bad Request for invalid input instead of letting the function crash with a TypeError that becomes a 502.
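A sketch of that early validation for a JSON body follows. parseJsonBody is a hypothetical helper, and the error messages are placeholders; it returns either the parsed data or a ready-made 400 response, so the handler never reaches code that assumes a valid payload.

```javascript
// Hypothetical validator: returns { ok: true, data } or { ok: false, response }.
function parseJsonBody(event) {
  const badRequest = (message) => ({
    ok: false,
    response: {
      statusCode: 400,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ error: message }),
    },
  });

  if (typeof event?.body !== "string" || event.body.length === 0) {
    return badRequest("Request body is required");
  }

  // API Gateway sets isBase64Encoded when it passes the body as base64.
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, "base64").toString("utf8")
    : event.body;

  try {
    const data = JSON.parse(raw);
    if (data === null || typeof data !== "object" || Array.isArray(data)) {
      return badRequest("Request body must be a JSON object");
    }
    return { ok: true, data };
  } catch {
    return badRequest("Request body is not valid JSON");
  }
}

// Usage at the top of a handler:
// const parsed = parseJsonBody(event);
// if (!parsed.ok) return parsed.response;
```

With this in place, a GET request without a body or a client sending broken JSON gets a 400 it can act on, and the function never throws the TypeError that would otherwise surface as a 502.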

Debugging 502 errors? Drop your API Gateway access logs into smplogs to instantly identify error patterns, degraded endpoints, and integration failures.
