CRITICAL

High Server Error Rate (5xx)

API requests returning 5xx server errors from Lambda or backend integration failures.

What this means

A high 5xx error rate in API Gateway means your backend is failing to process requests successfully. Unlike 4xx errors, which indicate client-side problems, 5xx errors originate on the server side: your API accepted the request, tried to forward it to an integration target (Lambda, HTTP endpoint, or AWS service), and something went wrong along the way. The client receives a generic "Internal Server Error" or "Bad Gateway" response with no actionable detail.

In API Gateway, the most common 5xx status codes are 502 (Bad Gateway) and 504 (Gateway Timeout). A 502 typically means the Lambda function threw an unhandled exception, returned a malformed response (for example, a missing statusCode or a body that is not a string), or the integration response mapping failed. A 504 means the backend did not respond within the configured integration timeout (default 29 seconds for REST APIs). Less commonly, 500 errors indicate internal API Gateway failures or configuration problems such as missing Lambda invoke permissions or misconfigured request/response transformations in VTL mapping templates.
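To see which of these failure modes dominates, it can help to count 5xx responses by status code in an access log export. The Python sketch below assumes JSON access logs whose format includes a status field filled from $context.status; the field name, the five_xx_breakdown function, and the LIKELY_CAUSES table are illustrative, not part of any AWS API.

```python
import json
from collections import Counter
from typing import Iterable

# Likely causes per status code, summarizing the explanation above.
LIKELY_CAUSES = {
    502: "Lambda unhandled exception or malformed proxy response",
    504: "Backend exceeded the integration timeout",
    500: "API Gateway configuration problem (invoke permission, VTL mapping template)",
}


def five_xx_breakdown(log_lines: Iterable[str]) -> Counter:
    """Count 5xx responses by status code in JSON access log lines.

    Assumes each line is a JSON object with a "status" field, e.g. produced
    by an access log format containing "status": "$context.status".
    """
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            status = int(json.loads(line)["status"])
        except (ValueError, KeyError, TypeError):
            continue  # skip non-JSON lines and entries without a usable status
        if 500 <= status <= 599:
            counts[status] += 1
    return counts


sample = ['{"status": "502"}', '{"status": "200"}', '{"status": "504"}', '{"status": "502"}']
for code, n in five_xx_breakdown(sample).most_common():
    print(code, n, LIKELY_CAUSES.get(code, "other server error"))
```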

When the 5xx rate climbs into the low single-digit percentages, it is a reliability incident. Every 5xx response means a user action failed: a form submission lost, a page that won't load, a mobile app screen stuck in a spinner. At scale, even a small rate adds up quickly; a 2% error rate on 100,000 requests per minute at peak is 2,000 failed requests every minute. If the errors cluster on specific endpoints or time windows, they often point to a single root cause, such as database connection-pool exhaustion or a downstream service outage cascading through your API layer.

Detection criteria

CRITICAL

More than 15% of total requests return 5xx status codes.

HIGH

More than 5% of total requests return 5xx status codes.

ELEVATED

More than 1% of total requests return 5xx status codes.
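In CloudWatch, REST APIs publish 5XXError and Count metrics in the AWS/ApiGateway namespace (HTTP APIs publish 5xx instead of 5XXError); dividing the summed errors by the summed request count over a window gives the rate these thresholds apply to. A minimal Python sketch of the threshold check, with classify_5xx_rate as a hypothetical helper name:

```python
from typing import Optional

# Thresholds from the detection criteria above ("more than", so strictly greater).
SEVERITY_THRESHOLDS = [
    (0.15, "CRITICAL"),
    (0.05, "HIGH"),
    (0.01, "ELEVATED"),
]


def classify_5xx_rate(error_count: int, total_requests: int) -> Optional[str]:
    """Return the severity for a 5xx count, or None if below every threshold."""
    if total_requests <= 0:
        return None  # no traffic, so there is no rate to judge
    rate = error_count / total_requests
    for threshold, severity in SEVERITY_THRESHOLDS:
        if rate > threshold:
            return severity
    return None


# The example finding below: 1,842 errors out of 10,011 requests (about 18.4%).
print(classify_5xx_rate(1842, 10011))  # prints CRITICAL
```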

Example finding

CRITICAL High Server Error Rate (5xx)

18.4% of API Gateway requests returned 5xx errors (1,842 of 10,011 requests). The majority are HTTP 502 responses from Lambda integration failures.

Recommendation: Investigate Lambda/backend integration errors. Check integration timeout settings, IAM permissions for API Gateway to invoke Lambda, and Lambda function error logs.

How to fix

Check Lambda function logs for unhandled exceptions

The most common cause of 502 errors is a Lambda function that throws an unhandled exception or returns a malformed response. Open the Lambda function's CloudWatch log group and search for "ERROR", "Task timed out", or stack traces. With Lambda proxy integrations, every invocation that ends in a function error (including a function timeout) produces a 502 at the API Gateway level. Fix the root exception, add try/catch blocks around external calls, and make sure your handler always returns a well-formed response even in error paths: for example { statusCode: 500, body: "..." } for server failures, or a 4xx code for bad input, rather than letting the exception escape.
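A minimal sketch of a Python proxy handler that never lets an exception escape, assuming the REST API proxy event shape (a JSON string in event["body"]); process_order is a hypothetical stand-in for your own logic:

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process_order(payload: dict) -> dict:
    """Hypothetical business logic standing in for your real work and external calls."""
    if "order_id" not in payload:
        raise ValueError("order_id is required")
    return {"order_id": payload["order_id"], "status": "accepted"}


def make_response(status_code: int, body: dict) -> dict:
    """Build a proxy-integration response: integer statusCode, string body."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    try:
        payload = json.loads(event.get("body") or "{}")
        return make_response(200, process_order(payload))
    except ValueError as exc:
        # Invalid input (including malformed JSON) is a client error: answer 400.
        return make_response(400, {"message": str(exc)})
    except Exception:
        # Anything else: keep the stack trace in CloudWatch Logs and return a
        # well-formed 500 instead of letting API Gateway turn the crash into a 502.
        logger.exception("Unhandled error while handling request")
        return make_response(500, {"message": "Internal server error"})
```

Treating ValueError as a client error is a design choice for this sketch (it also catches malformed JSON bodies). Everything else becomes a deliberate 500, which still counts toward the 5xx rate but gives clients a consistent error body and leaves a traceback in CloudWatch.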

Verify IAM permissions for API Gateway to invoke Lambda

If you recently changed the Lambda function name, alias, or account, the function's resource-based policy may no longer grant API Gateway lambda:InvokeFunction permission for your API. This produces 500 errors with no hint of the cause in the client response. In the API Gateway console, open the method's integration request, re-select the Lambda function, and save so the console offers to add the permission again. You can also verify with aws lambda get-policy --function-name <name>.
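To check the policy from a script rather than by eye, the sketch below parses the JSON printed by aws lambda get-policy --function-name <name>, whose Policy field is itself a JSON-encoded string. It assumes API Gateway is granted access through a statement with the apigateway.amazonaws.com service principal and an optional ArnLike condition on AWS:SourceArn, which is what the console and aws lambda add-permission typically create; allows_apigateway_invoke is a hypothetical helper, and fnmatchcase only approximates IAM's ArnLike matching. If get-policy fails with ResourceNotFoundException, the function has no resource-based policy at all, which is itself a sign the permission is missing.

```python
import json
from fnmatch import fnmatchcase
from typing import Optional


def allows_apigateway_invoke(get_policy_output: dict, source_arn: Optional[str] = None) -> bool:
    """Return True if some statement lets API Gateway call lambda:InvokeFunction.

    get_policy_output is the parsed JSON from `aws lambda get-policy`; its
    "Policy" value is a JSON document encoded as a string. If source_arn
    (an execute-api ARN) is given, a statement's ArnLike AWS:SourceArn
    condition, when present, must match it.
    """
    policy = json.loads(get_policy_output["Policy"])
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        if "apigateway.amazonaws.com" not in services:
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if not ({"lambda:InvokeFunction", "lambda:*"} & set(actions)):
            continue
        pattern = stmt.get("Condition", {}).get("ArnLike", {}).get("AWS:SourceArn")
        # fnmatchcase approximates IAM's ArnLike wildcards (* and ?).
        if source_arn is not None and pattern is not None and not fnmatchcase(source_arn, pattern):
            continue
        return True
    return False


# Usage, with output saved via: aws lambda get-policy --function-name my-fn > policy.json
# with open("policy.json") as f:
#     print(allows_apigateway_invoke(json.load(f)))
```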

Increase integration timeout for long-running operations

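If you see 504 errors alongside high IntegrationLatency values, the backend is exceeding the integration timeout. For REST APIs the default limit is 29 seconds (Regional and private REST APIs can request a higher limit through a quota increase); for HTTP APIs the maximum is 30 seconds. Note that a Lambda function whose own timeout is shorter than the integration timeout fails with "Task timed out" and surfaces as a 502, not a 504. If your Lambda needs more time, consider switching to an asynchronous pattern: have the API return a 202 immediately with a job ID, then process the work via SQS or Step Functions and let the client poll for results. See our API Gateway 504 timeout guide for a walkthrough.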
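The asynchronous pattern can be sketched as a handler that only records the job and answers 202. In this Python sketch, enqueue is a hypothetical callable (in production it might wrap boto3's SQS send_message or Step Functions start_execution), and the /jobs/{id} status route is an assumed convention for clients to poll:

```python
import json
import uuid
from typing import Callable


def submit_job_handler(event: dict, enqueue: Callable[[dict], None]) -> dict:
    """Accept a long-running request and return 202 before doing the work."""
    job_id = str(uuid.uuid4())
    payload = json.loads(event.get("body") or "{}")
    # Hand the work to a queue or workflow; a separate worker processes it
    # without being bound by the API Gateway integration timeout.
    enqueue({"job_id": job_id, "payload": payload})
    return {
        "statusCode": 202,
        "headers": {"Content-Type": "application/json", "Location": f"/jobs/{job_id}"},
        "body": json.dumps({"jobId": job_id, "status": "queued"}),
    }


queued = []
print(submit_job_handler({"body": '{"report": "monthly"}'}, queued.append)["statusCode"])
```

The response returns in milliseconds regardless of how long the job runs, so the integration timeout no longer bounds the work; the trade-off is that clients must poll (or be notified) for the result.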

Validate Lambda response format

For REST APIs, API Gateway expects Lambda proxy integrations to return a specific JSON shape: { statusCode, headers, body }. If body is an object instead of a string, or if statusCode is missing, API Gateway returns a 502. (HTTP APIs using payload format version 2.0 are more lenient: if statusCode is missing and the function returns valid JSON, API Gateway infers a 200 response with that JSON as the body.) Add response format validation in your Lambda handler and log the exact response object before returning it. For non-proxy integrations, verify your VTL mapping templates are correctly transforming the response.
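A deliberately strict Python checker for the REST proxy response shape described above; proxy_response_problems and log_if_malformed are hypothetical helpers, and requiring an integer statusCode between 100 and 599 may be stricter than API Gateway itself:

```python
import json
from typing import Any, List


def proxy_response_problems(resp: Any) -> List[str]:
    """Return reasons a value is not a well-formed REST API Lambda proxy response."""
    if not isinstance(resp, dict):
        return ["response must be a JSON object (a dict)"]
    problems = []
    status = resp.get("statusCode")
    if status is None:
        problems.append("statusCode is missing")
    elif isinstance(status, bool) or not isinstance(status, int) or not 100 <= status <= 599:
        problems.append("statusCode should be an integer HTTP status (100-599)")
    if "body" in resp and not isinstance(resp["body"], str):
        problems.append("body must be a string; serialize objects with json.dumps")
    headers = resp.get("headers")
    if headers is not None and not isinstance(headers, dict):
        problems.append("headers must be an object mapping names to values")
    return problems


def log_if_malformed(resp: Any) -> Any:
    """Log the exact response object, with the problems found, before returning it."""
    problems = proxy_response_problems(resp)
    if problems:
        print(json.dumps({"level": "ERROR", "problems": problems, "response": repr(resp)}))
    return resp
```

In a handler, wrapping the final return as return log_if_malformed(response) leaves a CloudWatch record of exactly what API Gateway rejected.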

Monitor downstream dependencies

If errors spike at specific times, the root cause is often a downstream service: a database hitting connection limits, a third-party API returning errors, or an S3 bucket with throttled reads. Add structured logging to your Lambda that captures latency and status for every external call. Use CloudWatch Contributor Insights or X-Ray traces to identify which downstream dependency is failing and correlate error spikes with its availability.
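A small Python wrapper is enough to get one structured log line per external call. timed_call and its JSON field names are illustrative; Lambda forwards stdout to CloudWatch Logs, so the default print logger is sufficient for a sketch:

```python
import json
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def timed_call(dependency: str, fn: Callable[[], T], log: Callable[[str], Any] = print) -> T:
    """Call fn and emit one JSON log line with the dependency name, latency, and outcome.

    Exceptions are logged (outcome = exception class name) and then re-raised.
    """
    start = time.perf_counter()
    outcome = "ok"
    try:
        return fn()
    except Exception as exc:
        outcome = type(exc).__name__
        raise
    finally:
        log(json.dumps({
            "dependency": dependency,
            "latency_ms": round((time.perf_counter() - start) * 1000, 1),
            "outcome": outcome,
        }))


# Hypothetical usage: timed_call("orders-db", lambda: fetch_order(order_id))
```

Because each line is JSON, CloudWatch Logs Insights discovers dependency, latency_ms, and outcome as fields, so a query that groups by dependency and outcome shows which downstream call is failing when the 5xx rate spikes.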

Related guide: Troubleshooting API Gateway 502 Errors

Check your API Gateway 5xx error rate — drop a CloudWatch log export into smplogs.
