Elevated Error Rate
Percentage of invocations failing above the 1% threshold
What this means
An elevated error rate means a significant percentage of your Lambda invocations are completing with an error status rather than returning a successful response. Every Lambda invocation that throws an unhandled exception, returns an error response, or crashes due to an out-of-memory condition counts toward your error rate. When this rate climbs above 1%, it signals a systemic issue rather than the occasional transient blip.
Lambda errors fall into two broad categories: function errors and runtime errors. Function errors occur when your code throws an exception or returns a structured error response. Runtime errors happen at a lower level — the Lambda execution environment itself fails, typically due to memory exhaustion, a missing handler export, or a segmentation fault in a native dependency. Both types appear in CloudWatch as entries with an ERROR status in the REPORT line, but they require different remediation strategies.
Detection criteria
smplogs triggers this finding when:
- CRITICAL: Error rate exceeds 20% of total invocations
- HIGH: Error rate exceeds 10% of total invocations
- ELEVATED: Error rate exceeds 5% of total invocations
Example finding
23.4% of invocations returned errors (247 of 1,055). Top error: "Cannot read properties of undefined (reading 'Items')" appeared in 189 invocations.
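As a rough illustration of how the percentage in the finding maps onto the severity levels listed under Detection criteria, the arithmetic can be sketched as follows. The function names and the "highest threshold wins" ordering are assumptions made for this example, not a description of smplogs internals.

```python
from typing import Optional

# Thresholds copied from the detection criteria, highest first.
SEVERITY_THRESHOLDS = [
    ("CRITICAL", 20.0),
    ("HIGH", 10.0),
    ("ELEVATED", 5.0),
]


def error_rate_percent(errors: int, total: int) -> float:
    """Return failed invocations as a percentage of all invocations."""
    if total <= 0:
        return 0.0
    return 100.0 * errors / total


def classify(rate: float) -> Optional[str]:
    """Return the first severity whose threshold the rate exceeds, else None."""
    for label, threshold in SEVERITY_THRESHOLDS:
        if rate > threshold:
            return label
    return None


rate = error_rate_percent(247, 1055)
print(f"{rate:.1f}% -> {classify(rate)}")  # 23.4% -> CRITICAL
```

Because each criterion says "exceeds", a rate of exactly 20% is treated here as HIGH rather than CRITICAL.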
How to fix
Identify the dominant error pattern
Before fixing anything, find which error message accounts for the majority of failures. A 20% error rate caused by a single TypeError is a very different problem from a 20% rate spread across dozens of distinct exceptions. Focus on the top contributor first — fixing one root cause often drops the overall rate dramatically.
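One way to find the top contributor is to count how often each error message appears. The sketch below assumes the error messages have already been extracted from the logs into a list of strings. The `top_errors` helper and the two secondary messages in the sample are hypothetical, while the first message and its count of 189 come from the example finding.

```python
from collections import Counter


def top_errors(messages, limit=3):
    """Return (message, count, share_of_failures) tuples, most frequent first."""
    counts = Counter(messages)
    total = sum(counts.values())
    return [(msg, n, n / total) for msg, n in counts.most_common(limit)]


# Hypothetical sample: 247 failed invocations in total.
sample = (
    ["Cannot read properties of undefined (reading 'Items')"] * 189
    + ["Task timed out after 3.00 seconds"] * 40
    + ["ThrottlingException: Rate exceeded"] * 18
)

for msg, n, share in top_errors(sample):
    print(f"{n:4d}  {share:6.1%}  {msg}")
```

In this sample a single message accounts for roughly three quarters of all failures, which is the signal to fix that root cause first.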
Add structured error handling with try/catch
Wrap your handler logic in try/catch blocks and return meaningful error responses instead of letting exceptions propagate to the runtime. For Node.js, catch both synchronous exceptions and rejected Promises. For Python, use a broad except clause at the handler level that logs the full traceback before returning a structured error. This prevents Lambda from marking the invocation as a runtime crash.
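A minimal Python sketch of this pattern is shown below. It assumes an API Gateway style response with `statusCode` and `body`, and `process` is a hypothetical stand-in for your business logic.

```python
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def process(event):
    # Hypothetical business logic; raises KeyError if "Items" is missing.
    return {"count": len(event["Items"])}


def handler(event, context):
    # The context may be None in local tests, so read the request id defensively.
    request_id = getattr(context, "aws_request_id", None)
    try:
        result = process(event)
        return {"statusCode": 200, "body": json.dumps(result)}
    except Exception:
        # logger.exception records the message together with the full traceback.
        logger.exception("Unhandled error, request_id=%s", request_id)
        body = {"error": "Internal error", "requestId": request_id}
        return {"statusCode": 500, "body": json.dumps(body)}
```

One design point to weigh: an invocation that returns a response counts as successful from Lambda's point of view. If an event source such as SQS relies on a failed invocation to trigger a retry, re-raising after logging may be the better choice.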
Implement retry logic for transient failures
If the errors are caused by downstream services (DynamoDB throttling, HTTP 503s from an API, or SDK timeout errors), add exponential backoff with jitter directly in your function code. The AWS SDK for JavaScript v3 has built-in retry configuration: set maxAttempts to between 3 and 5 and use the standard retry mode. For non-AWS services, use a library like p-retry (Node.js) or tenacity (Python).
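For calls that are not covered by an SDK's built-in retries, exponential backoff with jitter can be sketched with the standard library alone. `TransientError` and `call_with_backoff` are hypothetical names for this example, and the delay values are illustrative defaults, not recommendations.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, 503s, timeouts)."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the failure.
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

Only errors you classify as transient are retried. Anything else propagates immediately, so a genuine bug is not hidden behind repeated attempts. Keep the total worst-case delay well below the function timeout.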
Validate input before processing
Many Lambda error spikes are caused by unexpected event payloads — a missing field from API Gateway, a malformed SQS message body, or a null value in a DynamoDB stream record. Add input validation at the top of your handler to catch these early and return a 400-level response rather than crashing midway through processing. Libraries like Zod, Joi, or Pydantic make this straightforward.
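The sketch below shows the idea with hand-written checks so it runs without extra dependencies; a schema library such as Pydantic would replace most of `validate_body`. It assumes an API Gateway style event whose `body` is a JSON string, and the `orderId` and `quantity` fields are hypothetical.

```python
import json


def validate_body(event):
    """Return (payload, errors); payload is None when validation fails."""
    raw = event.get("body") if isinstance(event, dict) else None
    if not isinstance(raw, str):
        return None, ["body is missing"]
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["body is not valid JSON"]
    if not isinstance(payload, dict):
        return None, ["body must be a JSON object"]

    errors = []
    order_id = payload.get("orderId")
    if not isinstance(order_id, str) or not order_id:
        errors.append("orderId must be a non-empty string")
    qty = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(qty, bool) or not isinstance(qty, int) or qty < 1:
        errors.append("quantity must be a positive integer")
    return (None if errors else payload), errors


def handler(event, context):
    payload, errors = validate_body(event)
    if errors:
        # Reject bad input up front instead of failing midway through processing.
        return {"statusCode": 400, "body": json.dumps({"errors": errors})}
    return {"statusCode": 200, "body": json.dumps({"accepted": payload["orderId"]})}
```

Returning every validation error at once, not only the first, tends to make malformed payloads easier to diagnose from the logs.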