CRITICAL

Function Timeouts

Invocations hitting the configured timeout limit

What this means

A Lambda timeout occurs when the function's execution exceeds the configured timeout value (between 1 second and 15 minutes). When the deadline passes, Lambda forcibly terminates the invocation: your code does not get a chance to run cleanup logic, close connections, or return a graceful error. The invocation is marked as an error. In CloudWatch, the REPORT line shows a duration equal to or very close to the timeout value, and a separate "Task timed out after N seconds" message is logged for the request.

Timeouts are different from other errors because they indicate that your function is stuck rather than failing fast. The most common root causes are blocking network calls to slow or unresponsive downstream services, unexpectedly large payloads that take too long to process, database queries that perform full table scans instead of using indexes, and infinite loops or deadlocks in concurrent code. Cold starts can also contribute — if initialization takes 5 seconds and you have a 6-second timeout, even a moderately slow handler will push past the limit.

Detection criteria

smplogs triggers this finding when:

  • CRITICAL — Timeout rate exceeds 5% of total invocations
  • HIGH — At least 1 invocation timed out
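As a rough illustration of these two thresholds, the sketch below classifies a batch of invocations from its counts. The function name and return values are hypothetical and are not taken from smplogs itself.

```python
from typing import Optional

CRITICAL_RATE = 0.05  # "exceeds 5% of total invocations"


def timeout_severity(timed_out: int, total: int) -> Optional[str]:
    """Return "CRITICAL", "HIGH", or None for a batch of invocations."""
    if total <= 0 or timed_out <= 0:
        return None  # nothing timed out, so no finding
    if timed_out / total > CRITICAL_RATE:
        return "CRITICAL"
    return "HIGH"  # at least one timeout, but not above the 5% rate
```

With 63 timeouts out of 807 invocations (about 7.8%) this returns "CRITICAL". A batch at exactly 5% returns "HIGH", because the rate has to exceed the threshold.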

Example finding

CRITICAL · 63 invocations Function Timeouts

7.8% of invocations timed out (63 of 807). Configured timeout: 30s. All timed-out invocations show "Task timed out after 30.03 seconds". Last log line before timeout: "Calling external API at https://partner-api.example.com/v2/sync".

-> Increase timeout or optimize code. Check blocking operations or downstream delays.

How to fix

Add socket-level timeouts to all outbound calls

The most common cause of Lambda timeouts is waiting indefinitely on a network call. Set explicit connect and read timeouts on every HTTP client, database driver, and SDK call. For the AWS SDK v3, configure connectionTimeout and requestTimeout on the client's request handler. For HTTP calls, set a timeout lower than your Lambda timeout: axios accepts a timeout option, while fetch has none, so pass an abort signal such as AbortSignal.timeout(). A typical value is 60-80% of the Lambda timeout, which leaves room for retries and cleanup. A function with a 30-second timeout should use an HTTP timeout of about 20 seconds.
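One way to apply this in a Python handler is to derive each call's timeout from the time left in the invocation instead of hard-coding it. The sketch below assumes the standard Lambda context object and its get_remaining_time_in_millis() method. The helper names, the 0.7 fraction, and the 0.5-second minimum are illustrative choices.

```python
import urllib.request


def call_timeout_seconds(context, fraction: float = 0.7, min_s: float = 0.5) -> float:
    """Derive a per-call timeout from the time left in this invocation."""
    remaining_s = context.get_remaining_time_in_millis() / 1000.0
    budget_s = remaining_s * fraction
    if budget_s < min_s:
        # Too little time left: fail fast instead of starting a call
        # that Lambda would kill halfway through.
        raise TimeoutError("not enough time left to start an outbound call")
    return budget_s


def fetch_bytes(url: str, context) -> bytes:
    # urllib applies this timeout to each blocking socket operation
    # (connect, each read); it is not a cap on the total request time.
    timeout_s = call_timeout_seconds(context)
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()
```

With 30 seconds remaining this yields a 21-second timeout. Calls made later in the same invocation automatically get a smaller budget.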

Identify the blocking operation from the last log line

When Lambda times out, it stops execution abruptly. The last log line before the "Task timed out" message reveals what the function was doing when it got stuck. Add structured logging before every external call — database queries, HTTP requests, SQS sends, S3 operations — so you can pinpoint the exact bottleneck. If the last log line is consistently the same outbound call, that downstream service is your culprit. Check its latency metrics and health independently.
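A minimal sketch of such a breadcrumb helper is shown here. It assumes a Python function, where anything printed to stdout ends up in CloudWatch Logs. The helper name and field names are hypothetical.

```python
import json
import time


def log_step(step: str, **fields) -> str:
    """Emit one JSON log line just before an external call and return it."""
    record = {"ts": round(time.time(), 3), "step": step, **fields}
    line = json.dumps(record, default=str)
    print(line, flush=True)  # flush so the line is written before the call blocks
    return line
```

Calling log_step("http_request", target="https://partner-api.example.com/v2/sync") immediately before the request means that, if the invocation times out there, the last line in the log names the step and the target.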

Break large operations into smaller chunks

If your function processes large datasets (parsing a big S3 file, scanning a DynamoDB table, or calling a paginated API) and occasionally exceeds the timeout on larger inputs, break the work into chunks. Use byte-range reads, or S3 Select where it is available to your account, to process files in segments. For DynamoDB scans, save the LastEvaluatedKey and use Step Functions or SQS to start a follow-up invocation that passes it back as ExclusiveStartKey and continues from where you left off. Lambda's 15-minute maximum is a hard ceiling, so any workload that might exceed it needs to be decomposed.
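The continuation idea can be sketched independently of any particular data source. In the example below, fetch_page is a hypothetical callable that takes a resume key and returns a page of items plus the next key (None when finished), mirroring how LastEvaluatedKey works. remaining_ms would typically be the Lambda context's get_remaining_time_in_millis, and the 10-second safety margin is an arbitrary choice.

```python
from typing import Any, Callable, List, NamedTuple, Optional, Tuple


class ChunkResult(NamedTuple):
    processed: int
    next_key: Optional[Any]  # where a follow-up invocation should resume
    done: bool


def process_until_deadline(
    fetch_page: Callable[[Optional[Any]], Tuple[List[Any], Optional[Any]]],
    handle_item: Callable[[Any], None],
    remaining_ms: Callable[[], int],
    start_key: Optional[Any] = None,
    safety_margin_ms: int = 10_000,
) -> ChunkResult:
    """Process pages until the data runs out or the time budget gets low."""
    key, processed = start_key, 0
    while True:
        if remaining_ms() < safety_margin_ms:
            # Stop before Lambda does; the caller hands `key` to the next run.
            return ChunkResult(processed, key, False)
        items, next_key = fetch_page(key)
        for item in items:
            handle_item(item)
        processed += len(items)
        if next_key is None:
            return ChunkResult(processed, None, True)
        key = next_key
```

When done is False, the handler would send next_key to SQS or return it to a Step Functions loop so the next invocation can pass it as start_key.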

Implement circuit breaker pattern for downstream calls

If a downstream service is intermittently slow or unresponsive, a circuit breaker prevents your function from wasting its entire timeout waiting for a call that will never succeed. After a configurable number of failures, the circuit opens and immediately returns an error without making the network call. This fails fast, freeing up concurrency for requests that can succeed. Libraries like opossum (Node.js) or pybreaker (Python) provide drop-in implementations.
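To make the mechanism concrete, here is a deliberately small hand-rolled breaker. It is not the API of opossum or pybreaker, and the class name, thresholds, and injectable clock are illustrative assumptions. Note that the state lives in process memory, so each Lambda execution environment tracks failures independently.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; skipping downstream call")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Creating the breaker at module level, outside the handler, lets warm invocations in the same execution environment share its state. Combine it with a per-call timeout so that a hung call counts as a failure.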

Increase timeout as a last resort

If the operation genuinely needs more time and cannot be decomposed, increase the timeout. But be deliberate about it: a longer timeout means timed-out invocations consume concurrency for longer, amplifying the impact during outages. If you increase from 30 seconds to 60 seconds, also make sure your SQS visibility timeout is at least 6x the Lambda timeout (360 seconds here) to prevent duplicate processing. For API Gateway-backed functions, note the 29-second default integration timeout: it can be raised only for Regional and private REST APIs through a quota increase, so in most setups a longer Lambda timeout will not stop the client from receiving a gateway timeout.
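The arithmetic behind these two checks is simple enough to encode as a pre-deployment sanity check. The function and constant names below are hypothetical; the 6x multiplier and the 29-second figure are the ones quoted above.

```python
SQS_VISIBILITY_MULTIPLIER = 6           # "at least 6x the Lambda timeout"
API_GATEWAY_INTEGRATION_TIMEOUT_S = 29  # integration timeout noted above


def min_sqs_visibility_timeout(lambda_timeout_s: int) -> int:
    """Smallest SQS visibility timeout (seconds) for a given Lambda timeout."""
    return SQS_VISIBILITY_MULTIPLIER * lambda_timeout_s


def outlives_api_gateway(lambda_timeout_s: int) -> bool:
    """True if the function may still be running after API Gateway has given up."""
    return lambda_timeout_s > API_GATEWAY_INTEGRATION_TIMEOUT_S
```

For a 60-second Lambda timeout, min_sqs_visibility_timeout(60) gives 360 seconds, and outlives_api_gateway(60) is True, which signals that the extra time would not help an API Gateway caller.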

Upload a CloudWatch JSON export and smplogs will flag this automatically.
