Lambda Cold Start Troubleshooting
How to identify cold start patterns in your CloudWatch logs, measure their impact, and reduce init latency.
TL;DR
Cold starts happen when Lambda creates a new execution environment. Look for INIT_START lines and the Init Duration field in your REPORT lines. The fastest fix is moving SDK client initialization outside the handler, so it runs once per container instead of once per invocation. Provisioned concurrency eliminates cold starts for the provisioned capacity but costs money, so optimize your package size first.
What is a Lambda cold start?
A cold start occurs when AWS Lambda needs to create a new execution environment to handle a request. This happens when no warm containers are available, either because the function hasn't been invoked recently, traffic has spiked beyond the current pool of containers, or the function was just deployed.
During a cold start, Lambda performs three steps before your code runs:
- Downloads your deployment package (or pulls the container image)
- Starts the runtime (Node.js, Python, Java, Go, etc.)
- Runs your initialization code (module-level code outside the handler)
The combined time for these steps shows up as Init Duration in CloudWatch logs. This is the latency your users experience on top of the normal function execution time.
Identifying cold starts in CloudWatch logs
Lambda logs cold starts with a specific line format. Look for these patterns:
INIT_START Runtime Version: nodejs20.x Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:...
And in the REPORT line, cold starts include an extra Init Duration field:
REPORT RequestId: abc-123 Duration: 45.21 ms Billed Duration: 946 ms Memory Size: 512 MB Max Memory Used: 128 MB Init Duration: 891.33 ms
The Init Duration field only appears on cold start invocations. If the REPORT line doesn't have it, the invocation reused a warm container.
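If you want to check this yourself, a minimal Node.js sketch can classify each REPORT line by whether it carries that field. The function name `parseReportLine` and the returned object shape are illustrative assumptions, not part of any AWS or smplogs API; the sketch assumes REPORT lines formatted like the example above, with fields separated by spaces or tabs.

```javascript
// Hypothetical helper: extracts the numeric fields from one Lambda REPORT line.
// Returns null for any line that is not a REPORT line.
function parseReportLine(line) {
  if (!line.startsWith("REPORT ")) return null;

  // Collect every "<Field Name>: <number> ms|MB" pair, e.g. "Init Duration: 891.33 ms".
  // Field names must start with an uppercase letter after whitespace, so request IDs are skipped.
  const fields = {};
  const pattern = /(?:^|\s)([A-Z][A-Za-z ]*?): ([\d.]+) (?:ms|MB)/g;
  for (const [, key, value] of line.matchAll(pattern)) {
    fields[key] = Number(value);
  }

  const initDurationMs = fields["Init Duration"] ?? null;
  return {
    durationMs: fields["Duration"] ?? null,
    billedDurationMs: fields["Billed Duration"] ?? null,
    maxMemoryUsedMb: fields["Max Memory Used"] ?? null,
    initDurationMs,
    // Init Duration only appears on cold start invocations.
    coldStart: initDurationMs !== null,
  };
}
```

Lines that are not REPORT lines return `null`, so you can map the function over every line of a log export and filter those out.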
Measuring cold start impact
To understand how cold starts affect your users, you need three numbers from your logs:
- Cold start rate: What percentage of invocations have Init Duration? Above 10–15% means cold starts are frequent enough to affect user experience.
- Avg init duration: How long is the initialization? Under 200ms is generally acceptable. Over 500ms is noticeable. Over 1 second needs attention.
- P99 impact: Cold starts disproportionately affect P99 duration. Compare P99 vs P50; if the gap is large, cold starts are likely the cause.
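Once REPORT lines are parsed into numbers, these three figures take only a few lines to compute. This is a minimal sketch, assuming each record is a hypothetical `{ durationMs, initDurationMs }` object where `initDurationMs` is `null` for warm invocations. It treats user-facing latency as Duration plus Init Duration and uses the nearest-rank percentile method; both are simplifying assumptions rather than how any particular tool computes them.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sortedValues, p) {
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

// records: hypothetical [{ durationMs, initDurationMs }] objects,
// with initDurationMs null (or missing) for warm invocations.
function summarizeColdStarts(records) {
  if (records.length === 0) return null;

  const cold = records.filter((r) => r.initDurationMs != null);
  const warm = records.filter((r) => r.initDurationMs == null);

  // Assumption: on a cold start the caller waits for Init Duration plus Duration.
  const latencies = records
    .map((r) => r.durationMs + (r.initDurationMs ?? 0))
    .sort((a, b) => a - b);
  const warmLatencies = warm.map((r) => r.durationMs).sort((a, b) => a - b);
  const totalInitMs = cold.reduce((sum, r) => sum + r.initDurationMs, 0);

  return {
    invocations: records.length,
    coldStartRatePct: (100 * cold.length) / records.length,
    avgInitMs: cold.length > 0 ? totalInitMs / cold.length : 0,
    p50Ms: percentile(latencies, 50),
    p99Ms: percentile(latencies, 99),
    // P99 with cold starts removed; a large drop versus p99Ms points at cold starts.
    p99WarmMs: warmLatencies.length > 0 ? percentile(warmLatencies, 99) : null,
  };
}
```

Comparing `coldStartRatePct` and `avgInitMs` against the thresholds above, and `p99Ms` against `p99WarmMs`, gives a rough read on whether cold starts are driving your tail latency.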
Cold start times by runtime
Runtime choice is the single biggest factor in cold start duration. These are typical observed ranges across standard Lambda configurations (128–512 MB memory, non-VPC, package under 10 MB):
| Runtime | Typical cold start | With VPC | Notes |
|---|---|---|---|
| Node.js 20/22 | 100–300 ms | 200–400 ms | Fastest JS option; use esbuild to minimize package size |
| Python 3.12/3.13 | 100–250 ms | 200–350 ms | Heavy imports (pandas, numpy) add 500ms–2s to init |
| Go | 50–150 ms | 150–250 ms | Native binary, minimal startup overhead |
| Rust (custom runtime) | 30–100 ms | 130–200 ms | Fastest overall; requires lambda_runtime crate |
| Java 21 (JVM) | 1–4 s | 1.2–4.5 s | Use SnapStart to bring this under 200 ms |
| Java 21 + SnapStart | 100–200 ms | 200–350 ms | Restores from snapshot; big win for JVM workloads |
| .NET 8 (JIT) | 800 ms–3 s | 1–3.5 s | Use NativeAOT for sub-200 ms |
| .NET 8 NativeAOT | 50–200 ms | 150–300 ms | Compiled to native; no JIT warmup cost |
Figures are typical observed ranges, not AWS-guaranteed SLAs. Actual durations vary with memory allocation, package size, and initialization complexity. Higher memory allocations get proportionally more CPU, which can reduce init time for CPU-bound startup work.
How smplogs detects cold starts
When you analyze a Lambda log file with smplogs, the WASM engine automatically parses every REPORT line, extracts Init Duration values, and produces a finding like this:
18.3% cold start rate with avg 890ms init. P99 heavily impacted.
smplogs also correlates cold starts with P99 duration spikes in the root cause analysis, so you can see exactly how much cold starts contribute to tail latency.
Common causes of high cold start rates
Large deployment packages
The bigger your deployment package, the longer Lambda takes to download and extract it. Node.js functions with large node_modules or Python functions with heavy dependencies (pandas, numpy) are common offenders.
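To see which dependencies dominate a Node.js package, a short script can total the on-disk size of each top-level package in `node_modules`. This is an illustrative sketch using only Node's built-in `fs` and `path` modules; `dirSizeBytes` and `largestPackages` are hypothetical names. It measures uncompressed size on disk, which is only a rough proxy for what Lambda downloads and extracts.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively sum regular file sizes under a directory (symlinks are skipped).
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) total += dirSizeBytes(full);
    else if (entry.isFile()) total += fs.statSync(full).size;
  }
  return total;
}

// Return the largest top-level packages in a node_modules folder, biggest first.
function largestPackages(nodeModulesDir, limit = 10) {
  const packages = [];
  for (const entry of fs.readdirSync(nodeModulesDir, { withFileTypes: true })) {
    // Skip files and hidden folders such as .bin
    if (!entry.isDirectory() || entry.name.startsWith(".")) continue;
    const full = path.join(nodeModulesDir, entry.name);
    if (entry.name.startsWith("@")) {
      // Scoped packages (e.g. @aws-sdk/client-dynamodb) live one level deeper.
      for (const scoped of fs.readdirSync(full, { withFileTypes: true })) {
        if (!scoped.isDirectory()) continue;
        const bytes = dirSizeBytes(path.join(full, scoped.name));
        packages.push({ name: `${entry.name}/${scoped.name}`, bytes });
      }
    } else {
      packages.push({ name: entry.name, bytes: dirSizeBytes(full) });
    }
  }
  return packages.sort((a, b) => b.bytes - a.bytes).slice(0, limit);
}
```

For example, running `console.table(largestPackages("node_modules"))` from your project root prints the ten largest packages, which are usually the first candidates for bundling or removal.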
SDK clients created inside the handler
Creating AWS SDK clients (DynamoDB, S3, SQS) inside the handler function means they get recreated on every invocation. Move them to module scope so they're initialized once per container.
Low traffic
Lambda recycles idle containers after a period of inactivity (typically 5-15 minutes, though this varies). Functions invoked fewer than a few times per minute will see higher cold start rates.
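One way to check whether idle recycling explains your cold starts is to measure how long the function sat idle before each one. This is a minimal sketch, assuming hypothetical records of the form `{ timestampMs, coldStart }`, one per invocation (for example, the log event timestamp of each REPORT line). When several containers run concurrently, the previous invocation may have been served by a different container, so treat the gaps as approximate.

```javascript
// records: hypothetical [{ timestampMs, coldStart }] objects, one per invocation.
// Returns the idle gap (in minutes) preceding each cold start.
function idleGapsBeforeColdStarts(records) {
  const sorted = [...records].sort((a, b) => a.timestampMs - b.timestampMs);
  const gapsMinutes = [];
  // Start at 1: the very first invocation has no predecessor to measure from.
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].coldStart) {
      gapsMinutes.push((sorted[i].timestampMs - sorted[i - 1].timestampMs) / 60000);
    }
  }
  return gapsMinutes;
}
```

If most gaps are several minutes long, low traffic is the likely cause; if they are short, look at traffic spikes or recent deployments instead.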
Traffic spikes
When traffic suddenly increases, Lambda needs to spin up new containers to handle the burst. Each new container experiences a cold start, so they tend to cluster around spikes.
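Bucketing invocations by minute makes that clustering visible: minutes with a jump in invocations should also show a jump in cold starts. A sketch assuming hypothetical `{ timestampMs, coldStart }` records, one per invocation; the one-minute bucket width is an arbitrary choice.

```javascript
// records: hypothetical [{ timestampMs, coldStart }] objects, one per invocation.
// Returns per-minute invocation and cold start counts, oldest minute first.
function coldStartsPerMinute(records) {
  const buckets = new Map();
  for (const { timestampMs, coldStart } of records) {
    // Round down to the start of the minute (epoch milliseconds).
    const minuteStartMs = Math.floor(timestampMs / 60000) * 60000;
    const bucket = buckets.get(minuteStartMs) ?? { minuteStartMs, invocations: 0, coldStarts: 0 };
    bucket.invocations += 1;
    if (coldStart) bucket.coldStarts += 1;
    buckets.set(minuteStartMs, bucket);
  }
  return [...buckets.values()].sort((a, b) => a.minuteStartMs - b.minuteStartMs);
}
```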
VPC-attached functions
Functions in a VPC used to have significantly longer cold starts due to ENI attachment. AWS improved this with Hyperplane in 2019, but VPC functions still add some overhead compared to non-VPC ones.
How to reduce cold start latency
Move initialization outside the handler
The most impactful free optimization. Any code outside the handler function runs once per container, not once per invocation.
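```javascript
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");

// Bad: creates a new client on every invocation
exports.handler = async (event) => {
  const dynamo = new DynamoDBClient({});
  // ...
};
```

```javascript
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");

// Good: client is created once per container, during init
const dynamo = new DynamoDBClient({});

exports.handler = async (event) => {
  // reuses the client created above
};
```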
Reduce deployment package size
Use a bundler (esbuild, webpack) to tree-shake unused code. For Node.js, this can cut packages from 50MB+ to under 5MB. Import only what you need from the AWS SDK v3 (@aws-sdk/client-dynamodb instead of the full SDK).
Provisioned concurrency
Keeps a pool of pre-initialized containers warm. Eliminates cold starts entirely for the provisioned amount but adds cost. Best for latency-sensitive functions where cold starts exceed 1 second.
Pick a faster runtime
Node.js and Python typically cold-start in 100–300 ms. Java and .NET on the JIT typically take roughly 1–4 seconds unless you use SnapStart, GraalVM native image, or .NET NativeAOT. Go and Rust compile to native code and typically cold-start in under 150 ms.
SnapStart (Java, Python, .NET)
Lambda SnapStart snapshots the initialized execution environment and restores it on cold start. It is available for Java, Python, and .NET runtimes; for Java it brings cold starts from seconds down to under 200ms.
Want to skip the manual inspection? Drop your CloudWatch JSON into smplogs and get cold start rate, init duration breakdown, and P99 impact in seconds.