HIGH

DynamoDB Throttling Errors

Invocations failing due to DynamoDB provisioned throughput limits

What this means

DynamoDB throttling occurs when your Lambda function sends read or write requests that exceed the provisioned capacity units (RCU/WCU) allocated to a table or global secondary index. When a request exceeds the provisioned throughput, DynamoDB rejects it with a ProvisionedThroughputExceededException. The AWS SDK retries these errors automatically with exponential backoff. The default number of attempts varies by SDK: some retry DynamoDB calls up to 10 times, while AWS SDK v3 for JavaScript defaults to 3 attempts in total. If the throttling is sustained, the retries are exhausted and the error propagates to your Lambda handler.

Throttling is almost always a capacity planning problem rather than a code bug. The most common trigger is a traffic spike (a marketing campaign, an hourly batch job, or a viral event) that pushes request rates beyond what you provisioned. Hot partitions are another frequent cause: if your partition key design concentrates traffic on a small number of keys (for example, using a date string as the partition key), a single partition can hit its per-partition limit of 3,000 RCU or 1,000 WCU even if the table-level capacity is generous.

For synchronous API-backed Lambdas, throttling means elevated latency (due to SDK retries) or outright 500 errors. For asynchronous event processors, throttled writes can cause messages to pile up in SQS or Kinesis, creating a backlog that takes time to drain once capacity is restored. In the worst case, a Lambda triggered by a DynamoDB Stream on the same throttled table can fall so far behind that stream records pass their 24-hour retention window before they are processed, and that data is lost.

Detection criteria

smplogs triggers this finding when:

  • HIGH — More than 10 invocations contain DynamoDB throttling errors
  • ELEVATED — At least 1 invocation contains a DynamoDB throttling error
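
The two thresholds above can be sketched as a small classifier. This is only an illustration of the rule as stated, not smplogs' actual implementation: the input shape (a mapping from request ID to that invocation's log lines) and the function name classify are assumptions made for the example.

```python
# Hypothetical sketch of the severity rule; not the real smplogs code.
THROTTLE_MARKER = "ProvisionedThroughputExceededException"


def classify(invocation_logs):
    """Return (severity, throttled_count) for a batch of invocations.

    invocation_logs is an assumed shape: {request_id: [log line, ...]}.
    """
    throttled = sum(
        1
        for lines in invocation_logs.values()
        if any(THROTTLE_MARKER in line for line in lines)
    )
    if throttled > 10:       # more than 10 throttled invocations
        return "HIGH", throttled
    if throttled >= 1:       # at least 1 throttled invocation
        return "ELEVATED", throttled
    return None, throttled   # no finding
```

Note that the count is per invocation, not per log line, so an invocation that logs the exception several times (for example, once per SDK retry) is still counted once.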

Example finding

HIGH · 34 invocations DynamoDB Throttling Errors

34 invocations encountered ProvisionedThroughputExceededException on table "orders-prod". Throttling concentrated between 14:00-14:15 UTC during peak traffic window.

-> Enable DynamoDB on-demand capacity or increase provisioned RCU/WCU. Implement exponential backoff.

How to fix

Switch to on-demand capacity mode

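If your traffic is unpredictable or spiky, switch the table to on-demand (pay-per-request) capacity mode. On-demand scales automatically with request volume and removes throttling for most workloads, although a table can still throttle if traffic jumps far above its previous peak in a short time or if a single partition exceeds its per-partition limit. The trade-off is a higher per-request cost, roughly 6.5x that of fully utilized provisioned capacity (check current pricing, since the ratio changes over time). On-demand therefore wins mainly when the alternative is provisioning for a peak that the table rarely reaches. DynamoDB limits how often a table can switch capacity modes within a 24-hour period, so plan the change rather than toggling reactively.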
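
As a rough way to reason about that trade-off, the sketch below computes the break-even utilization from the per-request cost multiplier. It is a simplified model under stated assumptions: provisioned cost is proportional to what you provision, on-demand cost is proportional to what you consume, and storage, free tier, and reserved-capacity discounts are ignored. The 6.5 default is the figure quoted above, not a current price.

```python
def break_even_utilization(cost_multiplier=6.5):
    """Average utilization below which on-demand is cheaper.

    Provisioned cost ~ provisioned capacity (paid whether used or not).
    On-demand cost  ~ cost_multiplier * consumed capacity.
    They are equal when consumed / provisioned == 1 / cost_multiplier.
    """
    return 1.0 / cost_multiplier


def cheaper_mode(average_utilization, cost_multiplier=6.5):
    """Pick the cheaper mode for a table under this simplified model."""
    if average_utilization < break_even_utilization(cost_multiplier):
        return "on-demand"
    return "provisioned"
```

With a multiplier of 6.5 the break-even point is about 15% average utilization, so a table provisioned for a peak it reaches only briefly each day is a plausible on-demand candidate under this model.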

Enable auto-scaling on provisioned tables

If you prefer provisioned mode for cost control, enable Application Auto Scaling on both the table and all its global secondary indexes. Set target utilization to 70% to leave headroom for bursts. Keep in mind that auto scaling acts on CloudWatch alarms, which fire only after consumption has stayed above target for a couple of minutes, and the capacity change itself takes further time to apply. It will not prevent throttling during sudden spikes; it only helps with gradual traffic ramps. For predictable spikes (batch jobs, cron), use scheduled scaling to pre-warm capacity before the load arrives.
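
A minimal sketch of the target-tracking setup follows. It only builds the request parameters as plain dictionaries so that the example runs without AWS access; the helper name autoscaling_requests and the policy naming scheme are assumptions, while the parameter names follow the Application Auto Scaling RegisterScalableTarget and PutScalingPolicy APIs.

```python
def autoscaling_requests(table, min_capacity, max_capacity,
                         dimension="Write", target_utilization=70.0,
                         index=None):
    """Build (scalable_target, scaling_policy) parameters.

    dimension is "Read" or "Write"; pass index to target a GSI.
    """
    scope = "index" if index else "table"
    resource_id = f"table/{table}" + (f"/index/{index}" if index else "")
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": f"dynamodb:{scope}:{dimension}CapacityUnits",
    }
    target = {**common, "MinCapacity": min_capacity,
              "MaxCapacity": max_capacity}
    policy = {
        **common,
        # Hypothetical naming scheme, chosen only for this example.
        "PolicyName": f"{table}-{index or 'table'}-{dimension.lower()}",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType":
                    f"DynamoDB{dimension}CapacityUtilization",
            },
        },
    }
    return target, policy
```

With boto3, these would typically be passed to the application-autoscaling client as register_scalable_target(**target) followed by put_scaling_policy(**policy), once for each of read and write on the table and on every global secondary index.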

Fix hot partition keys

Check your table's partition key distribution. If a small number of partition keys receive the majority of traffic, DynamoDB's adaptive capacity may not compensate quickly enough. Common anti-patterns include using a date string, a boolean status flag, or a low-cardinality enum as the partition key. Consider adding a random suffix (write sharding) or restructuring your key schema. You can use CloudWatch Contributor Insights for DynamoDB to identify the hottest keys without any code changes.
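
Write sharding can be sketched as follows. The "#" separator, the function names, and the shard count are illustrative assumptions; the idea is only that writes spread across N suffixed keys and reads fan out over all N.

```python
import random


def sharded_key(base_key, shard_count, rng=random):
    """Partition key for a write: base key plus a random shard suffix."""
    return f"{base_key}#{rng.randrange(shard_count)}"


def all_shard_keys(base_key, shard_count):
    """Every partition key a reader must query to see the full item set."""
    return [f"{base_key}#{n}" for n in range(shard_count)]
```

The cost of this design is on the read side: fetching everything for one logical key now takes shard_count queries whose results must be merged, so the shard count is best kept as small as the write rate allows.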

Add application-level retry with backoff

The AWS SDK retries throttled requests automatically, but its default retry budget may not be enough for sustained throttling. Configure the SDK's retry strategy explicitly: in AWS SDK v3 for JavaScript, set maxAttempts: 5 (the default is 3) and use the standard retry mode, which includes jitter. Batch operations such as BatchWriteItem can succeed overall while returning some items in UnprocessedItems, so always check that field and retry those items after a delay.
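
The UnprocessedItems handling can be sketched in a few lines. To keep the example runnable without AWS access, the batch call is injected as a parameter; with boto3 you would pass the client's batch_write_item method. The function names, base delay, and cap are assumptions, and the "full jitter" backoff used here is one common strategy rather than the SDK's own algorithm.

```python
import random
import time


def backoff_delay(attempt, base=0.05, cap=5.0, rng=random):
    """Full-jitter delay in seconds for a zero-based attempt number."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def write_all(batch_write, request_items, max_attempts=5, sleep=time.sleep):
    """Call batch_write until UnprocessedItems is empty or attempts run out.

    batch_write must accept RequestItems=... and return a dict shaped like
    a BatchWriteItem response. Returns whatever is still unprocessed
    (an empty dict on full success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = batch_write(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return {}
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))  # wait before resending leftovers
    return pending
```

Returning the leftover items instead of raising lets the caller decide what to do with them, for example failing the invocation so that the event source retries, or forwarding them to a dead-letter queue.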

Buffer writes through SQS

For write-heavy workloads, decouple the write path by sending items to an SQS queue first and processing them with a separate Lambda function at a controlled concurrency. Size the consumer's reserved concurrency (or the event source mapping's maximum concurrency) so that the number of concurrent invocations, multiplied by the writes each one performs per second, stays below the table's provisioned WCU. This turns a bursty write pattern into a smooth, controlled throughput that stays within provisioned limits. SQS also gives you automatic retries and dead-letter queue support out of the box.
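
The concurrency sizing can be estimated with simple arithmetic. The sketch below assumes standard (non-transactional) writes, which consume 1 WCU per 1 KB of item size rounded up, and assumes the worst case in which every consumer invocation runs back to back. The function names and example numbers are illustrative.

```python
import math


def wcu_per_item(item_size_bytes):
    """WCU consumed by one standard write: 1 per KB, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


def max_consumer_concurrency(table_wcu, items_per_invocation,
                             item_size_bytes, invocation_ms):
    """Largest concurrency whose combined write rate fits in table_wcu."""
    # WCU per second consumed by one continuously busy invocation.
    wcu_per_second = (items_per_invocation * wcu_per_item(item_size_bytes)
                      * 1000 / invocation_ms)
    return max(1, math.floor(table_wcu / wcu_per_second))
```

For example, with 1,000 provisioned WCU, batches of 10 items under 1 KB each, and invocations lasting about 200 ms, each consumer uses roughly 50 WCU per second, which gives a ceiling of 20 concurrent invocations. Leaving some margin below that ceiling accounts for other writers on the same table.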
