Introduction: A High-Concurrency Problem
Picture this: you’re asked to build a high-concurrency application with DynamoDB and Java. Which DynamoDB client should you choose, synchronous or asynchronous? Let’s analyze it here.
Suppose the high-concurrency application is a webhook-driven system with three core services—Config, Event Processor, and Webhook Dispatcher—all talking to a single, centralized DynamoDB table. At first glance, DynamoDB seems perfect: serverless, scalable, and low-latency. But as the system grows and concurrent users pile in, cracks begin to show.
You notice throttled writes, retries piling up, and latencies creeping higher. What went wrong? And more importantly, how do you fix it?
In this blog, I’ll walk you through my journey of designing a high-concurrency, DynamoDB-powered webhook system in Java, the challenges I faced, and how I overcame them.
Setting the Stage: The Webhook Architecture
Our system comprises three services:
- Config Service: Manages webhook configurations, including URLs and retry policies.
- Event Processor: Processes incoming events and determines which webhooks to trigger.
- Webhook Dispatcher: Sends events to webhook endpoints and updates their statuses in DynamoDB.
All three services interact with a centralized DynamoDB table, performing reads and writes concurrently.
The Challenge:
- Provisioned Capacity: With a write capacity of just 10 WCUs, DynamoDB was often overwhelmed by bursts of write requests.
- High Concurrency: At peak, over 100 concurrent users could trigger webhook events, resulting in write contention.
- Reliability: Every write mattered, and failures needed to be handled gracefully.
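For context, here is a minimal sketch of how such a table might be created with the AWS SDK for Java v2. The 10 WCUs match the scenario above, but the table name and key attributes (webhookId, eventId) are assumptions for illustration:
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

// Hypothetical schema: partition key = webhookId, sort key = eventId.
DynamoDbClient ddb = DynamoDbClient.create();
ddb.createTable(CreateTableRequest.builder()
        .tableName("WebhookTable")
        .attributeDefinitions(
                AttributeDefinition.builder().attributeName("webhookId").attributeType(ScalarAttributeType.S).build(),
                AttributeDefinition.builder().attributeName("eventId").attributeType(ScalarAttributeType.S).build())
        .keySchema(
                KeySchemaElement.builder().attributeName("webhookId").keyType(KeyType.HASH).build(),
                KeySchemaElement.builder().attributeName("eventId").keyType(KeyType.RANGE).build())
        .provisionedThroughput(ProvisionedThroughput.builder()
                .readCapacityUnits(10L)
                .writeCapacityUnits(10L) // the 10 WCUs that get overwhelmed under load
                .build())
        .build());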
The DynamoDB Client Dilemma: Sync vs. Async
The first question was: Should I use the synchronous or asynchronous DynamoDB client in Java?
Option 1: Synchronous Client
- Pros: Simple to implement, blocking I/O fits many traditional applications.
- Cons: Blocks threads until the operation completes, limiting scalability. This model struggles with high-concurrency workloads like ours.
Option 2: Asynchronous Client
- Pros: Non-blocking I/O, ideal for high-concurrency scenarios. Uses fewer threads, improving resource utilization.
- Cons: Slightly more complex to implement, requiring familiarity with CompletableFuture and async programming.
The choice was clear: Async all the way. But switching to async wasn’t the magic bullet. DynamoDB itself had its limits.
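To make the difference concrete, here is a minimal sketch of the same PutItem call with both clients. The table name, key names, and values are placeholders:
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

PutItemRequest request = PutItemRequest.builder()
        .tableName("WebhookTable") // placeholder
        .item(Map.of(
                "webhookId", AttributeValue.builder().s("wh-1").build(),
                "eventId", AttributeValue.builder().s("evt-1").build()))
        .build();

// Sync: the calling thread blocks until DynamoDB responds.
DynamoDbClient syncClient = DynamoDbClient.create();
syncClient.putItem(request);

// Async: returns immediately with a CompletableFuture; the thread is free for other work.
DynamoDbAsyncClient asyncClient = DynamoDbAsyncClient.create();
asyncClient.putItem(request)
        .thenRun(() -> System.out.println("Write completed"));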
How Writes Are Handled in DynamoDB
Here’s the reality: DynamoDB doesn’t care how many threads or users you have. If your table’s provisioned write capacity is 10 WCUs, it can absorb at most 10 writes per second (one WCU covers a standard write of an item up to 1 KB).
What Happens When You Exceed Capacity?
- DynamoDB starts throttling requests.
- The client receives a ProvisionedThroughputExceededException, signaling that the write cannot be processed immediately.
- Without retries, these writes are lost.
For our system, this meant 10 writes would succeed immediately, while the other 90 would either wait (with retries) or fail outright.
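Reusing the asyncClient and request from the sketch above, detecting that throttling looks roughly like this:
import java.util.concurrent.CompletionException;
import software.amazon.awssdk.services.dynamodb.model.ProvisionedThroughputExceededException;

asyncClient.putItem(request)
        .whenComplete((response, exception) -> {
            if (exception != null) {
                // Async failures arrive wrapped in a CompletionException; unwrap to inspect the cause.
                Throwable cause = exception instanceof CompletionException ? exception.getCause() : exception;
                if (cause instanceof ProvisionedThroughputExceededException) {
                    System.err.println("Throttled: the table is over its provisioned WCUs");
                } else {
                    System.err.println("Write failed: " + cause.getMessage());
                }
            }
        });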
Handling Throttling with the Async Client
The DynamoDbAsyncClient offered several tools to manage throttling:
Retry Logic with Exponential Backoff: The AWS SDK provides built-in retry mechanisms. I configured a custom retry policy to handle transient throttling issues:
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.core.retry.backoff.BackoffStrategy;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;

DynamoDbAsyncClient client = DynamoDbAsyncClient.builder()
        .overrideConfiguration(ClientOverrideConfiguration.builder()
                .retryPolicy(RetryPolicy.builder()
                        .numRetries(5) // give throttled writes several chances
                        .backoffStrategy(BackoffStrategy.defaultStrategy()) // exponential backoff between attempts
                        .build())
                .build())
        .build();
This ensured the system didn’t immediately give up on throttled writes but retried them with increasing delays.
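If the default strategy is too conservative for your workload, the SDK also lets you tune the backoff explicitly. A sketch with assumed delay values:
import java.time.Duration;
import software.amazon.awssdk.core.retry.backoff.FullJitterBackoffStrategy;

// Assumed values: start around 100 ms and cap at 5 s between retries.
FullJitterBackoffStrategy backoff = FullJitterBackoffStrategy.builder()
        .baseDelay(Duration.ofMillis(100))
        .maxBackoffTime(Duration.ofSeconds(5))
        .build();
Pass this to backoffStrategy(...) in the retry policy above in place of the default.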
Batch Writes for Efficiency: Instead of sending individual write requests, I grouped writes into batch requests, reducing the number of API calls:
List<WriteRequest> writeRequests = createWriteRequests(events);
BatchWriteItemRequest batchRequest = BatchWriteItemRequest.builder()
        .requestItems(Collections.singletonMap("WebhookTable", writeRequests))
        .build();
dynamoDbAsyncClient.batchWriteItem(batchRequest)
        .whenComplete((response, exception) -> {
            if (exception != null) {
                System.err.println("Batch write failed: " + exception.getMessage());
            } else if (!response.unprocessedItems().isEmpty()) {
                // Even a "successful" batch can return throttled items; re-queue these for retry.
                System.err.println("Unprocessed items: " + response.unprocessedItems().size());
            } else {
                System.out.println("Batch write succeeded!");
            }
        });
Batch writes can process up to 25 items per request, optimizing throughput.
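The createWriteRequests helper isn’t shown in this post; here is a hypothetical sketch of the chunking it implies, since BatchWriteItem rejects requests with more than 25 items:
import java.util.ArrayList;
import java.util.List;
import software.amazon.awssdk.services.dynamodb.model.WriteRequest;

// Hypothetical helper: split write requests into batches of at most 25 items each.
static List<List<WriteRequest>> chunk(List<WriteRequest> requests) {
    List<List<WriteRequest>> batches = new ArrayList<>();
    for (int i = 0; i < requests.size(); i += 25) {
        batches.add(requests.subList(i, Math.min(i + 25, requests.size())));
    }
    return batches;
}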
Bonus Tip
If your application requires non-blocking, asynchronous calls to DynamoDB, you can use the DynamoDbEnhancedAsyncClient. See the AWS documentation for the DynamoDB Enhanced Async client for more details.
Thread-Safe Design:
The DynamoDbEnhancedAsyncClient is designed to be thread-safe. You can create a single instance and reuse it for multiple operations across the application.
Concurrency Management:
The underlying DynamoDbAsyncClient uses non-blocking I/O with a thread pool (such as Netty’s event loop). It manages concurrent requests efficiently without requiring multiple client instances.
Even if 100 concurrent users write to DynamoDB, the single DynamoDbEnhancedAsyncClient will handle these requests concurrently using the non-blocking mechanism.
Singleton Pattern: Create one instance of DynamoDbEnhancedAsyncClient during application startup and share it across the services or classes that need to interact with DynamoDB.
public class DynamoDbClientFactory {
    // One shared, thread-safe client for the entire application.
    private static final DynamoDbEnhancedAsyncClient ENHANCED_ASYNC_CLIENT =
            DynamoDbEnhancedAsyncClient.builder()
                    .dynamoDbClient(DynamoDbAsyncClient.create())
                    .build();

    public static DynamoDbEnhancedAsyncClient getClient() {
        return ENHANCED_ASYNC_CLIENT;
    }
}
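Each service then calls DynamoDbClientFactory.getClient() instead of constructing its own client, keeping connection pools and event-loop threads shared.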
How to Know If a Write Fails:
The async client provides a CompletableFuture for each write operation, which lets you handle success or failure.
Example of handling write success/failure:
DynamoDbEnhancedAsyncClient client = DynamoDbEnhancedAsyncClient.builder()
.dynamoDbClient(DynamoDbAsyncClient.create())
.build();
TableSchema<MyItem> tableSchema = TableSchema.fromBean(MyItem.class);
DynamoDbAsyncTable<MyItem> table = client.table("MyTable", tableSchema);
MyItem item = new MyItem();
item.setId("123");
item.setName("Test Item");
table.putItem(item)
.whenComplete((response, exception) -> {
if (exception != null) {
System.err.println("Write failed: " + exception.getMessage());
} else {
System.out.println("Write succeeded!");
}
});
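The example above assumes MyItem is a bean the enhanced client can map. A minimal sketch of such a class:
import software.amazon.awssdk.enhanced.dynamodb.mapper.annotations.DynamoDbBean;
import software.amazon.awssdk.enhanced.dynamodb.mapper.annotations.DynamoDbPartitionKey;

// Minimal bean that TableSchema.fromBean(MyItem.class) can map.
@DynamoDbBean
public class MyItem {
    private String id;
    private String name;

    @DynamoDbPartitionKey
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}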
Handling Throttling or Write Failures
Exponential Backoff and Retry: Implement retry logic to handle throttling exceptions gracefully. The RetryPolicy configured earlier applies here too: because the enhanced client wraps the underlying DynamoDbAsyncClient, its operations inherit the same numRetries and backoff behavior.
Scaling Writes: Auto Scaling and On-Demand Capacity
Retrying and batching were helpful, but they didn’t address the root issue: our write capacity was too low for peak loads.
Here’s how I scaled the system:
- Auto Scaling: I enabled Auto Scaling on the DynamoDB table, allowing it to dynamically adjust write capacity based on demand:
Target Utilization: 70%
Min/Max Capacity: 10–100 WCUs
This ensured the table scaled up during traffic spikes and scaled down to save costs during idle periods.
- On-Demand Capacity Mode: For truly unpredictable workloads, I considered switching to on-demand capacity, which removes the need to provision capacity altogether (see the sketch below). DynamoDB automatically scales to meet traffic demands, though at a higher per-request cost.
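For reference, switching an existing table to on-demand is a single table update; a sketch, assuming the same table name as above:
import software.amazon.awssdk.services.dynamodb.model.BillingMode;
import software.amazon.awssdk.services.dynamodb.model.UpdateTableRequest;

// Move the table from provisioned capacity to on-demand (pay-per-request) billing.
dynamoDbAsyncClient.updateTable(UpdateTableRequest.builder()
        .tableName("WebhookTable")
        .billingMode(BillingMode.PAY_PER_REQUEST)
        .build());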
Monitoring and Debugging
To ensure reliability, I set up robust monitoring and logging:
- CloudWatch Metrics:
ConsumedWriteCapacityUnits: how much of the provisioned capacity was being used.
ThrottledRequests: the number of requests throttled for exceeding capacity.
- Alarms: I configured alarms for high throttling rates, triggering notifications when the system approached its capacity limits (see the sketch after this list).
- Application Logs: Using structured logging, I captured details of failed writes, retries, and latencies to debug issues effectively.
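A minimal sketch of such a throttling alarm, assuming the table name used earlier and a placeholder SNS topic ARN:
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.*;

// Alert when the table sees sustained throttling in any one-minute window.
CloudWatchClient cloudWatch = CloudWatchClient.create();
cloudWatch.putMetricAlarm(PutMetricAlarmRequest.builder()
        .alarmName("WebhookTable-throttling")
        .namespace("AWS/DynamoDB")
        .metricName("ThrottledRequests")
        .dimensions(Dimension.builder().name("TableName").value("WebhookTable").build())
        .statistic(Statistic.SUM)
        .period(60)
        .evaluationPeriods(1)
        .threshold(10.0) // assumed tolerance before alerting
        .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
        .alarmActions("arn:aws:sns:us-east-1:123456789012:ops-alerts") // placeholder topic
        .build());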
Results: A Resilient, Scalable System
After implementing these strategies, here’s how the system performed:
- Zero Data Loss: All throttled writes were retried successfully.
- Improved Throughput: Batch writes and Auto Scaling significantly reduced request latencies.
- Scalability: The system handled up to 500 concurrent users without breaking a sweat.
Key Takeaways for Your Webhook System
- Choose the Right Client: Use the async client for high-concurrency scenarios, and leverage CompletableFuture to handle write success and failure.
- Plan for Throttling: Implement retries with exponential backoff, and use batch writes to optimize throughput.
- Scale Dynamically: Enable Auto Scaling for predictable workloads; switch to on-demand capacity for unpredictable traffic patterns.
- Monitor Aggressively: Use CloudWatch metrics and alarms to stay ahead of capacity issues.
Conclusion: From Challenges to Confidence
Designing a high-concurrency webhook system with DynamoDB taught me valuable lessons about scaling, resilience, and choosing the right tools. With the Async client, robust retry logic, and dynamic scaling, our system transformed from a throttled bottleneck to a reliable backbone for webhook delivery.
So, whether you’re building a webhook system or any application relying on DynamoDB, remember: Plan for concurrency, handle failures gracefully, and monitor relentlessly. Your users—and your system—will thank you.
You can find the full code in my GitHub profile.
What’s your story with DynamoDB? Share your experiences and tips in the comments!