Distributed Rate Limiting in Java: A Deep Dive into Bucket4j + PostgreSQL
Bucket4j for Java enables distributed rate limiting with PostgreSQL, ensuring consistency via database locks. Here is how it works.
Important note: this post covers implementation details of the integration between PostgreSQL and the Bucket4j library, specifically version 8.14.0. I'm not responsible for future changes, but I'm fairly confident the details will remain accurate for a long time.
Hey everyone!
Have you ever needed a distributed rate limiter in Java, only to realize there aren’t many out-of-the-box solutions available? It can feel overwhelming to piece together a reliable approach, especially when working in distributed systems.
That’s where Bucket4j comes in — a powerful library that simplifies rate limiting while ensuring consistency across distributed environments.
Imagine writing code like this (below) and trusting that your rate-limiting logic will work seamlessly across multiple instances of your application:
boolean tryConsume(HttpRequest request, HttpResponse response) {
    Long key = getKey(request);
    Bucket bucket = bucketProvisioner.getBucket(key);
    ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(tokenNum);
    if (probe.isConsumed()) {
        response.addHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
        return true;
    }
    long waitForRefill = toSeconds(probe.getNanosToWaitForRefill());
    response.addHeader("X-Rate-Limit-Retry-After-Seconds", String.valueOf(waitForRefill));
    return false;
}
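The snippet above leaves getKey and toSeconds undefined. Here is a minimal sketch of what they might look like; the API-key lookup and the round-up rounding are my assumptions for illustration, not part of Bucket4j:

```java
// Hypothetical helpers for the snippet above -- names and behavior are assumptions.
final class RateLimitHelpers {

    // Derive the bucket key from the caller's identity; here we assume an
    // API-key string (e.g., from an "X-Api-Key" header) hashed to a Long.
    static Long getKey(String apiKey) {
        return (long) apiKey.hashCode();
    }

    // Convert the wait time reported by ConsumptionProbe (nanoseconds) into
    // whole seconds, rounding up so Retry-After never understates the wait.
    static long toSeconds(long nanos) {
        return (nanos + 999_999_999L) / 1_000_000_000L;
    }
}
```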
If your tech stack relies heavily on relational databases like PostgreSQL, Bucket4j offers a compelling solution for distributed rate limiting without the need for in-memory caches like Redis or Hazelcast. However, if you prefer working with Ignite or Hazelcast, Bucket4j has you covered as well, thanks to its native support for the JCache API (JSR 107).
Of course, there is also a Spring Boot starter to simplify setup.
But in this post, we’ll dive specifically into Bucket4j + PostgreSQL integration to explore how it works and why it’s a great choice for relational database-based systems.
How It Works: Bucket4j + PostgreSQL
Bucket4j leverages PostgreSQL to implement a distributed token bucket algorithm. Each "bucket" (representing rate limits for a user, API key, or resource) is stored as a serialized object in a PostgreSQL table. The library ensures atomicity and consistency across multiple application instances, making it perfect for distributed systems.
Locking Strategies: Ensuring Consistency
To prevent race conditions and maintain accurate rate limits, Bucket4j employs two locking mechanisms:
SELECT ... FOR UPDATE: locks the specific row representing the bucket's state for the duration of the transaction, ensuring exclusive access.
pg_advisory_xact_lock: a more advanced method that leverages PostgreSQL's advisory locks, which are application-level locks not tied to specific rows. In most cases, Bucket4j uses a hash of the bucket's identifier (e.g., user ID or API key) as the lock key; when the key is numeric, the numeric value itself is used directly.
Both strategies ensure consistency, but the choice depends on your application's requirements and PostgreSQL expertise. It's worth noting that the SELECT ... FOR UPDATE approach typically sends 4 queries compared to 5 in the advisory-lock case, making it slightly more efficient in terms of database round trips.
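To make the key-derivation rule concrete, the logic can be sketched roughly as follows. Note that pg_advisory_xact_lock takes a bigint, so every key must become a long; the exact hash function Bucket4j applies to non-numeric keys is an internal detail, and hashCode() below is only a stand-in:

```java
// Sketch of how a bucket identifier maps to a pg_advisory_xact_lock key.
final class AdvisoryLockKeys {

    static long lockKeyFor(Object bucketId) {
        if (bucketId instanceof Number) {
            // Numeric keys are used directly as the advisory lock key.
            return ((Number) bucketId).longValue();
        }
        // Non-numeric keys are hashed down to a long. Bucket4j's real hash
        // function is an implementation detail; hashCode() is a stand-in.
        return bucketId.hashCode();
    }
}
```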
Code in Action: Configuration Made Simple
Here’s a quick example of how you can configure Bucket4j with PostgreSQL:
ProxyManager<Long> proxyManager = Bucket4jPostgreSQL
    .selectForUpdateBasedBuilder(dataSource)
    .build();

// the key will be used as the lock key in the database
Long key = 1L;

BucketConfiguration bucketConfiguration = BucketConfiguration.builder()
    .addLimit(limit -> limit.capacity(10).refillGreedy(10, ofSeconds(1)))
    .build();

BucketProxy bucket = proxyManager.getProxy(key, () -> bucketConfiguration);
This setup connects your application to PostgreSQL, enabling distributed rate limiting with minimal effort.
SQL in Action: What Happens Under the Hood
Let’s take a closer look at the SQL queries Bucket4j generates:
1. SELECT ... FOR UPDATE Example
Initialization/First Request
BEGIN;
SELECT state FROM bucket WHERE id = 'your-bucket-id' FOR UPDATE;
INSERT INTO bucket(id, state) VALUES('your-bucket-id', null) ON CONFLICT(id) DO NOTHING;
COMMIT;
This sequence locks the row, retrieves the bucket's state, and inserts it if it doesn’t exist.
Subsequent Requests (Updating State)
BEGIN;
SELECT state FROM bucket WHERE id = 'your-bucket-id' FOR UPDATE;
UPDATE bucket SET state = $1, expires_at = '...' WHERE id = 'your-bucket-id';
COMMIT;
The row is locked, and the serialized state ($1) is updated with the remaining tokens and metadata.
2. pg_advisory_xact_lock Example
Locking and Retrieving State
Here, the advisory lock ensures transaction-level consistency without locking specific rows.
BEGIN;
SELECT pg_advisory_xact_lock('hash-of-your-bucket-id');
SELECT state FROM bucket_lock WHERE id = 'your-bucket-id';
-- INSERT or UPDATE logic follows...
COMMIT;
Key Considerations
Eviction Policy
To prevent unbounded growth, expired buckets need to be cleaned up periodically. Bucket4j calculates an expires_at value, but you'll need a background job to delete old entries. You can find extra details in the documentation.
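One way to run that background job is a simple scheduled task that deletes expired rows. The table and column names below follow the SQL shown in this post, but the schedule, the JDBC wrapper, and the assumption that expires_at is a timestamp column (adjust the predicate if it stores epoch millis) are my own sketch:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

// Periodically evicts expired buckets. Table/column names ("bucket",
// "expires_at") follow the SQL in this post; everything else is a sketch.
final class BucketEvictionJob {

    // Assumes expires_at is a timestamp; for an epoch-millis column you would
    // compare against the current epoch instead.
    static final String CLEANUP_SQL =
        "DELETE FROM bucket WHERE expires_at < now()";

    static void schedule(DataSource dataSource, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(CLEANUP_SQL)) {
                ps.executeUpdate();
            } catch (Exception e) {
                // Swallow and log so the scheduler stays alive;
                // a failed sweep is simply retried next period.
                e.printStackTrace();
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }
}
```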
Performance
While PostgreSQL is highly performant, it’s not as fast as in-memory caches for extreme high-throughput scenarios. Consider your latency requirements before choosing this approach.
Serialization
The bucket state is stored as a serialized object in the database. Bucket4j handles this seamlessly, but it’s worth understanding how it works.
Transaction Isolation
Ensure your database transactions use an appropriate isolation level (e.g., READ COMMITTED) to maintain consistency.
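With plain JDBC, the isolation level can be set per connection before Bucket4j uses it. Note that READ COMMITTED is PostgreSQL's default, so in many setups no change is needed at all:

```java
import java.sql.Connection;

// Sets READ COMMITTED on a JDBC connection; many pools let you apply this
// as a default for every connection they hand out.
final class IsolationConfig {

    static void applyReadCommitted(Connection connection) throws Exception {
        connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }
}
```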
Lock Key Isolation
When using pg_advisory_xact_lock, ensure that each lock key is unique to avoid unintended locking of unrelated transactions.
Conclusion
Bucket4j’s PostgreSQL integration is a robust solution for distributed rate limiting, especially for teams already leveraging PostgreSQL. By understanding its locking mechanisms and SQL operations, you can confidently implement rate limiting to protect your APIs and ensure a smooth user experience.
While in-memory caches like Redis may be preferable for ultra-high-performance scenarios, Bucket4j combined with PostgreSQL offers a reliable and accessible alternative for many use cases.
Have you used Bucket4j or implemented distributed rate limiting in your projects? Let me know your thoughts and experiences in the comments!