Distributed Rate Limiting in Java: A Deep Dive into Bucket4j + PostgreSQL
Bucket4j for Java enables distributed rate limiting with PostgreSQL, ensuring consistency via database locks. Here is how it works.
Important note: this post covers implementation details of the integration between PostgreSQL and the Bucket4j library, specifically version 8.14.0. I'm not responsible for future changes, but I'm fairly confident the details will remain accurate for a long time.
Hey everyone!
Have you ever needed a distributed rate limiter in Java, only to realize there aren’t many out-of-the-box solutions available? It can feel overwhelming to piece together a reliable approach, especially when working in distributed systems.
That’s where Bucket4j comes in — a powerful library that simplifies rate limiting while ensuring consistency across distributed environments.
Imagine writing code like this (below) and trusting that your rate-limiting logic will work seamlessly across multiple instances of your application:
boolean tryConsume(HttpRequest request, HttpResponse response) {
    Long key = getKey(request);
    Bucket bucket = bucketProvisioner.getBucket(key);
    ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(tokenNum);
    if (probe.isConsumed()) {
        response.addHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
        return true;
    }
    long waitForRefill = toSeconds(probe.getNanosToWaitForRefill());
    response.addHeader("X-Rate-Limit-Retry-After-Seconds", String.valueOf(waitForRefill));
    return false;
}
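The snippet above leaves getKey and toSeconds undefined. Here is a minimal sketch of what they might look like; the API-key lookup and the round-up rounding are my assumptions for illustration, not part of Bucket4j:

```java
// Hypothetical helpers for the snippet above -- names and behavior are assumptions.
final class RateLimitHelpers {

    // Derive the bucket key from the caller's identity; here we assume an
    // API-key string (e.g., from an "X-Api-Key" header) hashed to a Long.
    static Long getKey(String apiKey) {
        return (long) apiKey.hashCode();
    }

    // Convert the wait time reported by ConsumptionProbe (nanoseconds) into
    // whole seconds, rounding up so Retry-After never understates the wait.
    static long toSeconds(long nanos) {
        return (nanos + 999_999_999L) / 1_000_000_000L;
    }
}
```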
If your tech stack relies heavily on relational databases like PostgreSQL, Bucket4j offers a compelling solution for distributed rate limiting without the need for in-memory caches like Redis or Hazelcast. However, if you prefer working with Ignite or Hazelcast, Bucket4j has you covered as well, thanks to its native support for the JCache API (JSR 107).
Of course, there is also a Spring Boot starter to simplify setup.
But in this post, we’ll dive specifically into Bucket4j + PostgreSQL integration to explore how it works and why it’s a great choice for relational database-based systems.
How It Works: Bucket4j + PostgreSQL
Bucket4j leverages PostgreSQL to implement a distributed token bucket algorithm. Each "bucket" (representing rate limits for a user, API key, or resource) is stored as a serialized object in a PostgreSQL table. The library ensures atomicity and consistency across multiple application instances, making it perfect for distributed systems.
Locking Strategies: Ensuring Consistency
To prevent race conditions and maintain accurate rate limits, Bucket4j employs two locking mechanisms:
SELECT ... FOR UPDATE: locks the specific row representing the bucket's state for the duration of the transaction, ensuring exclusive access.
pg_advisory_xact_lock: a more advanced method that leverages PostgreSQL's advisory locks, which are application-level locks not tied to specific rows. In most cases, Bucket4j uses a hash of the bucket's identifier (e.g., user ID or API key) as the lock key; when the key is numeric, the numeric value itself is used directly.
Both strategies ensure consistency, but the choice depends on your application's requirements and PostgreSQL expertise. It's worth noting that the SELECT ... FOR UPDATE approach typically sends 4 queries compared to 5 in the advisory-lock case, making it slightly more efficient in terms of database round trips.
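To make the key-derivation rule concrete, the logic can be sketched roughly as follows. Note that pg_advisory_xact_lock takes a bigint, so every key must become a long; the exact hash function Bucket4j applies to non-numeric keys is an internal detail, and hashCode() below is only a stand-in:

```java
// Sketch of how a bucket identifier maps to a pg_advisory_xact_lock key.
final class AdvisoryLockKeys {

    static long lockKeyFor(Object bucketId) {
        if (bucketId instanceof Number) {
            // Numeric keys are used directly as the advisory lock key.
            return ((Number) bucketId).longValue();
        }
        // Non-numeric keys are hashed down to a long. Bucket4j's real hash
        // function is an implementation detail; hashCode() is a stand-in.
        return bucketId.hashCode();
    }
}
```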
Code in Action: Configuration Made Simple
Here’s a quick example of how you can configure Bucket4j with PostgreSQL:
ProxyManager<Long> proxyManager = Bucket4jPostgreSQL
    .selectForUpdateBasedBuilder(dataSource)
    .build();

// the key will be used as the lock key in the database
Long key = 1L;

BucketConfiguration bucketConfiguration = BucketConfiguration.builder()
    .addLimit(limit -> limit.capacity(10).refillGreedy(10, ofSeconds(1)))
    .build();

BucketProxy bucket = proxyManager.getProxy(key, () -> bucketConfiguration);
This setup connects your application to PostgreSQL, enabling distributed rate limiting with minimal effort.
SQL in Action: What Happens Under the Hood
Let’s take a closer look at the SQL queries Bucket4j generates:
1. SELECT ... FOR UPDATE Example
Initialization/First Request
BEGIN;
SELECT state FROM bucket WHERE id = 'your-bucket-id' FOR UPDATE;
INSERT INTO bucket(id, state) VALUES('your-bucket-id', null) ON CONFLICT(id) DO NOTHING;
COMMIT;
This sequence locks the row, retrieves the bucket's state, and inserts it if it doesn’t exist.
Subsequent Requests (Updating State)
BEGIN;
SELECT state FROM bucket WHERE id = 'your-bucket-id' FOR UPDATE;
UPDATE bucket SET state = $1, expires_at = '...' WHERE id = 'your-bucket-id';
COMMIT;
The row is locked, and the serialized state ($1) is updated with the remaining tokens and metadata.
2. pg_advisory_xact_lock Example
Locking and Retrieving State
Here, the advisory lock ensures transaction-level consistency without locking specific rows.
BEGIN;
SELECT pg_advisory_xact_lock('hash-of-your-bucket-id');
SELECT state FROM bucket_lock WHERE id = 'your-bucket-id';
-- INSERT or UPDATE logic follows...
COMMIT;
Key Considerations
Eviction Policy
To prevent unbounded growth, expired buckets need to be cleaned up periodically. Bucket4j calculates an expires_at value, but you'll need a background job to delete old entries. You can find extra details in the documentation.
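One way to run that background job is a simple scheduled task that deletes expired rows. The table and column names below follow the SQL shown in this post, but the schedule, the JDBC wrapper, and the assumption that expires_at is a timestamp column (adjust the predicate if it stores epoch millis) are my own sketch:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

// Periodically evicts expired buckets. Table/column names ("bucket",
// "expires_at") follow the SQL in this post; everything else is a sketch.
final class BucketEvictionJob {

    // Assumes expires_at is a timestamp; for an epoch-millis column you would
    // compare against the current epoch instead.
    static final String CLEANUP_SQL =
        "DELETE FROM bucket WHERE expires_at < now()";

    static void schedule(DataSource dataSource, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(CLEANUP_SQL)) {
                ps.executeUpdate();
            } catch (Exception e) {
                // Swallow and log so the scheduler stays alive;
                // a failed sweep is simply retried next period.
                e.printStackTrace();
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }
}
```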
Performance
While PostgreSQL is highly performant, it’s not as fast as in-memory caches for extreme high-throughput scenarios. Consider your latency requirements before choosing this approach.
Serialization
The bucket state is stored as a serialized object in the database. Bucket4j handles this seamlessly, but it’s worth understanding how it works.
Transaction Isolation
Ensure your database transactions use an appropriate isolation level (e.g., READ COMMITTED) to maintain consistency.
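With plain JDBC, the isolation level can be set per connection before Bucket4j uses it. Note that READ COMMITTED is PostgreSQL's default, so in many setups no change is needed at all:

```java
import java.sql.Connection;

// Sets READ COMMITTED on a JDBC connection; many pools let you apply this
// as a default for every connection they hand out.
final class IsolationConfig {

    static void applyReadCommitted(Connection connection) throws Exception {
        connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }
}
```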
Lock Key Isolation
When using pg_advisory_xact_lock, ensure that each lock key is unique to avoid unintended locking of unrelated transactions.
Conclusion
Bucket4j’s PostgreSQL integration is a robust solution for distributed rate limiting, especially for teams already leveraging PostgreSQL. By understanding its locking mechanisms and SQL operations, you can confidently implement rate limiting to protect your APIs and ensure a smooth user experience.
While in-memory caches like Redis may be preferable for ultra-high-performance scenarios, Bucket4j combined with PostgreSQL offers a reliable and accessible alternative for many use cases.
Have you used Bucket4j or implemented distributed rate limiting in your projects? Let me know your thoughts and experiences in the comments!