Mircea Cadariu

How to add rate limiting to your API using TigerBeetle

You should always consider having explicit limits in place when building software. For online services, this ensures fair use and prevents operational headaches. You've seen the concept in the "real world" too: at busier restaurants, you sometimes get only a limited time slot in which to enjoy being seated at a table.

In this post, I'll show you in detail one solution for adding rate limiting to a Spring Boot API application. For the book-keeping required to make this work, we'll use TigerBeetle, a financial transactions OLTP database that recently caught my attention and that I wanted to try out. As a bonus, I'll show you how to capture and visualise your app's rate limiting behaviour using Prometheus and Grafana, a common open-source stack for application observability. This repo contains the code I'm about to show you, if you'd like to check it out. Onwards!

TigerBeetle

TigerBeetle is a financial transactions database which appeared a couple of years ago. Its ambition is to provide a highly performant and reliable OLTP database for customers operating at massive scale. Reading about its design decisions is rather captivating, and in some ways reminds me of the LMAX architecture. The schema is very simple, by design. The main concept is debit/credit: a very flexible abstraction which can be applied to many use cases, even outside the financial domain. After all, the idea of a "transaction" is pretty universal. On their website, you can find several recipes which can serve as starting points for working with it. In the next sections, I will be applying the rate limiting recipe; upon reading it, it's really clear what has to be done.

Alternatives

When doing Spring Boot application development, Redis is probably the backing data store you'll most frequently encounter for rate limiting. The existing integrations make it easy to get started: include Spring Cloud Gateway as a dependency, configure a few things, and you're off to the races. If you already have experience with Redis, that's a totally fine route to take as well.

Getting started

We start our work, as usual with Spring Boot development, by going to start.spring.io and selecting Spring Web as a dependency. We'll develop this initial empty shell into a little web application with a single API endpoint. Let's add an initial class which determines what we do when we get web requests.

@RestController
@RequiredArgsConstructor
public class GreetingController {

    @GetMapping("/greeting")
    public String greeting() {
        return "hello";
    }
}

Intercepting requests

Now, we want to add rate limiting to this endpoint. This means that we have to hook into the Spring request handling mechanism and inject our rate limiting logic between the point where the request is received and when it's handed over to the GreetingController. We do this by creating a class which implements the HandlerInterceptor interface and then providing it to the InterceptorRegistry:

 registry.addInterceptor(rateLimitInterceptor());

When constructing the interceptor, we have to provide the TigerBeetle client and the observation registry as collaborating services for the rate limiting. If at this point you'd like an introduction to the observation registry and related topics, I recommend this post from the Spring blog to get familiar with how the integration between Spring Boot and the observability stack works.

@Bean
@RequestScope
HandlerInterceptor rateLimitInterceptor() {
   return new RateLimitInterceptor(client, observationRegistry);
}
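For completeness, here's a minimal sketch of the configuration class these two snippets live in. The class name and the Lombok-injected fields are my assumptions, not necessarily what the repo uses:

import com.tigerbeetle.Client;
import io.micrometer.observation.ObservationRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.context.annotation.RequestScope;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
@RequiredArgsConstructor
public class RateLimitConfig implements WebMvcConfigurer {

    private final Client client;                            // TigerBeetle client
    private final ObservationRegistry observationRegistry;  // Micrometer observations

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(rateLimitInterceptor());
    }

    @Bean
    @RequestScope
    HandlerInterceptor rateLimitInterceptor() {
        return new RateLimitInterceptor(client, observationRegistry);
    }
}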

The logic for performing the rate limiting will be in the implementation of the preHandle method which is part of the HandlerInterceptor interface.

Note that this means all your endpoints will be subject to rate limiting. If you want, you can define a list of exceptions, or create custom annotations to apply to specific endpoints for more fine-grained control; a quick sketch of the first option follows. But for this post, we're keeping it simple.
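For example, Spring's InterceptorRegistration lets you exclude path patterns at registration time. A minimal sketch; the /actuator/** pattern is just an illustrative choice:

@Override
public void addInterceptors(InterceptorRegistry registry) {
    registry.addInterceptor(rateLimitInterceptor())
            // leave monitoring endpoints out of the rate limit
            .excludePathPatterns("/actuator/**");
}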

Every request means a debit

Let us now define two accounts:

  • the operator
  • the user

The operator is responsible for initialising each user account with a finite amount of credit, from which our application will deduct a fixed amount when handling every request from that particular user. In addition, the user account has the following important restriction: the debits must not exceed the credits. For every request, we will make a transfer from the user to the operator, but once the limit is reached, we will short-circuit the request from proceeding as usual and return a 429 ("Too Many Requests") response code right away.

Worth mentioning is that the user account can represent any kind of resource we're interested in rate limiting, such as an IP address, a customer, and so on.

Here is how the creation of the user account looks. The USER_ID is just a generated random integer; in a real system, you can imagine it being retrieved from something like an authentication system. In the reference system architecture, this would be what is depicted as the OLGP database (e.g. Postgres).

   AccountBatch accountBatch = new AccountBatch(1);
   accountBatch.add();
   accountBatch.setId(USER_ID);
   accountBatch.setLedger(1);
   accountBatch.setCode(1);
   accountBatch.setFlags(DEBITS_MUST_NOT_EXCEED_CREDITS);

   client.createAccounts(accountBatch);
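One thing the snippet above glosses over: createAccounts returns a result batch describing any accounts that could not be created, and in a real application you'd want to inspect it. A minimal sketch using the tigerbeetle-java client:

CreateAccountResultBatch accountErrors = client.createAccounts(accountBatch);
while (accountErrors.next()) {
    // getIndex() identifies the failing account within the submitted batch
    System.err.printf("Account at index %d failed: %s%n",
            accountErrors.getIndex(), accountErrors.getResult());
}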

Notice that the interface is modelled around batching. This comes back to performance as a first-class principle in TigerBeetle: with batching, we amortise the overhead cost across many operations. Given the use case we're tackling here, our batch is limited to one account, but normally you would have more, as sketched below.
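To make the batching idea concrete, here's a sketch of creating two accounts in a single round trip; the IDs are made up for the example:

AccountBatch batch = new AccountBatch(2);
for (long id : new long[] {1001L, 1002L}) {
    batch.add();   // position the cursor on a new element
    batch.setId(id);
    batch.setLedger(1);
    batch.setCode(1);
    batch.setFlags(DEBITS_MUST_NOT_EXCEED_CREDITS);
}
client.createAccounts(batch);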

Onto the method we use to perform a transfer. It is invoked on every web request.

private CreateTransferResultBatch makeTransfer(long amount, long debitAcct, long creditAcct, int timeout, int flag) {
    TransferBatch transfer = new TransferBatch(1);

    transfer.add();
    // Transfer IDs must be unique; allocating a new Random per call and
    // drawing from the int range invites collisions, so we draw a random
    // long instead (TigerBeetle IDs are 128-bit, so a real system should
    // use a proper unique ID scheme here).
    transfer.setId(ThreadLocalRandom.current().nextLong());
    transfer.setDebitAccountId(debitAcct);
    transfer.setCreditAccountId(creditAcct);
    transfer.setLedger(1);
    transfer.setCode(1);
    transfer.setAmount(amount);
    transfer.setFlags(flag);
    transfer.setTimeout(timeout);

    return client.createTransfers(transfer);
}

The flag and timeout parameters are needed because for every user request, we create a "pending" transfer (a transfer type selected via the flag), which expires after timeout seconds. This is what makes the allowance replenish after a configurable period, which is what we want: once a pending transfer expires without being posted, the amount it reserved becomes available again. For example, if the initial credit were 100 and each request deducted 10, a user could make at most 10 requests within any rolling window of timeout seconds.
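For reference, the constants used in the following snippets could be defined along these lines. The values here are illustrative assumptions, not the ones from the repo:

// Illustrative values: allow 10 requests per rolling 10-second window.
private static final long USER_CREDIT_INITIAL_AMOUNT = 100;
private static final long PER_REQUEST_DEDUCTION = 10;
private static final int TIMEOUT_IN_SECONDS = 10;
private static final int PENDING = TransferFlags.PENDING; // com.tigerbeetle.TransferFlags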

On the first request by a user, we have to initialise the account by doing a transfer from the operator to the user:

 makeTransfer(
   USER_CREDIT_INITIAL_AMOUNT,
   OPERATOR_ID,
   USER_ID,
   0,  // no timeout: the initial grant never expires
   0   // no flags: a regular (posted) transfer
 );

For every intercepted request, we perform a deduction from the user's account:

 CreateTransferResultBatch transferErrors = 
   makeTransfer(
     PER_REQUEST_DEDUCTION,
     USER_ID,
     OPERATOR_ID,
     TIMEOUT_IN_SECONDS,
     PENDING
   );

If the above operation returns an error of type ExceedsCredits (one of the values of the CreateTransferResult enum), we will not let this request proceed. We will send an observation towards our observability stack, set an attribute on the current tracing span, and set the response code to 429.
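Here is a minimal sketch of how that check can look with the tigerbeetle-java client; the rejection logic shown next would run when the flag ends up true:

boolean exceedsCredits = false;
while (transferErrors.next()) {
    if (transferErrors.getResult() == CreateTransferResult.ExceedsCredits) {
        exceedsCredits = true;
    }
}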

  // start, of and valueOf are static imports: Observation.start,
  // Observation.Event.of / KeyValue.of, and String.valueOf respectively.
  Observation observation = start("ratelimit", observationRegistry);
  observation.event(of("limited"));
  observation.highCardinalityKeyValue(of("user", valueOf(USER_ID)));
  observation.stop();

  Span.current().setAttribute("user", valueOf(USER_ID));

  response.setStatus(TOO_MANY_REQUESTS.value());
  return false;

Testing

So far so good. Let's write a Spring Boot test in which we assert that what I've described above actually happens as we expect:

package com.example.tigerbeetle_ratelimiter;

...

import static io.micrometer.observation.tck.TestObservationRegistryAssert.assertThat;
import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class RatelimiterApplicationTests {

    private static final String ENDPOINT = "/greeting";

    @Container
    public static DockerComposeContainer<?> environment =
            new DockerComposeContainer<>(new File("docker-compose.yml"));

    @Autowired
    private TestRestTemplate restTemplate;

    @Autowired
    private TestObservationRegistry observationRegistry;

    @Test
    void contextLoads() {
    }

    @Test
    void shouldRejectRequestsBeyondRateLimit() {
        for (int i = 0; i < USER_CREDIT_INITIAL_AMOUNT / PER_REQUEST_DEDUCTION; i++) {
            restTemplate.getForEntity(ENDPOINT, String.class);
        }

        // The next request should be rate limited
        ResponseEntity<String> response = restTemplate.getForEntity(ENDPOINT, String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.TOO_MANY_REQUESTS);

        assertThat(observationRegistry)
                .hasObservationWithNameEqualTo("ratelimit")
                .that()
                .hasBeenStarted()
                .hasBeenStopped();
    }

    @TestConfiguration
    static class ObservationTestConfiguration {

        @Bean
        TestObservationRegistry observationRegistry() {
            return TestObservationRegistry.create();
        }
    }
}

Time to show what happens when we run it.

Showtime

In production environments, TigerBeetle is normally deployed as a cluster of multiple replicas. However, given that we're just experimenting with it locally, we'll start a single instance, fully accepting that this is not a highly available setup and not something we would do in production.

Let's format the data file first:

docker run --security-opt seccomp=unconfined \
    -v $(pwd)/data:/data ghcr.io/tigerbeetle/tigerbeetle \
    format --cluster=0 --replica=0 --replica-count=1 /data/0_0.tigerbeetle

As a result, you'll see that a folder called data was created, containing a file called 0_0.tigerbeetle. This single file is where the TigerBeetle replica will store our rate limiting book-keeping data.

We're now ready to start our docker-compose setup where everything is wired up and ready to go.

We will first install the app:

./mvnw install

After this, we are ready to start our full environment:

docker-compose up -d

If all services started correctly, we're in business!

Load testing

As a next step, let's generate some requests to hit the endpoint. k6 is a load testing tool which is very handy for these situations. It's easy to work with: you write JavaScript code describing the load you want to generate, and when you run it, it executes that load against your target.

This is the contents of the k6 script. With 100 virtual users each issuing roughly one request per second, we'll issue about 3000 requests within a span of 30 seconds.

import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  vus: 100,        // 100 concurrent virtual users
  duration: '30s', // each looping for 30 seconds
};

export default function() {
  let res = http.get('http://host.docker.internal:8080/greeting');
  check(res, { "status is 200": (res) => res.status === 200 });
  sleep(1); // one request per user per second: ~100 requests/s overall
}

We'll now run the script:

docker run --rm -i grafana/k6 run - <script.js

After 30 seconds, we get the following:

 █ TOTAL RESULTS 

    checks_total.......................: 3000   97.661337/s
    checks_succeeded...................: 16.80% 504 out of 3000
    checks_failed......................: 83.20% 2496 out of 3000

    ✗ status is 200
      ↳  16% — ✓ 504 / ✗ 2496


As we can see, more requests were rate limited than succeeded - the limiter means business. We did apply quite a high deduction per request here; in the context of a real app, you might want to take your foot off the brakes a little!

Visualising rate limiting

Moving over to Grafana: I've prepared a pre-configured dashboard for your convenience, which we'll now open up and have a look at. Go to localhost:3000, fill in admin/admin as the credentials, and click Skip when asked about changing the password. Then, on the left side of the screen, click on Dashboards.

[Screenshot: the Dashboards entry in Grafana's side menu]

You'll then see our preconfigured dashboard called Rate limiting. Click on it and you will see the following:

[Screenshot: the pre-configured Rate limiting dashboard]

Alright, time to have a look at the request traces. These show you the "path" taken by the request through our code. This is where you can find them in the menu.

[Screenshot: the Traces entry in the menu]

In the lower part of the next screen, you will see some green dots standing out. Those are so-called exemplars. Metrics give you an aggregated perspective of what you're tracking, but with exemplars you can drill down into particular single instances. Here's what one looks like; I've highlighted the span attribute representing the user ID, which we set in the Java code you've seen earlier.

[Screenshot: an exemplar trace, with the user ID span attribute highlighted]

The End

As I've mentioned before, having limits in place is a good thing. The same goes for this post! 😀
So - that's all I have for you today. I hope you enjoyed it, and thanks for reading.

Time to clean up by tearing down our setup.

docker-compose down -v

Thanks - until next time!

Cover Photo by Spencer DeMera on Unsplash
