Basic API Rate-Limiting

You are likely developing some form of web or RESTful API, and whether it is public-facing or internal, you normally want to rate-limit it somehow. That is, to limit the number of requests performed over a period of time, in order to save resources and protect the API from abuse.

This can probably be achieved on the web-server/load balancer level with some clever configurations, but usually, you want the rate limiter to be client-specific (i.e. each client of your API should have a separate rate limit), and the way the client is identified varies. It’s probably still possible to do it on the load balancer, but I think it makes sense to have it on the application level.

I’ll use spring-mvc for the example, but any web framework has a good way to plug an interceptor.

So, here’s an example of a spring-mvc interceptor:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import javax.annotation.PreDestroy;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

@Component
public class RateLimitingInterceptor extends HandlerInterceptorAdapter {

    private static final Logger logger = LoggerFactory.getLogger(RateLimitingInterceptor.class);

    @Value("${rate.limit.enabled}")
    private boolean enabled;

    @Value("${rate.limit.hourly.limit}")
    private int hourlyLimit;

    private Map<String, SimpleRateLimiter> limiters = new ConcurrentHashMap<>();

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
            throws Exception {
        if (!enabled) {
            return true;
        }
        String clientId = request.getHeader("Client-Id");
        // let non-API requests pass
        if (clientId == null) {
            return true;
        }
        SimpleRateLimiter rateLimiter = getRateLimiter(clientId);
        boolean allowRequest = rateLimiter.tryAcquire();

        if (!allowRequest) {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        }
        response.addHeader("X-RateLimit-Limit", String.valueOf(hourlyLimit));
        return allowRequest;
    }

    private SimpleRateLimiter getRateLimiter(String clientId) {
        if (limiters.containsKey(clientId)) {
            return limiters.get(clientId);
        } else {
            synchronized (clientId.intern()) {
                // double-checked locking to avoid multiple re-initializations
                if (limiters.containsKey(clientId)) {
                    return limiters.get(clientId);
                }

                SimpleRateLimiter rateLimiter = createRateLimiter(clientId);

                limiters.put(clientId, rateLimiter);
                return rateLimiter;
            }
        }
    }

    private SimpleRateLimiter createRateLimiter(String clientId) {
        logger.info("Creating rate limiter for clientId={}", clientId);
        return SimpleRateLimiter.create(hourlyLimit, TimeUnit.HOURS);
    }

    @PreDestroy
    public void destroy() {
        // stop the replenishment schedulers of all limiters
        limiters.values().forEach(SimpleRateLimiter::stop);
    }
}

This initializes rate limiters per client on demand. Alternatively, on startup you could loop through all registered API clients and create a rate limiter for each (a sketch of that follows below). If the rate limiter doesn't allow more requests (tryAcquire() returns false), we return "429 Too Many Requests" and abort the execution of the request (by returning false from the interceptor).
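For completeness, here is roughly what the startup variant could look like inside the interceptor. This is a sketch only: the ClientRegistry bean and its getAllClientIds() method are hypothetical placeholders for whatever store holds your registered API clients.

// hypothetical lookup of all registered API clients; not part of the original example
@Autowired
private ClientRegistry clientRegistry;

@PostConstruct
public void preCreateLimiters() {
    // eagerly create a limiter per known client instead of doing it on demand
    for (String clientId : clientRegistry.getAllClientIds()) {
        limiters.put(clientId, createRateLimiter(clientId));
    }
}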

This sounds simple. But there are a few catches. You may wonder where the SimpleRateLimiter above is defined. We’ll get there, but first, let’s see what options we have for rate limiter implementations.

The most recommended one seems to be the Guava RateLimiter. It has a straightforward factory method that gives you a rate limiter for a specified rate (permits per second). However, it doesn't accommodate web APIs very well, as you can't initialize the RateLimiter with a pre-existing number of permits. That means a period of time has to elapse before the limiter allows any requests. There's another issue: if you have less than one permit per second (e.g. if your desired rate limit is "200 requests per hour"), you can pass a fraction (hourlyLimit/secondsInHour), but it still won't work the way you expect, as internally there's a "maxPermits" field that caps the number of permits at much less than you want. The rate limiter also doesn't allow bursts: you have exactly X permits per second, but you cannot spread them unevenly over a longer period, e.g. make 5 requests in one second and then none for the next few seconds. In fact, all of the above can be solved, but sadly only through hidden fields that you don't have access to. Multiple feature requests have existed for years now, but Guava just doesn't update the rate limiter, making it much less applicable to API rate-limiting.
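To make the fractional-rate problem concrete, here is the naive setup for "200 requests per hour". RateLimiter.create(permitsPerSecond) is Guava's actual factory method; the class name and printed values are just a sketch of the case discussed above.

import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.RateLimiter;

public class GuavaRateLimiterExample {
    public static void main(String[] args) {
        // 200 requests per hour expressed as permits per second (~0.055)
        double permitsPerSecond = 200d / TimeUnit.HOURS.toSeconds(1);
        RateLimiter limiter = RateLimiter.create(permitsPerSecond);

        System.out.println(limiter.tryAcquire()); // true: the first permit is granted immediately
        System.out.println(limiter.tryAcquire()); // false: the next permit accrues only ~18 seconds later,
                                                  // and stored permits are capped internally, so no bursts
    }
}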

Using reflection, you can tweak the parameters and make the limiter work. However, it's ugly, and it's not guaranteed to work as expected. I have shown here how to initialize a Guava rate limiter with X permits per hour, with burstability and full initial permits. When I thought that would do, I saw that tryAcquire() has a synchronized(..) block. Does that mean all requests will wait for each other when simply checking whether they are allowed to make a request? That would be horrible.
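For illustration, the hack is along these lines. This is a sketch only: the "maxPermits" and "storedPermits" field names belong to Guava's internal SmoothRateLimiter class and are assumptions that may break with any Guava release, which is exactly why the approach is fragile.

import java.lang.reflect.Field;

import com.google.common.util.concurrent.RateLimiter;

public class BurstyRateLimiterFactory {

    // sketch: a limiter with X permits per hour, full initial permits and burstability,
    // obtained by poking Guava's internal fields (assumed names, version-dependent)
    public static RateLimiter createHourly(int permitsPerHour) throws ReflectiveOperationException {
        RateLimiter limiter = RateLimiter.create(permitsPerHour / 3600d);

        // the returned instance is a SmoothBursty; its superclass SmoothRateLimiter
        // declares the fields we need to override
        Class<?> smoothRateLimiter = limiter.getClass().getSuperclass();

        Field maxPermits = smoothRateLimiter.getDeclaredField("maxPermits");
        maxPermits.setAccessible(true);
        maxPermits.setDouble(limiter, permitsPerHour);

        Field storedPermits = smoothRateLimiter.getDeclaredField("storedPermits");
        storedPermits.setAccessible(true);
        storedPermits.setDouble(limiter, permitsPerHour); // start with a full "bucket"

        return limiter;
    }
}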

So, in fact, the Guava RateLimiter is not meant for (web) API rate-limiting. Maybe keeping it feature-poor is Guava’s way of discouraging people from misusing it?

That’s why I decided to implement something simple myself, based on a Java Semaphore. Here’s the naive implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SimpleRateLimiter {
    private final Semaphore semaphore;
    private final int maxPermits;
    private final TimeUnit timePeriod;
    private ScheduledExecutorService scheduler;

    public static SimpleRateLimiter create(int permits, TimeUnit timePeriod) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(permits, timePeriod);
        limiter.schedulePermitReplenishment();
        return limiter;
    }

    private SimpleRateLimiter(int permits, TimeUnit timePeriod) {
        this.semaphore = new Semaphore(permits);
        this.maxPermits = permits;
        this.timePeriod = timePeriod;
    }

    public boolean tryAcquire() {
        return semaphore.tryAcquire();
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public void schedulePermitReplenishment() {
        scheduler = Executors.newScheduledThreadPool(1);
        // every "1 timePeriod", top the semaphore back up to maxPermits
        scheduler.scheduleAtFixedRate(() -> {
            semaphore.release(maxPermits - semaphore.availablePermits());
        }, 1, 1, timePeriod);
    }
}

It takes a number of permits (the allowed number of requests) and a time period. The time period is “1 X”, where X can be a second, minute, hour or day, depending on how you want your limit configured: per second, per minute, hourly, daily. Every “1 X” a scheduler replenishes the acquired permits. There is no control for bursts (a client can spend all permits in a rapid succession of requests), there is no warm-up functionality and no gradual replenishment. Depending on what you want, this may not be ideal, but it’s just a basic rate limiter that is thread-safe and doesn’t do any blocking. I wrote a unit test to confirm that the limiter behaves properly, and also ran performance tests against a local application to make sure the limit is obeyed. So far it seems to be working.
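As a quick sanity check (a sketch, not the actual unit test), the expected behavior looks like this:

import java.util.concurrent.TimeUnit;

public class SimpleRateLimiterDemo {
    public static void main(String[] args) {
        // 10 requests per minute
        SimpleRateLimiter limiter = SimpleRateLimiter.create(10, TimeUnit.MINUTES);

        for (int i = 0; i < 10; i++) {
            System.out.println(limiter.tryAcquire()); // true: permits still available
        }
        System.out.println(limiter.tryAcquire()); // false: exhausted until the next replenishment

        limiter.stop();
    }
}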

Are there alternatives? Well, yes. There are libraries like RateLimitJ that use Redis to implement rate-limiting. That would mean, however, that you need to set up and run Redis, which seems like a lot of overhead for “simply” having rate-limiting.

On the other hand, how would rate-limiting work properly in a cluster of application nodes? Do the nodes need a database or a gossip protocol to share data about the per-client permits remaining? Not necessarily. A very simple approach is to assume that the load balancer distributes the load equally among your nodes. That way you just set the limit on each node to the total limit divided by the number of nodes (e.g. a total of 200 requests per hour across 4 nodes means 50 permits per node). It won’t be exact, but you rarely need it to be: allowing 5-10 more requests won’t kill your application, and allowing 5-10 fewer won’t be dramatic for the users.

That, however, means that you have to know the number of application nodes. If you employ auto-scaling (e.g. in AWS), the number of nodes may change depending on the load. In that case, instead of configuring a hard-coded number of permits, the replenishing scheduled job can calculate “maxPermits” on the fly, by calling an AWS (or other cloud provider) API to obtain the number of nodes in the current auto-scaling group. That would still be simpler than supporting a Redis deployment just for that.
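Here is a sketch of that on-the-fly calculation, using the AWS SDK for Java (v1) auto-scaling client; the class name, method and configuration wiring are assumptions for the sake of the example.

import com.amazonaws.services.autoscaling.AmazonAutoScaling;
import com.amazonaws.services.autoscaling.AmazonAutoScalingClientBuilder;
import com.amazonaws.services.autoscaling.model.DescribeAutoScalingGroupsRequest;
import com.amazonaws.services.autoscaling.model.DescribeAutoScalingGroupsResult;

public class NodeAwarePermits {

    private final AmazonAutoScaling autoScaling = AmazonAutoScalingClientBuilder.defaultClient();

    // hypothetical: the total limit and group name would come from configuration;
    // the replenishment job could call this instead of using a fixed maxPermits
    public int computeNodePermits(int totalPermits, String autoScalingGroupName) {
        DescribeAutoScalingGroupsResult result = autoScaling.describeAutoScalingGroups(
                new DescribeAutoScalingGroupsRequest().withAutoScalingGroupNames(autoScalingGroupName));
        int nodes = result.getAutoScalingGroups().get(0).getInstances().size();
        return Math.max(1, totalPermits / nodes);
    }
}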

Overall, I’m surprised there isn’t a “canonical” way to implement rate-limiting (in Java). Maybe the need for rate-limiting is not as common as it may seem. Or it’s implemented manually – by temporarily banning API clients that use “too much resources.”
