I recommend this "Stop Rate Limiting! Capacity Management Done Right" talk to get your head around the problem domain: https://www.youtube.com/watch?v=m64SWl9bfvk
TL;DR of the video:
- Key takeaway: Limit concurrency, not request rate
- Throughput != capacity
- It's really hard to know the true capacity of a service
- Learned about Little's Law https://en.wikipedia.org/wiki/Little%27s_law
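Little's Law (L = λ × W) ties the two ideas together: average concurrency equals arrival rate times average time in the system. A tiny worked example, using illustrative numbers rather than measurements:

```java
public class LittlesLaw {
    /** L = λ × W: average concurrency = arrival rate × average time in system. */
    static double avgConcurrency(double arrivalRatePerSec, double avgSecondsInSystem) {
        return arrivalRatePerSec * avgSecondsInSystem;
    }

    public static void main(String[] args) {
        // Illustrative assumption: 12 req/s, each spending ~2.5 s in the system
        // (service time plus queueing) keeps about 30 requests in flight. This is
        // why a concurrency cap (Semaphore permits) and a rate cap are two views
        // of the same capacity question.
        System.out.println(avgConcurrency(12.0, 2.5)); // prints 30.0
    }
}
```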
Points on the Implementation Above
- This RateLimitFilter sits at the very top of the stack (i.e., its registration order is zero) so that no throwaway work is done when a request is denied
- It is registered in our WebApplicationConfig class via an @Bean method that returns a FilterRegistrationBean (see image below)
- The Semaphore acts as a "leaky bucket" from queueing theory
- The Guava RateLimiter keeps requests per subdomain flowing at a consistent rate (which can cause the Semaphore to "fill up" and deny requests)
- The filter returns HTTP 429 (Too Many Requests) if there are too many concurrent requests. Be sure you have an external integration test against a live environment that can actually produce 429 failures!
- Our production cluster is 3 VMs, so per tenant it effectively allows 30 concurrent requests at a stable rate of 12 requests per second.
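To make the Semaphore-as-leaky-bucket idea concrete, here is a minimal, hypothetical sketch of the per-tenant concurrency gate (class and method names are my own, not the actual implementation; the Guava RateLimiter is left out so the example runs on the JDK alone, but in the real filter a `RateLimiter.create(ratePerSecond)` call would sit in front of the semaphore to smooth the rate):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Sketch of a per-tenant concurrency limiter: each subdomain gets its own
// "bucket" of permits; when the bucket is empty, the request maps to HTTP 429.
public class TenantThrottle {
    private final int maxConcurrent;
    private final Map<String, Semaphore> perTenant = new ConcurrentHashMap<>();

    public TenantThrottle(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Try to enter; returns false when the tenant's bucket is full (deny with 429). */
    public boolean tryEnter(String tenant) {
        return perTenant
                .computeIfAbsent(tenant, t -> new Semaphore(maxConcurrent))
                .tryAcquire();
    }

    /** Must run in a finally block after the request completes, or permits leak. */
    public void exit(String tenant) {
        perTenant.get(tenant).release();
    }
}
```

In the filter, `tryEnter` would be called before `chain.doFilter(...)` and `exit` in the matching `finally`.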
This is in our WebApplicationConfig class:
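For readers who can't see the image, the registration looks roughly like the following sketch (bean and filter names are assumptions on my part, not the actual code):

```java
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WebApplicationConfig {

    @Bean
    public FilterRegistrationBean<RateLimitFilter> rateLimitFilter() {
        FilterRegistrationBean<RateLimitFilter> registration =
                new FilterRegistrationBean<>(new RateLimitFilter());
        registration.setOrder(0);          // top of the filter chain
        registration.addUrlPatterns("/*"); // throttle every request
        return registration;
    }
}
```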
Why I put the API throttling inside the Spring Boot app vs. somewhere "above the application"
- I wanted the "origin service" (aka our Spring Boot app) to have some "built in" self protection even if we eventually add better protection "above the app"
- Since the implementation is per subdomain (aka tenant), I have all the control right in the code base for easy tweaking/deploying later
- We want to tie the number of API requests into a billing plan down the road. My assumption was this would be easier to do inside Spring Boot.
That's all I have for now. What do you think? I'd love to hear your feedback on this implementation and/or how you have solved this problem at your organization.