Fixing Production issue with @Async in Spring Boot

Fixing Production issue with @Async in Spring Boot

Photo by Clint Patterson on Unsplash

I came across this recurring issue in production which I wanted to share .

Premise

We have a web application wherein we have provided the admins the feature to generate reports . Through the UI interface , they can apply various filters on an entity and click on Export CSV . Once the button is clicked , a popup is shown that says “Your report is under processing and will be sent to your mail shortly”.

Under the hood , what happens is that we have a controller which accepts the request and passes it to a service which is wrapped under Async block . Since it is asynchronously processed in background , the controller immediately returns back with the response. Once the processing is done , it calls the notification service and sends mail with report in attachment.

Controller

REST Controller

Service

Service class

Issue

This controller is not one of the most used REST controllers in our application . It is rarely used like 2–3 times per quarter . We had deployed this service in 2020 and no issues reported in that year . But in next year (2021) , admins started complaining that they are not receiving the reports in mail . Strange ? It worked fine for a year , no code deployments after that , so what happened ? I checked the logs and found that HikariCP is not able to get a new connection and it timed out after waiting for certain time .

Error Logs

Error Logs from 2021

HikariCP configs

HikariCP configs

Immediate fix

After going through multiple resources on StackOverflow and checking the HikariCP official documentation , I increased the max pool size (This value will determine the maximum number of actual connections to the database backend) and connection Timeout(This property controls the maximum number of milliseconds that a client will wait for a connection from the pool) and restarted the pods . Voila , it solved the issue . No further complaints from admins for a year .

New HikariCP Configs

New HikariCP configs

Issue Recurrence

Its September 2022 , I get notified by admins that the report is not getting generated .That's when I try to debug the issue in deep .

Debugging the issue

Lets turn on HikariCP debug logs to find out what's really happening .

logging.level.com.zaxxer.hikari: TRACE
logging.level.com.zaxxer.hikari.HikariConfig: DEBUG

This will start emitting the logs like below:

HikariCP Debug Logs

When I hit the controller once , Hikari CP opens up one DB connection , I try hitting controller multiple times and see the Active connections count keep on increasing . Kept on hitting it until the Connection is not available , timed out issue happened . HikariCP has reached the maximum pool size and is not able to create a new connection .

After cleanup stats (total=50, active=50, idle=10, waiting=0)

I was expecting that after one async thread finished , another async thread would reuse existing opened connection for processing, but it was not happening .Each async thread was keeping the DB connection active and not returning it to HikariCP pool for reuse . Hence HikariCP had to always open up a new connection for each async thread until it hit the max pool size limit and started complaining that it is not able to open new connection, and time out effectively blocking any new reporting requests .

Awesome , I found the issue . How to fix it then ?

Added @Transactional in function

Added Transactional block (read only because there are no inserts/updates happening in the function) . Transactional behind the scenes ensures that DB connection is released by thread once the processing is completed. After applying this , I tried hitting the controller multiple times using 20x Load and found that it sustained the entire load.