History Cleanup Is Causing The DB To Grow, Not Shrink

Hello,

I’m not very familiar with Camunda, so my apologies if I’m missing some important data here. Our devs use Camunda to log data in our Kubernetes cluster, and we’ve been having some problems with the history cleanup script. The job was deadlocking a lot and having to roll back, so we lowered the queue size from 500 to 200, which seems to have resolved that issue, but the DB still isn’t shrinking. In fact, after the cleanup job runs in the morning the database is actually growing exponentially (i.e., we’ll get 100K new records one day, then 200-300K the next, and then it climbs into the millions per day). Note that these records appear after the cleanup process runs, so this is not just more traffic hitting the application; it seems the cleanup itself is causing the tables to grow instead of shrink.
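If it’s useful, the deadlocks themselves can be inspected on the MySQL side; the most recent one shows up in the standard InnoDB status output (this is just the generic diagnostic, nothing Camunda-specific):

```sql
-- Shows the latest detected deadlock (among other InnoDB diagnostics);
-- useful for confirming whether it is the cleanup's DELETEs that collide.
SHOW ENGINE INNODB STATUS;
```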

Any ideas as to what could be causing this?

Hi Charles, welcome to the Camunda community!
What do you mean by “history cleanup script”? Are you talking about the cleanup functionality in the Camunda Cockpit application, cleanup through the REST APIs, or scripts your developers created?

In any case, it is strange indeed; I don’t see how such an operation could make the database grow.

Can you be a bit more specific about which tables you notice growing after the cleanup?

Is it possible that other BPMN processes are executed after the cleanup, and that they are the cause of the DB growth?
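One quick way to check, directly in the database, is to group the newly written history rows by process definition and hour; the column names here are from the standard 7.x schema, so adjust if yours differs:

```sql
-- Which process definitions wrote history in the last day, and when?
SELECT PROC_DEF_KEY_,
       DATE_FORMAT(START_TIME_, '%Y-%m-%d %H:00') AS hour_bucket,
       COUNT(*)                                   AS new_activity_instances
FROM ACT_HI_ACTINST
WHERE START_TIME_ >= NOW() - INTERVAL 1 DAY
GROUP BY PROC_DEF_KEY_, hour_bucket
ORDER BY hour_bucket, new_activity_instances DESC;
```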

Thanks

Hi Enrico,
This is where my lack of knowledge of the backend process is going to hurt. I believe it’s the scheduled cleanup functionality. We have a window between 4 a.m. and 8 a.m. local time that it is “configured” to run in and remove the old history records. I put “configured” in quotes because, from what I can see, it runs whenever it feels like it. Unfortunately I don’t have access to the Cockpit GUI to check when it’s actually running, only the logs it dumps to Kubernetes.
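From what I’ve been able to read, the cleanup is scheduled as a regular engine job, so its next due date should be visible directly in the runtime job table; the handler type value below is my assumption based on the 7.x sources:

```sql
-- Next scheduled run of the history cleanup job.
-- 'history-cleanup' as the handler type is an assumption; it may differ
-- on other Camunda versions.
SELECT ID_, DUEDATE_, RETRIES_, LOCK_EXP_TIME_
FROM ACT_RU_JOB
WHERE HANDLER_TYPE_ = 'history-cleanup';
```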

I have a script running that uses the MySQL information_schema to grab some stats on the table sizes every hour (the query itself is sketched at the end of this post). Here is what I’m seeing around (what I assume is) the cleanup run. The third column is the table size in GB, the fourth is the row count:
20221114-21.log:ACT_HI_VARINST 90 46332022
20221114-22.log:ACT_HI_VARINST 90 46332022
20221114-23.log:ACT_HI_VARINST 98 59615038
20221115-00.log:ACT_HI_VARINST 98 59615038

20221114-21.log:ACT_GE_BYTEARRAY 73 28049247
20221114-22.log:ACT_GE_BYTEARRAY 73 28049247
20221114-23.log:ACT_GE_BYTEARRAY 84 8499174
20221115-00.log:ACT_GE_BYTEARRAY 84 8499174

20221114-21.log:ACT_HI_ACTINST 46 20196292
20221114-22.log:ACT_HI_ACTINST 46 20196292
20221114-23.log:ACT_HI_ACTINST 51 23998793
20221115-19.log:ACT_HI_ACTINST 51 23998793

The -XX.log suffix is the hour, and we are UTC-0:400, so 22:00-23:00 UTC corresponds to 8-9 p.m. local time. The devs swear they have the job configured for a.m., not p.m., though… Since I’m pulling the data from the MySQL information_schema, it might just be that those stats only update once a day, but that doesn’t seem likely to me…
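For completeness, the hourly numbers above come from a query along these lines (simplified; the schema name is just a placeholder for ours):

```sql
-- Table size (data + indexes) in GB and estimated row count for the
-- Camunda tables; 'camunda' is a placeholder schema name.
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024) AS size_gb,
       TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'camunda'
  AND TABLE_NAME LIKE 'ACT\_%'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
```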

It turns out that the MySQL information_schema statistics really do only update once or twice a day, which is why the changes only seem to show up at odd hours. We’ve implemented a manual cleanup script for the large history tables as a temporary solution, but I’m still curious how I can track down which task is doing this. The devs tell me that the community edition of Camunda we’re using doesn’t have any logging of scheduled tasks/jobs.
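In case it helps anyone else: on MySQL 8.0 the table statistics that information_schema reports are cached for 24 hours by default (information_schema_stats_expiry = 86400), so getting fresh numbers looks roughly like this:

```sql
-- Option 1 (MySQL 8.0+): stop using the cached dictionary statistics for
-- this session, so TABLE_ROWS / DATA_LENGTH are read fresh each time.
SET SESSION information_schema_stats_expiry = 0;

-- Option 2: explicitly refresh the statistics for the tables of interest.
ANALYZE TABLE ACT_HI_VARINST, ACT_HI_ACTINST, ACT_GE_BYTEARRAY;
```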

Hi Charles,
oh yes, history management is a subscription feature, so it’s no wonder your devs had to create scripts to clean the tables themselves.
If the scripts are set up correctly, I’d expect the tables you mentioned

ACT_HI_ACTINST

ACT_GE_BYTEARRAY

ACT_HI_VARINST

to be cleaned between 4 and 8 a.m., which, if I’m doing the math right, translates to the *-06.log through *-10.log files.

On the other hand, the increasing volume of these tables means that there are BPMN processes running that are writing data to the database.
For your info:
ACT_HI_ACTINST: stores activity instance data in the history
ACT_HI_VARINST: stores process variable data in the history
ACT_GE_BYTEARRAY: stores serialized (complex) variable values and other binary payloads
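If you want to check whether the engine’s own cleanup is keeping up at all, one rough check, assuming you are on a 7.10+ schema with the removal-time based cleanup strategy (so each history row carries a REMOVAL_TIME_ column), is to count rows that are already past their removal time:

```sql
-- Rows whose removal time has already passed but which are still present;
-- large and growing numbers here suggest the cleanup job is not keeping up.
SELECT 'ACT_HI_ACTINST'   AS tbl, COUNT(*) AS overdue_rows
  FROM ACT_HI_ACTINST   WHERE REMOVAL_TIME_ <= NOW()
UNION ALL
SELECT 'ACT_HI_VARINST',         COUNT(*)
  FROM ACT_HI_VARINST   WHERE REMOVAL_TIME_ <= NOW()
UNION ALL
SELECT 'ACT_GE_BYTEARRAY',       COUNT(*)
  FROM ACT_GE_BYTEARRAY WHERE REMOVAL_TIME_ <= NOW();
```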

I’d suggest you (or your devs) have a look at our best practices on data management here:
