Hi, I’m planning a cleanup of the history data in a Camunda 7.9 setup that has been running for a long time without history cleanup. It currently holds history data from about 4M process instances.
Our goal is to run the cleanup in time windows when not many other processes are running, so we don’t overload our single-node MariaDB. The history variable tables are so big that the indexes no longer fit in the memory cache, etc. (By the way, I’m really not a DB expert.) It doesn’t really matter if the cleanup takes longer, as long as we avoid overloading the database server.
My question is: in this kind of scenario, what is the effect of historyCleanupBatchSize? Should we use the default of 500, or choose a smaller value?
(Of course, we set historyCleanupDegreeOfParallelism=1, i.e. we don’t run multiple cleanup jobs in parallel.)
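For reference, this is roughly how the relevant settings look in our bpm-platform.xml (the property names are from the Camunda 7 docs; the engine name and the window times shown are just illustrative placeholders, not our actual values):

```xml
<process-engine name="default">
  <properties>
    <!-- run cleanup only in a nightly window -->
    <property name="historyCleanupBatchWindowStartTime">01:00</property>
    <property name="historyCleanupBatchWindowEndTime">05:00</property>
    <!-- number of history objects removed per batch (default 500) -->
    <property name="historyCleanupBatchSize">500</property>
    <!-- no parallel cleanup jobs -->
    <property name="historyCleanupDegreeOfParallelism">1</property>
  </properties>
</process-engine>
```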
Ideas for different approaches or other related info are welcome too.
Hi @opsiikarla - my initial thought is that the default of 500 is a good starting point, assuming you set the cleanup window for a time when user load on the system is minimal.
The longer answer is that you will likely need to gather some more data to know for sure. With the single-node MariaDB, do you have historical charts of CPU load over time? Do you know what peak user load looks like on the system and when it occurs? Do you see API response times increase under that load? Without this information it’s difficult to say definitively, but I think the defaults are a good starting point that you can always change if you observe too much load.
In general, though, I’m not sure adjusting historyCleanupBatchSize will change much. It’s really just the size of each batch. The cleanup will likely max out the job executor and run multiple batches, and you will observe a CPU spike while it runs. Camunda does a good job of not overloading the system regardless of what you configure, but it will likely spike the CPU on your first few runs given the size of your history tables. I have personally seen batch jobs run on extremely large Camunda databases where the CPU spikes to ~80% and stays there for the duration of the job. This is expected, as the goal is to clean up the data as fast as possible within the constraints of the DB configuration.
@jwarren, thank you for your answer.
Just to understand the concept of a ‘batch’: let’s assume we use the default size of 500 and configure the cleanup to run for four hours. Does the cleanup work like this: it launches a batch job to clean up 500 objects from the history, and when that is done it launches another batch, and so on, until the end of the four-hour time window?
If that is the case, I agree that we should start with the default of 500.
That is correct, @opsiikarla. There will likely be multiple batches (of 500, or whatever you configure) created during the four-hour window in your example. Once a batch finishes, the engine creates another and continues the cleanup until the end of the time window. More details on how these batches work can be found here.
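Purely as an illustration (this is a toy model I wrote to show the scheduling behavior, not Camunda’s actual implementation), the loop behaves roughly like this:

```python
from datetime import timedelta

def run_cleanup_window(total_instances, batch_size=500,
                       window=timedelta(hours=4),
                       time_per_batch=timedelta(minutes=1)):
    """Toy model of the history cleanup loop: one cleanup job removes up
    to `batch_size` history objects, then immediately reschedules itself,
    until the batch window closes or nothing is left to clean up.
    `time_per_batch` is an assumed constant; in reality batch duration
    varies with DB load and row sizes."""
    elapsed = timedelta()
    batches = 0
    remaining = total_instances
    while remaining > 0 and elapsed + time_per_batch <= window:
        remaining -= min(batch_size, remaining)
        elapsed += time_per_batch
        batches += 1
    return batches, remaining

# Example: 1,200 finished instances with the default batch size of 500
# completes in 3 batches well inside the window.
batches, left = run_cleanup_window(1200, batch_size=500)
```

The key point the model shows: a smaller batch size mainly means more (cheaper) batches within the same window, while the window itself is what bounds the total load duration.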
Thank you for your help @jwarren
Maybe the main issue was that I wasn’t sure what ‘batch’ means in the context of the historyCleanupBatchSize parameter. It seems clear now.
You mentioned that we need information about response times, peak times, CPU loads and such. I think we have enough of that information to start cleaning up safely, now that I have a better understanding of how these cleanup batches work. Thank you very much!