We have been using Camunda BPM in production for the last eight months, and we have found that the data in the Camunda history and runtime tables keeps growing. We read about purging the history data in the Camunda docs and performed the clean-up, so the history tables now seem to be under control.
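For reference, the clean-up we trigger is along these lines (a minimal sketch against the Camunda 7 Java API; it assumes a version with built-in history cleanup, 7.8+, and a history time-to-live already set on the process definitions):

```java
import org.camunda.bpm.engine.HistoryService;
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.runtime.Job;

public class HistoryCleanup {

    public static void main(String[] args) {
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        HistoryService historyService = engine.getHistoryService();

        // Schedules the engine's built-in history cleanup job to run
        // immediately; it removes finished history data whose
        // time-to-live (historyTimeToLive in the BPMN) has expired.
        Job cleanupJob = historyService.cleanUpHistoryAsync(true);
        System.out.println("History cleanup job scheduled: " + cleanupJob.getId());
    }
}
```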
However, with the number of active instances growing and new business processes being onboarded onto the same system, we are concerned about the runtime table data (especially ACT_RU_VARIABLE).
Are there any suggestions or thoughts on how we can handle these runtime tables?
Also, is there a benchmark for the maximum number of process instances and variables beyond which Camunda will start slowing down?
Attaching the table data counts here.
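For anyone who wants to check the same numbers on their own system, the counts can also be pulled programmatically via the engine's ManagementService (sketch; the filter on ACT_RU is just for brevity):

```java
import java.util.Map;

import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;

public class TableCounts {

    public static void main(String[] args) {
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        ManagementService managementService = engine.getManagementService();

        // Row count per engine table; ACT_RU_* are the runtime tables.
        Map<String, Long> counts = managementService.getTableCount();
        counts.entrySet().stream()
              .filter(e -> e.getKey().startsWith("ACT_RU"))
              .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```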
Generally you shouldn’t need to worry about the runtime tables.
After a process instance has completed, all of the data associated with it in the runtime tables is removed, so they are never going to just keep growing the way the history tables do (the history tables remove nothing by default).
If you notice the runtime tables getting bigger, that is simply a consequence of high throughput in the engine; if you really are starting a lot of instances, it should be considered expected behavior.
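If you want to sanity-check this on your own engine, a quick sketch against the Java API shows how many rows the currently active instances are holding; once those instances complete, the corresponding rows disappear:

```java
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.RuntimeService;

public class RuntimeFootprint {

    public static void main(String[] args) {
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        RuntimeService runtimeService = engine.getRuntimeService();

        // Each active variable instance is a row in ACT_RU_VARIABLE;
        // both counts drop as process instances finish.
        long activeInstances = runtimeService.createProcessInstanceQuery().count();
        long activeVariables = runtimeService.createVariableInstanceQuery().count();

        System.out.printf("%d active instances holding %d variables%n",
                activeInstances, activeVariables);
    }
}
```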
Thanks @Niall for the prompt response. We have long-running processes that can span months, so with many instances active in parallel the data stays in the tables for a long time. So, the problem is:
We have approx. 60 variables per process
We have many processes
The processes are long-running
These three things multiply: tens of thousands of active instances times roughly 60 variables each quickly puts ACT_RU_VARIABLE into the millions of rows.
There is no way to clean or optimize data for active process instances from the Camunda side. All of those processes will keep taking up space in the runtime tables until they finish executing.
From the numbers in the attached photo I can tell the following:
There are 53,305 active tasks, which is not big enough to cause performance issues; Camunda can easily handle that. You shouldn't have performance problems. Do you?
I suspect you have one big, long-running monolithic process. If so, you can optimize it by splitting it into logical subprocesses and models. That way, whenever an intermediate process finishes, its data is removed from the runtime tables.
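If splitting is an option, a call activity is the usual mechanism. A minimal sketch with Camunda's fluent model builder (the process keys here are made up):

```java
import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;

public class SplitProcessSketch {

    public static void main(String[] args) {
        // The parent delegates one phase to a separate child process.
        // "orderParent" and "orderFulfilment" are hypothetical keys.
        BpmnModelInstance parent = Bpmn.createExecutableProcess("orderParent")
                .startEvent()
                .callActivity("fulfilOrder")
                    .calledElement("orderFulfilment")
                .endEvent()
                .done();

        System.out.println(Bpmn.convertToString(parent));
    }
}
```

Variables mapped into the call activity live in the child instance's own scope, so they are deleted as soon as the child completes instead of lingering for the parent's whole lifetime.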
I don't know how big the model is; maybe 60 variables is a fair number. But try to optimize on your side regarding the variables: think about which of them you can eliminate or move to backend logic instead of keeping in the model.
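One concrete way to trim variables during a long run is a delegate that drops values once they are no longer needed (sketch; the variable names and the delegate itself are made up for illustration):

```java
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

// Hypothetical delegate, wired into the model right after the point
// where these variables are last read; removing them frees their
// rows in ACT_RU_VARIABLE for the remainder of the instance.
public class DropIntermediateVariables implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) {
        // "scoringInput" and "rawPayload" stand in for values that
        // only matter during an early phase of the process.
        execution.removeVariable("scoringInput");
        execution.removeVariable("rawPayload");
    }
}
```

If you are on a newer engine version, transient variables are also worth a look, since they are never persisted to ACT_RU_VARIABLE in the first place.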