I am having a look at our Camunda database and noticed that our
act_ge_bytearray table is getting extremely large (450 GB+). We make use of file uploads within our process, and I believe these files are being stored as BLOBs within this table. Is this a correct assumption?
The file uploads do not need to be stored for extended periods of time; once the process is complete we can remove the BLOBs. This does not seem to happen automatically, as it does with the
ACT_RU_* tables. Is there a way of configuring Camunda to remove the data once a process is complete, or would we need to write a cleanup SQL script as a maintenance task?
I also had a look at the history cleanup REST endpoint but it doesn’t seem to clean up this data.
The files we upload are also stored on disk, and the size on disk is not nearly as large as it is within the database. Are there perhaps other factors that would increase this table's size?
Any suggestions on what would be the correct/best approach to keep this table to a reasonable size by cleaning up “historic” data?
Your assumption is correct.
You can save disk space by keeping the file variables out of the history tables. Have a look at this example: https://github.com/camunda/camunda-bpm-examples/tree/master/process-engine-plugin/custom-history-level.
Another option would be to set the history time to live of your process model and clean the history. Have a look at the overview: https://docs.camunda.org/manual/7.11/user-guide/process-engine/history/#history-cleanup and the specifics: https://docs.camunda.org/manual/7.11/user-guide/process-engine/history/#history-time-to-live
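As a minimal sketch (the process id, the 5-day value, and the batch window times are just placeholder values), the time to live can be set directly on the process element in the BPMN XML, and the engine can be given a cleanup batch window so removal happens automatically:

```xml
<!-- In the BPMN XML: keep history of finished instances for 5 days -->
<bpmn:process id="fileUploadProcess"
              camunda:historyTimeToLive="5"
              isExecutable="true">
  ...
</bpmn:process>

<!-- In the process engine configuration: run cleanup nightly -->
<property name="historyCleanupBatchWindowStartTime">22:00</property>
<property name="historyCleanupBatchWindowEndTime">06:00</property>
```

With a batch window configured, the engine removes history (including the associated byte array entries) of instances whose time to live has expired, without a manual cleanup call.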
Hope this helps, Ingo
Thanks for the reply @Ingo_Richtsmeier. This is exactly what we need. Just to confirm: when the history cleanup happens, will it also clean up the
ACT_GE_* tables, or does it only act on the
ACT_HI_* tables? The reason I am asking is that it is mainly the
act_ge_bytearray table which is so large.
I would be interested in this answer.
Hi @royal and @some-camunda-user,
the history cleanup deletes all entries in ACT_GE_BYTEARRAY that belong to historic process instances.
Hope this helps, Ingo
Thanks for the response, this clears things up. I am quite interested in why ours is so big. We do allow file uploads within our process, but the size seems almost double or triple what I was expecting it to be. Do you perhaps know: if you pass all variables from a parent process down to a child process, will those variables be duplicated in the table and linked to the child process, or will the child process reference the variables of the parent process?
Passed variables in call activities are copied: https://docs.camunda.org/manual/7.11/reference/bpmn20/subprocesses/call-activity/#passing-variables.
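As a sketch (the variable names here are made up), you can avoid copying the file BLOB into the child process by mapping only the variables the subprocess actually needs, instead of passing all variables:

```xml
<bpmn:callActivity id="childProcessCall" calledElement="childProcess">
  <bpmn:extensionElements>
    <!-- pass only what the child needs, not the uploaded file -->
    <camunda:in source="orderId" target="orderId" />
    <camunda:out source="approvalResult" target="approvalResult" />
  </bpmn:extensionElements>
</bpmn:callActivity>
```

Each `camunda:in` mapping copies only the named variable into the child instance, so a large file variable left out of the mappings stays in the parent only.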
To save database space you could add a service task to save the file in a document management system and reference the ID afterwards (and remove the file from the database). See https://blog.camunda.com/tags/dms/ for some details.
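The pattern could look roughly like this. This is a self-contained sketch: `DmsStub` is a made-up, in-memory stand-in for a real DMS client, and in a real process you would call something like it from a service task's `JavaDelegate`, keep only the returned ID as a process variable, and drop the file variable:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a document management system client.
// It stores the bytes outside the engine and hands back a reference ID.
public class DmsStub {
    private final Map<String, byte[]> store = new HashMap<>();

    // Store the file content and return an ID to keep as a process variable.
    public String upload(byte[] content) {
        String id = UUID.randomUUID().toString();
        store.put(id, content);
        return id;
    }

    // Resolve the ID back to the content when a later task needs the file.
    public byte[] download(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        DmsStub dms = new DmsStub();
        byte[] file = "large upload".getBytes();

        String documentId = dms.upload(file);
        // Only documentId would remain as a process variable; the byte[]
        // itself never reaches ACT_GE_BYTEARRAY.
        System.out.println(new String(dms.download(documentId)));
    }
}
```

The key point is that the engine only ever sees the small string ID, while the large payload lives in the external system.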
The on-disk size of the BLOB table depends on the database you use. I've heard from an Oracle admin that Oracle does not pack all BLOBs directly one after the other, but spreads them over some disk space. Other database servers may have other strategies for storing BLOBs.
Hope this helps, Ingo
We will definitely need to consider moving the documents out of Camunda.