Camunda history data cleanup blog - github link says error 404

Sneha_Patel · October 14, 2020, 12:06pm

I was going through the Camunda best-practices article on cleaning up historical data (Overview | Camunda Platform 8 Docs). In the last section of that page, the second bullet point, "Copy/Move data via SQL into another database or schema, where it will be kept.", mentions the example of Hamburger Sparkasse (Haspa). That blog contains a GitHub link: https://github.com/camunda/camunda-consulting/tree/master/snippets/clean-up-history. When I try to access this link, it returns a 404 error. Could you kindly explain why it has been removed?

Thanks,
Sneha

Ingo_Richtsmeier · October 14, 2020, 5:17pm

Hi @Sneha_Patel

The snippet was an early workaround from the time when the current history cleanup was not yet available.

It required a huge maintenance effort, as it was tightly coupled to the database schema of the ProcessEngine and worked only for Oracle. It had to be adapted for every version.

So we decided to remove it.

Hope this helps, Ingo

Sneha_Patel · October 15, 2020, 4:32pm

Thanks a lot @Ingo_Richtsmeier for the quick response and clarification.

We are using Camunda to handle complex business processes that span multiple microservices. Order volumes are very high, so a large amount of data is generated, and we need to keep this data because it is required for investigation purposes.

We understand that we have to run the history cleanup regularly. But given the volumes, we would need to clean up the history DB and keep the data in some kind of archive DB, so that it can still be referred to after the TTL/removalTime has passed.
It would be good if the history DB could be separated from the runtime DB (not sure how this can be done), so that there is no impact on the performance of the running orders.

I understand from the documentation that, if we go with Camunda's out-of-the-box history cleanup options, there are two strategies available:

Removal-Time-based Strategy
End-Time-based Strategy
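
To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It does not use the Camunda API; the function names are hypothetical, and the rules are a simplified reading of the documentation (removal time = base time + TTL, fixed up front; end-time eligibility = end time + TTL, evaluated when cleanup runs).

```python
from datetime import datetime, timedelta
from typing import Optional


def removal_time(base_time: datetime, ttl_days: int) -> datetime:
    """Removal-time-based idea: a removal timestamp is computed once
    as base time (start or end of the instance) plus the TTL."""
    return base_time + timedelta(days=ttl_days)


def cleanable_by_removal_time(removal: Optional[datetime], now: datetime) -> bool:
    """An instance is cleanable once its stored removal time has passed.
    No removal time (None) means it is never picked up."""
    return removal is not None and removal <= now


def cleanable_by_end_time(end_time: Optional[datetime], ttl_days: int,
                          now: datetime) -> bool:
    """End-time-based idea: eligibility is computed at cleanup time
    from the instance's own end time plus the TTL.
    A still-running instance (end_time is None) is never cleanable."""
    return end_time is not None and end_time + timedelta(days=ttl_days) <= now
```

In both variants, as sketched here, the decision is made per process instance inside one engine; nothing looks at other engines that share the same businessKey.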

In our case there are long-running orders (running for 3 to 6 months or even more), and multiple process microservices participate in fulfilling an order. There will be a “main orchestrator microservice” which generates events to invoke the other microservices (which also have embedded Camunda BPM processes).
It might happen that, for a given order, “microservice 1” has finished execution and is now eligible for cleanup, while “microservice 2” is still running for the same order/businessKey. When the cleanup job runs, all the data belonging to “microservice 1” gets purged, which we do not want. The reason is that if someone then searches by that order number/businessKey in the Cockpit, he/she cannot see the part of the business process handled by “microservice 1”.
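
The behaviour we would want can be sketched as follows. This is only an illustration in plain Python, not something Camunda provides out of the box: `InstanceRecord`, `per_instance_cleanable` and `order_cleanable` are hypothetical names, and the sketch assumes the finished/running state of every microservice's process instance can be collected per businessKey.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class InstanceRecord:
    service: str                   # which microservice owns the instance
    business_key: str              # order number shared across services
    end_time: Optional[datetime]   # None while the instance is still running


def per_instance_cleanable(rec: InstanceRecord, ttl: timedelta,
                           now: datetime) -> bool:
    """Per-engine view: only this instance's own end time matters."""
    return rec.end_time is not None and rec.end_time + ttl <= now


def order_cleanable(records: List[InstanceRecord], business_key: str,
                    ttl: timedelta, now: datetime) -> bool:
    """Order-level view: nothing for the order is cleanable until every
    instance with that business key has ended and the TTL has passed
    since the latest end time."""
    same_order = [r for r in records if r.business_key == business_key]
    if not same_order or any(r.end_time is None for r in same_order):
        return False
    latest_end = max(r.end_time for r in same_order)
    return latest_end + ttl <= now


# Example: microservice 1 has finished, microservice 2 is still running.
records = [
    InstanceRecord("microservice-1", "ORDER-1", datetime(2020, 1, 1)),
    InstanceRecord("microservice-2", "ORDER-1", None),
]
ttl = timedelta(days=30)
now = datetime(2020, 3, 1)
print(per_instance_cleanable(records[0], ttl, now))   # per-engine decision
print(order_cleanable(records, "ORDER-1", ttl, now))  # order-level decision
```

With the per-instance rule, the data of “microservice 1” is purged while the order is still open; the order-level rule keeps it until the whole order is done.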

We need your suggestion on which strategy to use here. The removal end-time strategy would work well for the “main orchestration microservice”, as it would always be the last one to finish for a given order, but it might not work well for the other microservices (as explained in the paragraph above).

We have Optimize, but it does not give a detailed view similar to the Cockpit. So what we are looking for is a separate history DB (i.e. separate from the runtime DB) to get the maximum performance benefit, while still being able to use the Cockpit on the archive DB/separate history DB to view the process information (so that we do not need to develop a custom UI).

Thanks,
Sneha