How to manage historic data?

Hi All,
The HistoryService provides the interfaces to interact with historic data. Any process instance, whether unfinished, finished or deleted, can be retrieved via the HistoryService.

Here are my questions:

  1. If our system generates more than 100,000 process instances per year, the history tables will keep growing. What should we do to keep them manageable? Maybe something like scheduled archiving? Any advice?

  2. There will definitely be a lot of historic data for deleted process instances, and we may want to clean it up every 3 months. How can I query all the deleted process instances? historyService.createHistoricProcessInstanceQuery() does not seem to provide a method to filter them.

  3. If we migrate to a different database someday, is there any way to migrate the workflow data as well? Backup & restore might not help because the target database might be a different product.
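For question 2, one approach that may work with the 7.x API is to query finished instances with `historyService.createHistoricProcessInstanceQuery().finishedBefore(cutoff).list()`, keep those whose `getDeleteReason()` is non-null, and remove each with `historyService.deleteHistoricProcessInstance(id)`. The sketch below shows only the selection logic in plain Java so it runs without an engine. The `HistoricInstance` record is a hypothetical stand-in for `HistoricProcessInstance`, and the rule "deleted means ended with a non-null delete reason" is an assumption to verify against your engine version (later releases may offer dedicated state filters, so check the Javadoc).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.List;
import java.util.stream.Collectors;

public class DeletedInstanceFilter {

    // Hypothetical stand-in for the parts of Camunda's HistoricProcessInstance
    // used here: getId(), getEndTime() and getDeleteReason().
    record HistoricInstance(String id, Instant endTime, String deleteReason) {}

    // Rolling cutoff for a "clean up every 3 months" policy (computed in UTC).
    static Instant cutoff(Instant now) {
        return ZonedDateTime.ofInstant(now, ZoneOffset.UTC).minusMonths(3).toInstant();
    }

    // Assumption: a deleted instance has ended (endTime != null) and carries a
    // non-null delete reason, while a normally completed one does not.
    static List<String> deletedBefore(List<HistoricInstance> instances, Instant cutoff) {
        return instances.stream()
                .filter(i -> i.endTime() != null && i.endTime().isBefore(cutoff))
                .filter(i -> i.deleteReason() != null)
                .map(HistoricInstance::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-06-30T00:00:00Z");
        List<HistoricInstance> history = List.of(
                new HistoricInstance("a", Instant.parse("2017-01-10T00:00:00Z"), "cancelled"),
                new HistoricInstance("b", Instant.parse("2017-01-11T00:00:00Z"), null),
                new HistoricInstance("c", Instant.parse("2017-06-01T00:00:00Z"), "cancelled"),
                new HistoricInstance("d", null, null));
        System.out.println(deletedBefore(history, cutoff(now))); // [a]
    }
}
```

With large volumes, paging the real query (for example with `listPage(first, max)`) avoids loading every finished instance at once.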

Thanks for any help in advance.

@GhostFox For long-term history, we always look at using a custom history backend: https://docs.camunda.org/manual/7.5/user-guide/process-engine/history/#provide-a-custom-history-backend

This way the history is not part of the main database.
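The linked docs describe supplying your own implementation of the engine's `HistoryEventHandler` interface (methods `handleEvent` and `handleEvents`), set on the process engine configuration, so that history events are written somewhere other than the engine's own tables. The plain-Java sketch below redeclares a simplified version of that interface so it compiles on its own; `HistoryEvent` here is a simplified record rather than the engine's class, and `ArchiveStore` is a hypothetical external store.

```java
import java.util.ArrayList;
import java.util.List;

public class ArchivingHistoryBackend {

    // Simplified, hypothetical event; the real engine passes subclasses of
    // org.camunda.bpm.engine.impl.history.event.HistoryEvent.
    record HistoryEvent(String processInstanceId, String eventType) {}

    // Same shape as Camunda's HistoryEventHandler (handleEvent / handleEvents),
    // redeclared here so the sketch compiles without the engine on the classpath.
    interface HistoryEventHandler {
        void handleEvent(HistoryEvent event);

        default void handleEvents(List<HistoryEvent> events) {
            events.forEach(this::handleEvent);
        }
    }

    // Hypothetical long-term store (document DB, data warehouse, ...).
    interface ArchiveStore {
        void append(HistoryEvent event);
    }

    // Forwards every history event to the external store instead of the
    // engine's history tables.
    static final class ArchivingHandler implements HistoryEventHandler {
        private final ArchiveStore store;

        ArchivingHandler(ArchiveStore store) {
            this.store = store;
        }

        @Override
        public void handleEvent(HistoryEvent event) {
            store.append(event);
        }
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the external store.
        List<HistoryEvent> archived = new ArrayList<>();
        HistoryEventHandler handler = new ArchivingHandler(archived::add);
        handler.handleEvents(List.of(
                new HistoryEvent("pi-1", "start"),
                new HistoryEvent("pi-1", "end")));
        System.out.println(archived.size()); // 2
    }
}
```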

We have also been looking at “super” long-term storage practices (10+ years), where we are approaching millions of process instances per year.

This is a very common concern for production systems, and I’d suggest you take a look at this blog post on the subject.

It points to a consulting snippet on GitHub that you might be able to use to manage your history better.
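Whatever cleanup tooling you end up with, purging a large backlog is generally gentler on the database when done in bounded batches, so each transaction stays small. A minimal plain-Java sketch of that batching follows; `deleteBatch` is a hypothetical callback which, in a real setup, might call `historyService.deleteHistoricProcessInstance(id)` for each id or run equivalent SQL for the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedPurge {

    // Splits ids into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }

    // Hands each batch to deleteBatch (hypothetical: ideally one transaction
    // per batch) and returns the number of batches processed.
    static int purge(List<String> ids, int batchSize, Consumer<List<String>> deleteBatch) {
        List<List<String>> batches = partition(ids, batchSize);
        batches.forEach(deleteBatch);
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c", "d", "e");
        System.out.println(partition(ids, 2)); // [[a, b], [c, d], [e]]
        int n = purge(ids, 2, batch -> System.out.println("deleting " + batch));
        System.out.println(n + " batches"); // 3 batches
    }
}
```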

Thanks, guys, for your always super quick and helpful responses.

BTW, I am using the Camunda Grails plugin. I’d like to know if there is any Grails/Groovy-oriented historic data archiving solution instead of dealing with native SQL statements.

We’d like to use a Grails Quartz job to handle the data.
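A Quartz job, in Grails or elsewhere, would mostly be a thin scheduling wrapper around the purge logic. Below is a plain-Java sketch with `ScheduledExecutorService` swapped in for Quartz; `HistoryCleanupJob`, `findDeletedBefore` and `delete` are hypothetical names. Instead of firing exactly every 3 months, the job runs daily and purges whatever has crossed a rolling 3-month cutoff, which keeps each run small.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

public class HistoryCleanupJob implements Runnable {

    private final Clock clock;
    // Hypothetical hooks: find ids of deleted instances that ended before the
    // cutoff, and delete those ids (e.g. via HistoryService or batched SQL).
    private final Function<Instant, List<String>> findDeletedBefore;
    private final Consumer<List<String>> delete;

    HistoryCleanupJob(Clock clock,
                      Function<Instant, List<String>> findDeletedBefore,
                      Consumer<List<String>> delete) {
        this.clock = clock;
        this.findDeletedBefore = findDeletedBefore;
        this.delete = delete;
    }

    // One run: purge everything that has crossed the rolling 3-month cutoff.
    @Override
    public void run() {
        Instant cutoff = ZonedDateTime.now(clock).minusMonths(3).toInstant();
        List<String> ids = findDeletedBefore.apply(cutoff);
        if (!ids.isEmpty()) {
            delete.accept(ids);
        }
    }

    // Stand-in for a Quartz trigger: first run after one day, then daily.
    // An exception escaping run() cancels later runs, so real code should catch and log.
    static ScheduledExecutorService scheduleDaily(Runnable job) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(job, 1, 1, TimeUnit.DAYS);
        return scheduler;
    }

    public static void main(String[] args) {
        // A single manual run against stubbed hooks; an application would call
        // scheduleDaily(job) and shut the scheduler down on exit.
        HistoryCleanupJob job = new HistoryCleanupJob(
                Clock.systemUTC(),
                cutoff -> List.of("pi-1", "pi-2"),
                ids -> System.out.println("would delete " + ids));
        job.run();
    }
}
```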

Any luck?