Dear all,
We currently have an application running on Camunda 7 where we are required to retain historic data for over 10 years due to regulatory compliance. Our reporting relies on the act_hi_taskinst and act_hi_procinst tables to analyze historic user tasks. Given our low process transition load, performance has not been an issue.
With the migration to Zeebe on the horizon, we need a viable strategy to preserve and manage this historic data. We are considering the following two approaches:
Approach A:
- Maintain a reduced, consolidated archive table derived from act_hi_taskinst and act_hi_procinst, containing only the columns we actually need.
- Build a feeder mechanism to transfer data from Zeebe’s Elasticsearch indices into this new archive table.
- Adjust reporting to utilize this new archive.
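To make Approach A more concrete, here is a minimal sketch of the archive table and the two mappings that would feed it. It uses SQLite only so that it is self-contained. The archive column selection is illustrative, and the field names in `from_zeebe` (`id`, `processInstanceId`, `taskDefinitionId`, `creationTime`, `completionTime`) are assumptions on our side, not a confirmed description of Zeebe's Elasticsearch documents. The Camunda 7 column names are the ones from act_hi_taskinst.

```python
import sqlite3

# Reduced archive table: only the columns our reports need (illustrative selection).
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_archive (
    source           TEXT NOT NULL,  -- 'camunda7' or 'zeebe'
    task_id          TEXT NOT NULL,
    process_instance TEXT NOT NULL,
    task_def_key     TEXT,
    assignee         TEXT,
    start_time       TEXT,
    end_time         TEXT,
    PRIMARY KEY (source, task_id)
)
"""


def open_archive(path=":memory:"):
    """Open (and if needed create) the archive database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def from_camunda7(row):
    """Map an act_hi_taskinst row (dict keyed by column name) to an archive row."""
    return ("camunda7", row["ID_"], row["PROC_INST_ID_"], row.get("TASK_DEF_KEY_"),
            row.get("ASSIGNEE_"), row.get("START_TIME_"), row.get("END_TIME_"))


def from_zeebe(doc):
    """Map an exported task document to an archive row.

    ASSUMPTION: the field names below are hypothetical placeholders and must be
    checked against the actual index mapping of the Zeebe version in use.
    """
    return ("zeebe", str(doc["id"]), str(doc["processInstanceId"]),
            doc.get("taskDefinitionId"), doc.get("assignee"),
            doc.get("creationTime"), doc.get("completionTime"))


def upsert(conn, rows):
    """Insert or overwrite rows, so re-running the feeder is idempotent."""
    conn.executemany(
        "INSERT OR REPLACE INTO task_archive VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

The point of the design is that only `from_zeebe` depends on Zeebe's index structure, so a change there would touch one function rather than every report.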
Approach B:
- Import act_hi_taskinst and act_hi_procinst data into Elasticsearch, aligning with Zeebe’s existing index structure.
- Adapt reporting to rely directly on Elasticsearch indices.
We have concerns regarding approach B:
- Since the Elasticsearch index structure is not under our control, writing data into it may introduce inconsistencies if we get the semantics wrong and import rubbish.
- Frequent changes to the index structure might require constant adjustments to reporting, increasing the risk of producing inaccurate reports.
- Zeebe’s recommended retention period for Elasticsearch indices seems very short (a few weeks, IIRC), which may not be suitable for our long-term data retention requirements. We would appreciate insights into why such short retention times are recommended and the potential risks of extending them.
Approach A seems more straightforward, yet it still requires us to closely monitor changes in Zeebe’s Elasticsearch structure to ensure the accuracy of imported data.
Our questions:
- Do our considerations make sense, or are we fundamentally misunderstanding Zeebe’s approach to historic data?
- Is it Zeebe’s intended approach that long-term historic data should be stored outside of Elasticsearch, in a separate system not managed by Zeebe?
- If the Elasticsearch indices are suitable for long-term historic data: in both approaches we need to understand the internals of the Elasticsearch indices well enough to read them reliably. In approach B we would additionally need to know how to write to them in order to add the existing historic data. Perhaps this has already been solved: is there an existing import module that loads historic Camunda 7 data (especially act_hi_taskinst and act_hi_procinst) into Elasticsearch?
Regards,
André