Migration vs retention policy vs act_hi_taskinst / act_hi_procinst

Dear all,

We currently have an application running on Camunda 7 where we are required to retain historic data for over 10 years due to regulatory compliance. Our reporting relies on the act_hi_taskinst and act_hi_procinst tables to analyze historic user tasks. Given our low process transition load, performance has not been an issue.

With the migration to Zeebe on the horizon, we need a viable strategy to preserve and manage this historic data. We are considering the following two approaches:

Approach A:

  • Maintain a reduced, consolidated version of act_hi_taskinst and act_hi_procinst containing only the necessary columns.
  • Build a feeder mechanism to transfer data from Zeebe’s Elasticsearch indices into this new archive table.
  • Adjust reporting to utilize this new archive.
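
To make approach A more concrete, here is a minimal sketch of such a feeder in Python, with an in-memory SQLite database standing in for the archive. Everything named here is an assumption for illustration: the table `task_archive`, its columns, and the document fields (`key`, `processInstanceKey`, `elementId`, `assignee`, `creationTime`, `completionTime`) are hypothetical and not taken from the actual Zeebe index structure. The point is only that the archive keeps a `source` column and a stable primary key, so that Camunda 7 history and newly fed data can live side by side and re-running the feeder does not create duplicates.

```python
import sqlite3

# Hypothetical archive schema: only the columns that reporting needs.
ARCHIVE_DDL = """
CREATE TABLE IF NOT EXISTS task_archive (
    source       TEXT NOT NULL,   -- 'camunda7' or 'camunda8'
    task_id      TEXT NOT NULL,
    proc_inst_id TEXT NOT NULL,
    task_def_key TEXT,
    assignee     TEXT,
    start_time   TEXT,
    end_time     TEXT,
    PRIMARY KEY (source, task_id)
)
"""


def to_archive_row(doc):
    """Map one exported task document to an archive row.

    The field names read from `doc` are assumptions, not the real
    Zeebe/Elasticsearch field names.
    """
    return (
        "camunda8",
        str(doc["key"]),
        str(doc["processInstanceKey"]),
        doc.get("elementId"),
        doc.get("assignee"),
        doc.get("creationTime"),
        doc.get("completionTime"),
    )


def feed(conn, docs):
    """Write documents into the archive; re-feeding a document replaces its row."""
    conn.execute(ARCHIVE_DDL)
    rows = [to_archive_row(d) for d in docs]
    conn.executemany(
        "INSERT OR REPLACE INTO task_archive VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Existing act_hi_taskinst rows could be copied into the same table once, with `source` set to 'camunda7', so that reporting only ever queries one structure.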

Approach B:

  • Import act_hi_taskinst and act_hi_procinst data into Elasticsearch, aligning with Zeebe’s existing index structure.
  • Adapt reporting to rely directly on Elasticsearch indices.

We have concerns regarding approach B:

  • Since the Elasticsearch index structure is not under our control, writing data into it may introduce inconsistencies if we get the semantics wrong and import rubbish.
  • Frequent changes to the index structure might require constant adjustments to reporting, increasing the risk of producing inaccurate reports.
  • Zeebe’s recommended retention period for Elasticsearch indices seems very short (a few weeks, IIRC), which may not be suitable for our long-term data retention requirements. We would appreciate insights into why such short retention times are recommended and the potential risks of extending them.

Approach A seems more straightforward, yet it still requires us to closely monitor changes in Zeebe’s Elasticsearch structure to ensure the accuracy of imported data.
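
One way to notice such structural changes early would be a small guard in the feeder that rejects documents which no longer look as expected, instead of silently importing wrong data. The sketch below is only an illustration: the required field names and types are hypothetical placeholders, not the real index fields.

```python
# Hypothetical expectations about an exported task document.
REQUIRED_FIELDS = {
    "key": int,
    "processInstanceKey": int,
    "creationTime": str,
}


def check_document(doc):
    """Return a list of problems; an empty list means the document looks as expected."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in doc:
            problems.append(f"missing field: {name}")
        elif not isinstance(doc[name], expected_type):
            problems.append(
                f"unexpected type for {name}: {type(doc[name]).__name__}"
            )
    return problems
```

A feeder could stop and raise an alert as soon as `check_document` returns a non-empty list, which turns a silent reporting error into a visible import failure.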

Our questions:

  1. Do our considerations make sense, or are we fundamentally misunderstanding Zeebe’s approach to historic data?
  2. Is it Zeebe’s intended approach that long-term historic data should be stored outside of Elasticsearch, in a separate system not managed by Zeebe?
  3. If the Elasticsearch indices are suitable for long-term historic data: in both approaches we need to understand the internals of the Elasticsearch indices well enough to read them in a stable fashion. In approach B we would also need to know how to write to them in order to add the existing historic data. Maybe this has already been solved: is there an existing import module that loads historic Camunda 7 data (especially act_hi_taskinst and act_hi_procinst) into Elasticsearch?

Regards,
André

Hi André.
Thanks for the detailed and interesting question! We have our annual company kick-off this week, so there is not much time to answer, but I wanted to quickly throw in one possibility. We are investing in:

  • RDBMS support, so that you can push C8 history to an RDBMS as an alternative to Elasticsearch.
  • Migration tooling, which will also be able to migrate history/audit data from Camunda 7 to Camunda 8 - but this might be limited to RDBMS environments (not finally decided).

We are currently finalizing our plans, and I hope to be able to blog about them soon to give you more context and direction - how urgently do you need an answer here?
Thanks in advance
Bernd

Hi Bernd,

Thank you for your answer. I saw the announcements in the documentation that there may be some new tooling in 8.7. An answer by the end of March would be perfectly fine, so if there is any chance of a blog post being available by then, we are good.

Thank you and best regards,
André

Hi André. I wondered if you saw the latest migration strategy update, "Migrating Solutions from Camunda 7 to Camunda 8: A Strategy Update" on the Camunda blog? It also links to the completely revamped "Camunda 7 to Camunda 8 migration guide" in the Camunda 8 Docs. I hope this is helpful.

Hi Bernd, ah, thank you, I will look into it!