Camunda 8 - Cross-cluster replication of Zeebe data

Hi Guys,
As a requirement for a project, I need to set up two separate Camunda 8 clusters in SaaS: one as the master (primary) and the other for disaster recovery (DR).
I need all data in the DR cluster to be replicated in real time from the master (with near-zero RPO).
Is there a way to do this?

PS: A stretched cluster or geo-redundancy is not an option for me. The DR cluster must be able to work on its own copy of the data.

Regards

Hi @Nassiesse, I will ask around, but as far as I'm aware you cannot do this sort of real-time replication. Zeebe clusters are, however, fault tolerant by default, because partitions are replicated across brokers and the cluster can be scaled horizontally (see the architecture description). To achieve near-real-time replication to a separate cluster, every broker would need to export all of its data instantaneously, not only through the exporter (by default to Elasticsearch) but also to a mirror broker in an entirely independent cluster. That isn't how Zeebe was designed.

I'm curious to learn more about why you can't geographically distribute your brokers to maintain availability. (Or is this more of an experimental project?) I'd love to take this feedback back to the engineers and get their thoughts. Any details you can share would be great!

Hi Nathan,
thanks for your response.
It's a technical requirement from one of our clients, who is working to adopt Camunda 8 in their infrastructure.

They need two separate clusters (or tenants): one for production and one for disaster recovery.
These two clusters must be synchronized in real time.

The solution could be either SaaS or self-managed.
I didn't find any best practice for syncing two clusters/tenants.

Do you have any ideas?

@Nassiesse, I double-checked with the engineers and they confirmed what I originally said: this is not possible with Zeebe. The current recommendation, and one that many people are following, is to geographically distribute your brokers.

The engineering team has discussed the idea of "passive brokers" before: brokers that receive replicated data but don't take part in the quorum/election of the consensus protocol. This is not in active development, just something that has been discussed. However, if you can provide more detailed reasons why this would be a beneficial feature, I'd love to share them with the team. Is there a specific technical reason the client needs this, or is it an internal policy they cannot amend?

Yep, it's an internal policy.

In my opinion, frequent backups could be a possible solution (even if not the best one).
Do you have an out-of-the-box script to perform a full backup/restore of all data?
The official documentation lists the steps for a backup, but a ready-to-use script would be useful.

Regards

There isn’t a script for automating backups that I’m aware of. That’s a great idea and I’ll pass along the feedback!
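In the meantime, here is a rough, unofficial sketch of what such a script could look like for a self-managed installation. It only orders the HTTP calls of a consistent backup as I understand the documented procedure: back up Optimize, Operate and Tasklist and wait for them, pause Zeebe exporting, take an Elasticsearch snapshot of the `zeebe-record*` indices, back up Zeebe and wait for it, then resume exporting. The hostnames, ports, the snapshot repository name `backup_repo` and the `COMPLETED`/`FAILED` state values are assumptions; please verify every endpoint against the backup docs for your Camunda version. It does not cover restore, and it does not apply to SaaS.

```python
"""Unofficial sketch: one consistent Camunda 8 self-managed backup run.

ASSUMPTIONS: hostnames, ports, the snapshot repository name and the
"COMPLETED"/"FAILED" states are placeholders; check them against the
backup documentation for your Camunda version.
"""
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    method: str                  # HTTP verb, or "WAIT" to poll a backup status URL
    url: str
    body: Optional[dict] = None


def backup_plan(backup_id: int,
                optimize: str = "http://optimize:8092",       # hypothetical host/port
                operate: str = "http://operate:9600",         # hypothetical host/port
                tasklist: str = "http://tasklist:9600",       # hypothetical host/port
                zeebe: str = "http://zeebe-gateway:9600",     # hypothetical host/port
                elastic: str = "http://elasticsearch:9200",   # hypothetical host/port
                repo: str = "backup_repo") -> List[Step]:     # hypothetical repo name
    """Return the ordered calls for backup `backup_id` without sending them."""
    body = {"backupId": backup_id}
    apps = [optimize, operate, tasklist]
    plan = [Step("POST", f"{app}/actuator/backups", body) for app in apps]
    # The web app backups should finish before Zeebe exporting is paused.
    plan += [Step("WAIT", f"{app}/actuator/backups/{backup_id}") for app in apps]
    plan += [
        # Stop exporting so the zeebe-record* indices no longer change.
        Step("POST", f"{zeebe}/actuator/exporting/pause"),
        # Elasticsearch snapshot of the exported records; blocks until done.
        Step("PUT", f"{elastic}/_snapshot/{repo}/camunda_zeebe_records_backup_{backup_id}"
                    "?wait_for_completion=true",
             {"indices": "zeebe-record*", "feature_states": ["none"]}),
        # Back up the brokers' own state and wait for it to complete.
        Step("POST", f"{zeebe}/actuator/backups", body),
        Step("WAIT", f"{zeebe}/actuator/backups/{backup_id}"),
        Step("POST", f"{zeebe}/actuator/exporting/resume"),
    ]
    return plan


def http_send(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send one JSON request and return the decoded response (or {})."""
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def run(plan: List[Step], send: Callable[..., dict] = http_send,
        poll_seconds: float = 5.0, timeout_seconds: float = 3600.0) -> None:
    """Execute the plan in order, polling every WAIT step until COMPLETED."""
    for step in plan:
        if step.method != "WAIT":
            send(step.method, step.url, step.body)
            continue
        deadline = time.monotonic() + timeout_seconds
        while True:
            state = send("GET", step.url).get("state")
            if state == "COMPLETED":
                break
            if state == "FAILED" or time.monotonic() > deadline:
                raise RuntimeError(f"backup not completed ({state}): {step.url}")
            time.sleep(poll_seconds)
```

Usage would be something like `run(backup_plan(int(time.time())))`, scheduled as often as your RPO requires; a Unix timestamp keeps backup IDs increasing, which I believe Zeebe expects. Splitting the plan from the HTTP calls lets you print and review the steps (or dry-run them with a fake `send`) first. Note that if a step fails after the pause, exporting stays paused until you resume it manually.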