Syncing the Zeebe stream if all brokers are down

For a Zeebe setup, I wanted to know how the event stream would be synced or populated if all the brokers in the cluster go down and are then brought back up.
In that case, there would be no stream left to sync from.

Hi @aasthana4 :wave: That’s an interesting question.

Zeebe is a multi-Raft system, built on a fork of Atomix. Each broker hosts one or more partitions, and each partition has a log stream that is persisted on disk. When the brokers start up, the Raft consensus protocol (i.e. leader election) runs for each partition to determine which broker will lead it. The others become followers, replicating the log stream for failover. The election favors the broker with the most up-to-date log stream. For more details on Raft leader election, please have a look at the Raft paper.
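As a toy illustration of the "most up-to-date log wins" rule, here is a hedged sketch. It is a simplification (real Raft also compares terms, and candidates need a vote majority), and the broker names and log indices are made up:

```shell
# Toy model of the election tie-break: given "broker:lastLogIndex" pairs,
# the broker with the highest log index wins. Real Raft compares terms
# first and requires a majority of votes; this only shows the preference.
pick_leader() {
  printf '%s\n' "$@" | sort -t: -k2,2n | tail -n 1 | cut -d: -f1
}

pick_leader broker-0:100 broker-1:250 broker-2:180
# → broker-1 (it has the longest log, so it cannot have missed committed entries)
```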


Thanks @korthout for the response.
I wanted to know if the log stream can be replicated from Elasticsearch or another persistent source instead of from disk. I have a restriction on my clouds against persistent disks, so I have to go with persistenceType=local or memory (ref: camunda-platform-helm/charts/camunda-platform at main · camunda/camunda-platform-helm · GitHub - persistenceType).
With the kind of setup I will be running, I am covered as long as at least one pod stays up to replicate from.
I wanted to know what the backup strategy could be if all my pods go down.

Hi @aasthana4

I understand your idea for ES, but this won’t work. Let me try to explain:

When running with persistenceType=local or memory, you lose the fault-tolerance capability once all brokers are down. A restart would begin with an empty log, as you’ve already discovered.

There’s currently no way to rehydrate a Zeebe partition from an outside log source, so exporting everything to ES doesn’t allow you to re-import it as a backup. Running with these persistenceType values is therefore dangerous: you forgo fault tolerance whenever all brokers for a partition are down.
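To make the one-way nature concrete: the exported data stays queryable for audit or debugging, it just can't be replayed into Zeebe. A hedged sketch, assuming the Elasticsearch exporter's default zeebe-record* index prefix and a local ES on port 9200 (both assumptions; check your exporter config):

```shell
# Read-only view of what the Elasticsearch exporter wrote.
# This can be searched, but there is no path to re-import it into a broker.
ES=${ES:-http://localhost:9200}   # assumption: local Elasticsearch

list_exported_records() {
  # zeebe-record* is the exporter's default index prefix (verify in your setup)
  curl -s "$ES/zeebe-record*/_search?size=5"
}
```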

Having said that, you could consider experimenting with rolling your own backup system if you use a ramdisk that can be copied and saved for later use. To do that you’d need to:

  • make sure all partitions have paused processing using the partitions admin endpoint. This is necessary because distributed hot backups are a complex topic, so stopping all processing first makes things easier. This admin endpoint is a bit hidden, but you can do things like:
# pause processing on partition 1
curl http://localhost:9600/actuator/partitions/pauseProcessing -d '{"partitionId":1}' --request POST --header 'Content-Type: application/json' | jq

# resume processing on partition 1
curl http://localhost:9600/actuator/partitions/resumeProcessing -d '{"partitionId":1}' --request POST --header 'Content-Type: application/json' | jq
  • make a copy of your data directory (the ramdisk or similar) in your cloud environment (whether this is even possible depends on your cloud provider).
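The two steps above could be sketched as a small script. This is a hedged sketch, not an official procedure: the broker address, partition count, and data directory are assumptions, and the pause/resume calls just mirror the curl commands shown earlier:

```shell
# Hedged sketch of the pause -> copy -> resume cycle described above.
# BROKER and the directory paths are assumptions; adjust for your deployment.
BROKER=${BROKER:-http://localhost:9600}

pause_partition() {   # $1 = partition id
  curl -s --request POST "$BROKER/actuator/partitions/pauseProcessing" \
       --header 'Content-Type: application/json' -d "{\"partitionId\":$1}"
}

resume_partition() {  # $1 = partition id
  curl -s --request POST "$BROKER/actuator/partitions/resumeProcessing" \
       --header 'Content-Type: application/json' -d "{\"partitionId\":$1}"
}

backup_data_dir() {   # $1 = data dir, $2 = backup destination
  mkdir -p "$2" && cp -a "$1/." "$2/"
}

# Usage sketch (assuming partitions 1..3 and a copyable data directory):
#   for p in 1 2 3; do pause_partition "$p"; done
#   backup_data_dir /usr/local/zeebe/data "/backups/zeebe-$(date +%s)"
#   for p in 1 2 3; do resume_partition "$p"; done
```

Remember to resume processing even when the copy fails, or the partitions stay paused.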

Generally, I’d recommend not using persistenceType=memory if you have persistence requirements. It kind of defeats the purpose, IMO.

Hope this helps :slight_smile:

Thanks @korthout, understood.
I’ll look into the available solutions to move forward.