After backup and restore, creating a process instance occasionally fails

I backed up Zeebe data to S3 storage from an 8.1.6 cluster and restored it to a new 8.1.6 cluster, and everything seemed OK. However, I sometimes get NotFound when I try to create a process instance.
The error is "rpc error: code = NotFound desc = Command 'CREATE' rejected with code 'NOT_FOUND': Expected to find process definition with process ID 'f_1647783962028273664', but none found".
Retrying sometimes works. It seems that some broker is in a wrong state but still running…

The log of one of the brokers looks like this:

2023-04-19 17:34:34.582 [Broker-1-SnapshotDirector-24] [Broker-1-zb-actors-1] INFO
io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, need to wait until last written event position 2552783 is committed, current commit position is 0. After that snapshot will be committed.
2023-04-19 17:36:34.533 [Broker-1-SnapshotDirector-7] [Broker-1-zb-actors-7] INFO
io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, need to wait until last written event position 2553001 is committed, current commit position is 0. After that snapshot will be committed.
2023-04-19 17:36:34.543 [Broker-1-SnapshotDirector-18] [Broker-1-zb-actors-0] INFO
io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, need to wait until last written event position 2552313 is committed, current commit position is 0. After that snapshot will be committed.
2023-04-19 17:36:34.558 [Broker-1-SnapshotDirector-6] [Broker-1-zb-actors-5] INFO
io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, need to wait until last written event position 2552003 is committed, current commit position is 0. After that snapshot will be committed.
2023-04-19 17:39:34.591 [Broker-1-SnapshotDirector-24] [Broker-1-zb-actors-6] INFO
io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, need to wait until last written event position 2552783 is committed, current commit position is 0. After that snapshot will be committed.
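
The repeated "current commit position is 0" entries are easier to compare across brokers when they are extracted from the log. Here is a minimal sketch that does this for log text in the format above. It assumes that the trailing number in "SnapshotDirector-24" is the partition ID and the number after "Broker-" is the node ID; the function name is made up for illustration.

```python
import re

# Matches one snapshot entry, even when the message is wrapped onto the next line.
# Assumption: "Broker-<node>-SnapshotDirector-<partition>" encodes node and partition IDs.
# Limitation: a SnapshotDirector entry with a different message would be paired
# with the next matching message, so filter the log to these entries first.
SNAPSHOT_ENTRY = re.compile(
    r"\[Broker-(?P<node>\d+)-SnapshotDirector-(?P<partition>\d+)\]"
    r".*?last written event position (?P<written>\d+) is committed, "
    r"current commit position is (?P<commit>\d+)",
    re.DOTALL,
)


def stalled_partitions(log_text):
    """Return {(node, partition): last_written_position} for entries with commit position 0."""
    stalled = {}
    for match in SNAPSHOT_ENTRY.finditer(log_text):
        if int(match.group("commit")) == 0:
            key = (int(match.group("node")), int(match.group("partition")))
            stalled[key] = int(match.group("written"))
    return stalled


if __name__ == "__main__":
    sample = (
        "2023-04-19 17:34:34.582 [Broker-1-SnapshotDirector-24] [Broker-1-zb-actors-1] INFO\n"
        "io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, "
        "need to wait until last written event position 2552783 is committed, "
        "current commit position is 0. After that snapshot will be committed.\n"
    )
    print(stalled_partitions(sample))
```

Running this over the logs of every broker would show which partitions never advance their commit position on which node.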

I followed the steps in Backup and restore | Camunda Platform 8 Docs, but I only backed up and restored the Zeebe data. Does that matter?

The backed-up cluster and the restored cluster both have 24 partitions, 6 nodes, and 3 replicas. The partitions of the new cluster are not consistent with those of the old one: half of the partitions were started with a new commit number.
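
For comparing the two clusters, it helps to have the expected layout written down. The sketch below assumes a round-robin placement in which partition p starts on node (p - 1) mod cluster size and its replicas go to the following nodes. That rule and the function name are assumptions for illustration, not something confirmed in this thread.

```python
def round_robin_distribution(partition_count, cluster_size, replication_factor):
    """Map each partition ID (1-based) to the node IDs (0-based) expected to host it.

    Assumption: simple round-robin placement, starting at node (partition - 1) % cluster_size.
    """
    distribution = {}
    for partition_id in range(1, partition_count + 1):
        first_node = (partition_id - 1) % cluster_size
        distribution[partition_id] = [
            (first_node + offset) % cluster_size
            for offset in range(replication_factor)
        ]
    return distribution


if __name__ == "__main__":
    # Topology from this thread: 24 partitions, 6 nodes, 3 replicas.
    for partition_id, nodes in round_robin_distribution(24, 6, 3).items():
        print(f"partition {partition_id:2d} -> nodes {nodes}")
```

Under this assumption every node hosts 12 partition replicas, so a node that reports a different set after the restore is a likely place to look.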


The partition distribution looks strange in the new cluster.

This is the partition distribution in the old cluster, which seems to be in a correct state.

Solved it by setting the ZEEBE_BROKER_CLUSTER_NODEID environment variable.
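
For anyone hitting the same problem, here is a rough sketch of one way to derive a stable node ID per broker. It assumes broker hostnames end in an ordinal such as "zeebe-0", "zeebe-1", which is not stated in this thread; the helper names are hypothetical. Only the variable name ZEEBE_BROKER_CLUSTER_NODEID comes from the fix above.

```python
import re


def node_id_from_hostname(hostname):
    """Extract the trailing ordinal from a hostname like 'zeebe-3' (assumed naming scheme)."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} does not end in '-<number>'")
    return int(match.group(1))


def broker_environment(hostname):
    """Build the environment entry that pins the broker's node ID."""
    return {"ZEEBE_BROKER_CLUSTER_NODEID": str(node_id_from_hostname(hostname))}


if __name__ == "__main__":
    for host in ("zeebe-0", "zeebe-1", "zeebe-5"):
        print(host, broker_environment(host))
```

The point is that each broker should come up with the same node ID it had when the backup was taken, so that it picks up the data for its own partitions.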
