Zeebe Broker Backup

We are looking into running a Zeebe cluster on an OpenShift/Kubernetes cluster.

What would be the right way, or best practice, to back up and restore the data held by the Zeebe cluster?

Is it enough to copy the data of each running instance, or do we have to consider something more?

Thanks in advance
Severin

Hi Severin, you run a fault-tolerant cluster with replication. The cluster nodes replicate state between them. When a node fails, no data is lost.

You can export historical data to Elasticsearch. This data can be used for analysis, but cannot be used to rehydrate cluster/broker state.
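For reference, on recent camunda/zeebe images the Elasticsearch exporter can be enabled through environment variables in a compose file; a rough sketch is below (these variable names are the mapping used by newer versions, so treat them as an illustration — brokers of the 0.x generation configured exporters in zeebe.cfg.toml instead, and the elasticsearch URL is just an example):

zeebe:
  image: camunda/zeebe        # recent image; 0.x images used TOML configuration instead
  environment:
    # enable the built-in Elasticsearch exporter
    - ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME=io.camunda.zeebe.exporter.ElasticsearchExporter
    # where the exporter writes records (assumes an elasticsearch service reachable on the same network)
    - ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL=http://elasticsearch:9200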

There is no other concept of a “back-up” in Zeebe.

What is the failure scenario you want to address?

Josh


Hi Josh
The failure scenario is disaster recovery: the case where the complete cluster fails, or where there is a human error, such as an application bug.

Severin

@sbirrer you can always create an exporter of events (like the one mentioned by “The Real @jwulf”) and then back them up in whatever storage you already use for disaster scenarios. Does that make sense? If you already have a defined procedure in your company, you can follow the same approach by connecting the Zeebe cluster to those tools using a Custom Exporter.

Hi @salaboy,
if I re-deploy a Zeebe cluster using docker-compose down / up, can my previous workflow data still be used? My test result is that I need to redeploy the previous workflow file.

@walt-liuzw that is absolutely true, because docker-compose will automatically delete the storage associated with the broker containers. If you want to keep using the data, you will need to configure a volume that doesn’t get deleted.

@salaboy I deployed it using the docker-compose YAML file in zeebe-docker-compose/standalone-gateway at master · camunda-community-hub/zeebe-docker-compose · GitHub.
Which configuration item do I need to modify so that the Zeebe cluster can continue to use the previous process data after redeployment?

Now it throws an exception:

2019-12-18 16:23:14.0430 ERROR Status(StatusCode=NotFound, Detail="Command rejected with code 'CREATE': Expected to find workflow definition with process ID 'demo-purchase-order', but none found")

@walt-liuzw that is the default behaviour for docker-compose: it automatically removes all the storage for the containers. You need to configure a volume so the brokers can store their data in a directory on the host system.

See here:

This is my docker-compose YAML file:
version: "2"

networks:
  zeebe_network:
    driver: bridge

volumes:
  zeebe_data:
  zeebe_elasticsearch_data:

services:
  zeebe:
    container_name: zeebe
    image: camunda/zeebe:0.21.1
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_STANDALONE_GATEWAY=true
      - ZEEBE_GATEWAY_CONTACT_POINT=node0:26502
      - ZEEBE_GATEWAY_CLUSTER_PORT=26502
      - ZEEBE_GATEWAY_CLUSTER_HOST=zeebe
    ports:
      - "26500:26500"
      - "9600:9600"
    volumes:
      - zeebe_data:/usr/local/zeebe/data
      - ./gateway.cfg.toml:/usr/local/zeebe/conf/gateway.cfg.toml
    depends_on:
      - elasticsearch
    networks:
      - zeebe_network
  node0:
    container_name: zeebe_broker_1
    image: camunda/zeebe:0.21.1
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_NODE_ID=0
      - ZEEBE_PARTITIONS_COUNT=8
      - ZEEBE_REPLICATION_FACTOR=3
      - ZEEBE_CLUSTER_SIZE=3
      - ZEEBE_CONTACT_POINTS=node0:26502
    ports:
      - "26600:26500"
    volumes:
      - ./zeebe.cfg.toml:/usr/local/zeebe/conf/zeebe.cfg.toml
    networks:
      - zeebe_network
  node1:
    container_name: zeebe_broker_2
    image: camunda/zeebe:0.21.1
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_NODE_ID=1
      - ZEEBE_PARTITIONS_COUNT=8
      - ZEEBE_REPLICATION_FACTOR=3
      - ZEEBE_CLUSTER_SIZE=3
      - ZEEBE_CONTACT_POINTS=node0:26502
    volumes:
      - ./zeebe.cfg.toml:/usr/local/zeebe/conf/zeebe.cfg.toml
    networks:
      - zeebe_network
    depends_on:
      - node0
  node2:
    container_name: zeebe_broker_3
    image: camunda/zeebe:0.21.1
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_NODE_ID=2
      - ZEEBE_PARTITIONS_COUNT=8
      - ZEEBE_REPLICATION_FACTOR=3
      - ZEEBE_CLUSTER_SIZE=3
      - ZEEBE_CONTACT_POINTS=node0:26502
    volumes:
      - ./zeebe.cfg.toml:/usr/local/zeebe/conf/zeebe.cfg.toml
    networks:
      - zeebe_network
    depends_on:
      - node1
  operate:
    image: camunda/operate:1.1.0
    ports:
      - "8080:8080"
    volumes:
      - …/lib/application.yml:/usr/local/operate/config/application.yml
    networks:
      - zeebe_network
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.7.1
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - cluster.name=elasticsearch
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - zeebe_elasticsearch_data:/usr/share/elasticsearch/data
    networks:
      - zeebe_network

That still throws an exception after restarting docker-compose:
2019-12-18 16:23:14.0430 ERROR Status(StatusCode=NotFound, Detail="Command rejected with code 'CREATE': Expected to find workflow definition with process ID 'demo-purchase-order', but none found")

Did you redeploy the workflow at least once after modifying the docker-compose like this?

Hi @jwulf,
I think I have already deployed this workflow before. I just upgraded the service / restarted the service. Those running workflows should not need to be redeployed; this should be a feature.

@walt-liuzw that is a feature, but the problem is docker-compose getting rid of your persistent storage. If you run a broker outside Docker, you will see that the process definitions and instances don’t go away.

If you are running a cluster like this via docker compose, you need to provide a volume for each of the brokers. You have one persistent volume mounted to the gateway, and you are running three brokers.

Do the same thing for the three broker nodes - create a volume and mount it.

The gateway doesn’t need a persistent volume. The brokers do.
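A minimal sketch of what that could look like for one broker, assuming one named volume per broker (the volume names here are just examples; repeat the pattern for node1 and node2):

volumes:
  zeebe_broker_1_data:    # one named volume per broker; names are illustrative
  zeebe_broker_2_data:
  zeebe_broker_3_data:

services:
  node0:
    # ... rest of the existing node0 configuration ...
    volumes:
      # broker state stored in a named volume survives docker-compose down/up,
      # as long as the volume itself is not removed (e.g. with docker-compose down -v)
      - zeebe_broker_1_data:/usr/local/zeebe/data
      - ./zeebe.cfg.toml:/usr/local/zeebe/conf/zeebe.cfg.toml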


Hello, we are using the persistent volume concept and the data is preserved across docker-compose up and down. However, we have a problem pretty close to the original question in this thread, i.e. how to back up and restore the workflow status, instances, variables, etc., for whatever reason it is needed, e.g. disaster recovery, server migration, software upgrade.
I think it is hard to avoid some kind of backup/restore process in production for any kind of unexpected failure. We have to retain the exact status as it is, without affecting our end users or requiring them to re-do certain processes again.
We tried to back up the data folder (/usr/local/zeebe/data) from the container as a zip, but we have no idea how to restore the zipped files onto a new container/volume. We are not even sure whether that kind of backup/restore is workable at all.
So what is the best practice for backing up and restoring Zeebe brokers in a production environment?

Clustering redundantly across hardware is the backup mechanism by design.

If you have a hardware failure, the restore process is fail-over and recovery.

Zeebe doesn’t support any other mechanism.

Thanks, is there any more information or online documentation about clustering across hardware?
Is it a hot standby, or is it master-to-master sync?

What about server upgrades or migrations? And what should we do if we want to regularly refresh our staging/UAT environment with production data for testing and troubleshooting?

Use a replication factor of 3 or more, with brokers on different hosts, and replication acts as an automatic backup. Zeebe uses Raft to copy the log between partition replicas, and the leader broker for each partition changes dynamically; every broker can be a leader or a follower for a given partition. I remember Cassandra using almost the same method.
As a second method, I think you can stop the cluster and take a snapshot of the data, and later stop the cluster again and restore that snapshot.
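For the cluster in the compose file earlier in this thread, that replication setup corresponds to these broker settings (values copied from that file; the comments just spell out how they relate):

environment:
  - ZEEBE_PARTITIONS_COUNT=8       # the event log is split into 8 partitions
  - ZEEBE_REPLICATION_FACTOR=3     # each partition is replicated to 3 brokers via Raft
  - ZEEBE_CLUSTER_SIZE=3           # with 3 brokers and replication factor 3, every broker holds a replica of every partition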


Hi @jwulf

A little late to the party, but I was just wondering if this also applies to the commercial Camunda 8 (Camunda Cloud) setup?

Thanks
BR
Michael


Yes, it does.