Worker contact point in a clustered environment

Hi @spulci,

thanks for trying out Zeebe.

First of all, if you want to set up a cluster you should choose an odd number of nodes, i.e. 3, 5, 7 and so on. An even number of nodes gives you no additional fault tolerance over the next lower odd number.

The reason is that we use a Raft implementation as the consensus protocol to provide consistency. In Raft you need a quorum to commit events and make configuration changes. The quorum is defined as quorum = (nodeCount / 2) + 1, using integer division. This means that with an even node count like 2 you have a quorum of 2. If one of the nodes goes down you can’t reach the quorum, so in the end you can’t commit any events (you have a fault tolerance of zero). To get reliability you need at least 3 nodes, where the quorum is still 2. If one node then fails, the cluster is still available (you have a fault tolerance of one).
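
To make the arithmetic concrete, here is a small Python sketch of the formula above. The function names `quorum` and `fault_tolerance` are only illustrative and are not part of Zeebe. Fault tolerance is taken here as the number of nodes that may fail while a quorum is still reachable.

```python
def quorum(node_count: int) -> int:
    """Smallest majority of the cluster: (nodeCount / 2) + 1 with integer division."""
    return node_count // 2 + 1


def fault_tolerance(node_count: int) -> int:
    """Number of nodes that can fail while the remaining ones still form a quorum."""
    return node_count - quorum(node_count)


if __name__ == "__main__":
    # Compare small cluster sizes side by side.
    for n in range(1, 8):
        print(f"nodes={n} quorum={quorum(n)} fault_tolerance={fault_tolerance(n)}")
```

For 2 nodes this gives a quorum of 2 and a fault tolerance of 0. For 3 nodes the quorum is still 2, but the fault tolerance is 1. Going from 3 to 4 nodes raises the quorum to 3 without raising the fault tolerance, which is why odd node counts are recommended.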

How can the worker app contact a different node if node 0 is down?

The Zeebe client (the worker is part of it) contacts the gateway (embedded or standalone), and the gateway then routes the requests to the different brokers and partitions depending on the topology.
So you don’t need to use the topology in the client for this.

Even if the gateway can still connect to the remaining broker, with your setup it can’t activate further jobs, because there is no quorum.

Hope this helps.
Do not hesitate to ask more questions about this.

Greets
Chris
