Zeebe broker partition backpressure

Hi everyone,

We have a cluster of 3 Zeebe brokers (v8.0.2) with 2 Zeebe Gateways in front of them. The brokers are configured with 3 partitions and a replication factor of 3. We monitor the cluster using the Grafana dashboard provided by Camunda.
Everything seems fine until the workflows have been running for one or two weeks. At some point we can see that one of the three partitions is under backpressure and is dropping requests, as shown below:

When we are in such a situation, the workflows run slower than before. We want to understand why only one partition is saturated and what we can do to solve it.

Any advice is welcome…

Thanks in advance,

Stephane

How are you starting the instances?

Hi jwulf,

The instances are started on reception of a message. Some instances are always scheduled to run on a timer every two hours and are never marked as Completed. Please have a look at the figure below for a clearer understanding.
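For context, publishing the start message looks roughly like this (a simplified sketch with the Zeebe Java client; the gateway address, message name, correlation key, and variables are placeholders, not our real values):

```java
import io.camunda.zeebe.client.ZeebeClient;
import java.util.Map;

public class StartByMessage {
  public static void main(String[] args) {
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("zeebe-gateway:26500") // placeholder address
        .usePlaintext()
        .build()) {

      // A published message whose name matches a message start event
      // in the deployed process model creates a new process instance.
      client.newPublishMessageCommand()
          .messageName("start-workflow")       // placeholder message name
          .correlationKey("some-business-key") // placeholder correlation key
          .variables(Map.of("businessKey", "some-business-key"))
          .send()
          .join();
    }
  }
}
```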

Thanks in advance,

Stephane

Not sure if this has changed, but previously the message start event subscription for a process model was opened on a specific partition. So if you are only starting instances of that one process via messages, they will all start on the same partition.
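To illustrate the routing property (a sketch only, not Zeebe's actual code; if I recall correctly the broker hashes the correlation key with a Murmur hash, but the only point that matters here is that the same key always maps to the same partition):

```java
// Sketch: a published message is routed to a partition derived from its
// correlation key. A constant (or empty) correlation key therefore sends
// every start message, and every process instance it creates, to the
// same partition.
static int partitionForKey(String correlationKey, int partitionCount) {
  // floorMod keeps the result non-negative; hashCode() is a stand-in
  // for Zeebe's real hash function.
  return Math.floorMod(correlationKey.hashCode(), partitionCount) + 1; // partition ids start at 1
}
```

If that is what is happening in your cluster, spreading the correlation keys (e.g. a per-instance business key), or starting instances directly with `newCreateInstanceCommand()`, which as far as I know the gateway distributes round-robin across partitions, should spread the load.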

Hey @rancurel, I'm facing a similar issue here. Were you able to resolve it? Here is the link to the issue I've posted: https://forum.camunda.io/t/zeebe-backpressure/49529

Also, @jwulf, will you be able to help me out here?

Thanks in advance!