Zeebe broker partition backpressure

Hi everyone,

We have a cluster of 3 Zeebe brokers (v8.0.2) with 2 Zeebe Gateways in front of them. The brokers are configured with 3 partitions and 3 replicas. We monitor the cluster using the Grafana dashboard provided by Camunda.
Everything seems fine until the workflows have been running for one or two weeks. At some point we can see that one of the three partitions is applying backpressure and dropping requests, as shown below:

When we are in this situation, the workflows run slower than before. We want to understand why only one partition is saturated and what we can do to solve it.
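(As a side note, a minimal sketch of checking which broker leads each partition and what health it reports, using the Zeebe Java client's topology request; the gateway address and plaintext connection are assumptions and would need adjusting to your environment.)

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.BrokerInfo;
import io.camunda.zeebe.client.api.response.PartitionInfo;
import io.camunda.zeebe.client.api.response.Topology;

public class TopologyCheck {
  public static void main(String[] args) {
    // Assumed gateway address; adjust to your setup.
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("localhost:26500")
        .usePlaintext()
        .build()) {

      Topology topology = client.newTopologyRequest().send().join();

      // Print which broker leads each partition and its reported health,
      // to see whether the backpressured partition always maps to the same broker.
      for (BrokerInfo broker : topology.getBrokers()) {
        for (PartitionInfo partition : broker.getPartitions()) {
          System.out.printf("broker=%s partition=%d role=%s health=%s%n",
              broker.getAddress(), partition.getPartitionId(),
              partition.getRole(), partition.getHealth());
        }
      }
    }
  }
}
```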

Any advice is welcome…

Thanks in advance,

Stephane

How are you starting the instances?

Hi jwulf,

The instances are started on reception of a message. Some instances are scheduled to run on a timed basis every two hours and will never be marked as Completed. Please have a look at the figure below for a clearer understanding.
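(For context, a minimal sketch of starting an instance by publishing a message with the Zeebe Java client, assuming a message start event; the gateway address, message name, correlation key, and variables are placeholders. Zeebe routes each published message to a partition based on the hash of its correlation key, so if most messages carry the same, or an empty, correlation key they will all be handled by the same partition, which can concentrate load there.)

```java
import io.camunda.zeebe.client.ZeebeClient;

public class StartByMessage {
  public static void main(String[] args) {
    // Assumed gateway address; adjust to your setup.
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("localhost:26500")
        .usePlaintext()
        .build()) {

      // The published message lands on the partition derived from the hash of
      // the correlation key; varying the key spreads instances across partitions.
      client.newPublishMessageCommand()
          .messageName("order-received")          // hypothetical message name
          .correlationKey("order-4711")           // hypothetical correlation key
          .variables("{\"customerId\": \"c-42\"}") // hypothetical payload
          .send()
          .join();
    }
  }
}
```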

Thanks in advance,

Stephane