We have a cluster of 3 Zeebe brokers (v8.0.2) with 2 Zeebe Gateways in front of them. The brokers are configured with 3 partitions and a replication factor of 3. We monitor the cluster using the Grafana dashboard provided by Camunda.
Everything seems fine until the workflows have been running for one or two weeks. At some point we can see that one of the three partitions is applying backpressure and dropping requests, as shown below:
When we are in such a situation, the workflows run slower than before. We want to understand why only one partition is saturated and what we can do to solve it.
The instances are started on reception of a message. Some instances will always be scheduled to run on a timer every two hours and will never be marked as Completed. Please have a look at the figure below for a clearer understanding.
Not sure if this has changed, but previously the message start event subscription for a process model was opened on a specific partition. So if you are only starting that one process via messages, all of its instances will start on the same partition.
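To make the partition-affinity point concrete, here is a minimal conceptual sketch (this is NOT Zeebe's actual implementation; the real routing lives inside the broker, and the hash function and partition numbering here are assumptions for illustration) of why hash-based routing of a subscription key pins every message with the same key to one partition:

```python
# Conceptual sketch only -- NOT Zeebe's real code. It illustrates why
# routing a message subscription by a fixed key (e.g. the single message
# start event of one process model) always lands on the same partition.
import hashlib

PARTITION_COUNT = 3  # matches the 3-partition cluster described above

def partition_for(routing_key: str) -> int:
    """Deterministically map a routing key to a partition id (1-based)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PARTITION_COUNT + 1

# If every start message is routed by the same key, all process
# instances are created on one partition -- the saturated one:
partitions = {partition_for("order-received") for _ in range(1000)}
print(partitions)  # a single partition, every time

# Distinct routing keys would spread the load across partitions:
spread = {partition_for(f"customer-{i}") for i in range(1000)}
print(sorted(spread))
```

So if the bulk of your instances are created via one message start event, the creation load (and the two-hourly timers those instances own) stays on that one partition, which matches the single-partition backpressure you are seeing.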