Zeebe backpressure with many JOB_BATCH.ACTIVATE events

Hey all, we are facing a weird issue with our Zeebe cluster.
At certain times of the day, we publish around 6-7 lakh (600,000 to 700,000) messages to Zeebe, which correlate across different process instances to move them ahead.
This generates a lot of jobs, which our workers poll (16 workers).
This runs smoothly for a couple of hours, but near the end of the run we start getting backpressure of up to 80%.

When I did a deep dive, I saw a lot of JOB_BATCH.ACTIVATE commands in the metrics.
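
For a rough sense of scale, here is a back-of-envelope sketch of how many JOB_BATCH.ACTIVATE commands polling alone could produce. It assumes that a poll which finds too few jobs is forwarded to every partition and that long polling is not holding requests open; the 100 ms poll interval and the single job type are assumptions for illustration, not values from our setup. Only the 16 workers and 16 partitions come from this post.

```python
def activate_commands_per_second(workers, partitions, poll_interval_ms, job_types=1):
    """Rough upper bound on JOB_BATCH.ACTIVATE commands per second.

    Assumes every poll comes back short of jobs and is therefore sent
    to each partition in turn (an assumption, not a measured behaviour).
    """
    polls_per_second = workers * job_types * 1000 / poll_interval_ms
    return polls_per_second * partitions


# 16 workers and 16 partitions are from the post above.
# poll_interval_ms=100 and job_types=1 are hypothetical values.
estimate = activate_commands_per_second(workers=16, partitions=16, poll_interval_ms=100)
print(estimate)
```

If those assumptions hold, the command rate scales with workers times partitions, which would fit the symptoms showing up near the end of the run, when few jobs are left and most polls come back empty.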

At this point Zeebe becomes unresponsive.

The errors are mostly Resource Exhausted.

Partitions: 16
Brokers: 8 (partitions evenly distributed, 2 per broker)
Broker spec: 4 CPU, 16 GB RAM

PVC and memory usage look pretty normal to me.

Job activation also drops suddenly.
