How can I tune backpressure to avoid RESOURCE_EXHAUSTED?

Elli Kotoula: Hello guys! We have dockerised Zeebe with image versions camunda/zeebe:0.25.1 and camunda/operate:0.25.0. When we create workers, it seems that at the beginning they consume the messages very quickly, but after a while they consume messages slower than they receive requests, and as a result many active instances remain in the queue. We get the error: ERROR: Grpc Stream Error: 8 RESOURCE_EXHAUSTED: Expected to activate jobs of type 'workerHandler', but no jobs available and at least one broker returned 'RESOURCE_EXHAUSTED'. Please try again later. We have tried long polling too, but the same error occurs. Do you have any idea? Thank you!

kristoffer.bakkejord: By default, the Zeebe backpressure rules are a bit strict.

See the documentation here: https://docs.camunda.io/docs/0.26/product-manuals/zeebe/deployment-guide/operations/backpressure/#backpressure-tuning

Elli Kotoula: Ok thank you! Is there any example on how we can use backpressure with docker? I would really appreciate it.

kristoffer.bakkejord: As mentioned in another thread (https://camunda-cloud.slack.com/archives/C6WGNHV2A/p1621548224021000?thread_ts=1621515977.013400&cid=C6WGNHV2A), it's recommended to set up Prometheus/Grafana to monitor the state of the Zeebe cluster.
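In case it is useful for the dockerised setup, here is a minimal Prometheus scrape configuration sketch. It assumes the broker container is reachable under the hostname zeebe and exposes its monitoring API on the default port 9600; adjust both to your environment.

```yaml
# prometheus.yml - minimal sketch, not a complete configuration.
scrape_configs:
  - job_name: zeebe
    scrape_interval: 15s
    metrics_path: /metrics        # Zeebe serves Prometheus metrics on its monitoring port
    static_configs:
      - targets:
          - zeebe:9600            # assumed broker hostname : monitoring port
```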

kristoffer.bakkejord: You can try changing the backpressure algorithm to one of those listed here:

https://docs.camunda.io/docs/0.26/product-manuals/zeebe/deployment-guide/operations/backpressure/#backpressure-algorithms

https://github.com/camunda-cloud/zeebe/blob/76199f697242a77cf6c08fae3d86e4bfbb5083ed/dist/src/main/config/broker.yaml.template#L289
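For a dockerised broker, the broker.yaml settings can be overridden with environment variables, so the backpressure section can be tuned without mounting a config file. Below is a minimal docker-compose sketch that switches to the fixed algorithm; the values are illustrative, and the exact keys and sensible defaults are in the broker.yaml.template linked above, so verify them for your version.

```yaml
# docker-compose.yml - illustrative sketch only; the values are not tuned recommendations.
services:
  zeebe:
    image: camunda/zeebe:0.25.1
    environment:
      # broker.yaml keys map to ZEEBE_BROKER_* environment variables
      - ZEEBE_BROKER_BACKPRESSURE_ENABLED=true
      - ZEEBE_BROKER_BACKPRESSURE_ALGORITHM=fixed   # one of the algorithms from the docs above
      - ZEEBE_BROKER_BACKPRESSURE_FIXED_LIMIT=50    # assumed key, see broker.yaml.template
    ports:
      - "26500:26500"   # gateway
      - "9600:9600"     # monitoring/metrics
```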

Elli Kotoula: Thank you, I will check the above and try it.


zell: I would recommend updating to 1.0, which includes several improvements and fixes. One of them is performance: you will be able to have more load on the log without triggering backpressure (or at least it will trigger later). Backpressure kicks in when it detects a latency degradation on the processing side, which happens, for example, if you have many user commands on the log.

Elli Kotoula: Hello, thank you for the response! We updated to 1.0.0 and we get the following error: StorageException: Failed to acquire storage lock

zell: Sounds like you tried to upgrade with old data? Please check out the update guide: https://docs.camunda.io/docs/guides/update-guide/026-to-100/ TL;DR: you need to start with a clean state.
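For what it's worth, a minimal docker-compose sketch of such an update, assuming the old data can be discarded (the volume name is hypothetical; follow the update guide above for the full procedure):

```yaml
# docker-compose.yml - update sketch; starting 1.0 on an empty volume avoids
# reusing 0.25 state, which is what the update guide requires.
services:
  zeebe:
    image: camunda/zeebe:1.0.0          # was camunda/zeebe:0.25.1
    volumes:
      - zeebe_data_1_0:/usr/local/zeebe/data   # fresh, empty volume
  operate:
    image: camunda/operate:1.0.0        # was camunda/operate:0.25.0

volumes:
  zeebe_data_1_0:
```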

Elli Kotoula: Ok thank you!!