Hi,

I am running a load test with a simple BPMN process. The process has two tasks: a source task and a sink task. I am using the open-source Zeebe Kafka connector repository for the asynchronous communication model. Whenever the Zeebe client starts throwing the error "RESOURCE_EXHAUSTED", the sink connector's CPU utilisation reaches 100%. I dug into the code and found that it retries through a recursive call until it receives a success code or a failure code. The load test service creates instances at a rate of 7K RPM. I also observed that the backpressure limit went down to 5.
Please help me fine-tune the performance. We are expecting a load of 100K RPM in production.

Well, this is the nature of a distributed system. If the broker gets saturated, it replies with gRPC error 8 (RESOURCE_EXHAUSTED) to signal backpressure to the clients.

How the client responds, and how it propagates backpressure through the system, are design considerations.

If you change the Kafka sink to do backoff retry, it may allow the broker to recover, rather than increasing the pressure.
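To make the backoff idea concrete, here is a minimal sketch in plain Java. It is not the connector's code: `BackoffRetry`, `BackpressureException`, `callWithBackoff` and `delayCapMillis` are hypothetical names, and `BackpressureException` only stands in for whatever error the Zeebe client raises on RESOURCE_EXHAUSTED. It assumes an exponential delay with a cap and some random jitter, and it retries in a loop rather than through recursion.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Sketch only: all names here are hypothetical, not part of kafka-connect-zeebe.
final class BackoffRetry {

    /** Hypothetical stand-in for the client error raised on RESOURCE_EXHAUSTED (gRPC status 8). */
    static class BackpressureException extends RuntimeException {
        BackpressureException(String message) {
            super(message);
        }
    }

    /** Largest delay allowed before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis. */
    static long delayCapMillis(int attempt, long baseMillis, long maxMillis) {
        // Limit the shift so that very high attempt numbers do not overflow.
        long exponential = baseMillis << Math.min(attempt, 20);
        return Math.min(maxMillis, exponential);
    }

    /** Runs `call`, sleeping with exponential backoff and jitter whenever it signals backpressure. */
    static <T> T callWithBackoff(Supplier<T> call, int maxAttempts, long baseMillis, long maxMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (BackpressureException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do
                }
                long cap = delayCapMillis(attempt, baseMillis, maxMillis);
                // Jitter: sleep somewhere between half the cap and the cap,
                // so that many tasks do not all retry at the same instant.
                long sleepMillis = ThreadLocalRandom.current().nextLong(cap / 2, cap + 1);
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", interrupted);
                }
            }
        }
    }
}
```

With a base of 100 ms and a cap of 5000 ms, the delay cap grows 100, 200, 400, 800 ms and so on until it reaches 5000 ms. The thread sleeps between attempts instead of spinning, which is the part that would relieve the CPU and give the broker room to recover.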

But if the Kafka input does not slow down (if this is sustained load, rather than a burst), you will have a buffering issue in the Kafka sink, which can cause it to run out of memory - in which case data may be lost.
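One way to keep that buffer from growing without limit is to bound it and treat "buffer full" as the signal to stop reading input. The sketch below uses only the Java standard library. `BoundedSinkBuffer`, `tryAccept` and `nextToSend` are hypothetical names, and how the sink actually stops consuming from Kafka when the buffer is full is left out on purpose.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: a fixed-size buffer between the Kafka input and the Zeebe client.
// The class and method names are hypothetical.
final class BoundedSinkBuffer {

    private final BlockingQueue<String> pending;

    BoundedSinkBuffer(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to buffer a record without blocking.
     * Returns false when the buffer is full; the caller should then stop
     * reading new input until space frees up, instead of allocating more memory.
     */
    boolean tryAccept(String record) {
        return pending.offer(record);
    }

    /** Returns the oldest buffered record, or null if the buffer is empty. */
    String nextToSend() {
        return pending.poll();
    }

    int size() {
        return pending.size();
    }
}
```

The trade-off is that a bounded buffer only absorbs bursts. Under sustained overload it fills up and the pressure moves back to the input, so nothing is lost from memory, but the backlog then builds up on the Kafka side instead.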

So the options are to implement backoff-retry in the Kafka sink (buffering while the broker is signalling backpressure) to deal with bursts, or to provision the broker cluster for the highest level of load.

Looks like RESOURCE_EXHAUSTED should back-off the retry here:

I opened a feature request in the Kafka connector [0]. Someone who is familiar with Java could implement this and PR it in.

[0] Feature request: Backoff the retry when gateway reports RESOURCE_EXHAUSTED · Issue #70 · camunda-community-hub/kafka-connect-zeebe · GitHub

Thanks for looking at this, Josh, and for creating the issue :+1:

Looking at it now - see linked GitHub issue for details

Falko and Bernd implemented this: Add backoff to retrying by berndruecker · Pull Request #71 · camunda-community-hub/kafka-connect-zeebe · GitHub

Thanks for prioritising this. I will pull the change into my local repository and run the load test. I will post the results here.