I am running a load test with a simple BPMN process that has two tasks: a source task and a sink task. I am using the open-source Zeebe Kafka connector for asynchronous communication. Whenever the Zeebe client starts throwing “RESOURCE_EXHAUSTED” errors, the sink connector's CPU utilisation reaches 100%. I dug into the code and found that it retries through a recursive call until it receives a success or failure code. The load test service creates instances at a rate of 7K RPM. I also observed that the backpressure limit dropped to 5.
Please help me tune the performance. We expect a load of 100K RPM in production.
Well, this is the nature of a distributed system. If the broker gets saturated, it replies with gRPC error 8 (RESOURCE_EXHAUSTED) to signal backpressure to the clients.
How the client responds, and how it propagates backpressure through the system, are design considerations.
If you change the Kafka sink to retry with backoff, it may give the broker time to recover rather than adding to the pressure.
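A backoff retry could look roughly like the sketch below. It is not the connector's actual code: `BackpressureException` is a hypothetical stand-in for the gRPC RESOURCE_EXHAUSTED error, and `BackoffRetry`, `maxDelayMillis` and `callWithBackoff` are names made up for illustration. It replaces the tight recursive retry with a loop that sleeps for a randomised, exponentially growing delay between attempts.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical stand-in for the gRPC RESOURCE_EXHAUSTED (status code 8) error. */
final class BackpressureException extends RuntimeException {
    BackpressureException(String message) {
        super(message);
    }
}

final class BackoffRetry {
    private BackoffRetry() {}

    /** Upper bound of the delay before retry number {@code attempt} (0-based): base * 2^attempt, capped. */
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        // Clamp the exponent so the multiplication stays well inside the long range.
        long factor = 1L << Math.min(attempt, 20);
        return Math.min(capMillis, baseMillis * factor);
    }

    /** Runs {@code call}, retrying only on backpressure, for at most {@code maxAttempts} attempts. */
    static <T> T callWithBackoff(Supplier<T> call, int maxAttempts, long baseMillis, long capMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (BackpressureException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do with the record
                }
                // "Full jitter": sleep a random time in [0, bound] so retries do not arrive in lockstep.
                long bound = maxDelayMillis(attempt, baseMillis, capMillis);
                long sleepMillis = ThreadLocalRandom.current().nextLong(bound + 1);
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", ie);
                }
            }
        }
    }
}
```

The base delay, cap and attempt limit are tuning knobs, not recommended values. Sleeping between attempts is also what keeps the retrying thread from spinning at 100% CPU while the broker is rejecting requests.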
But if the Kafka input does not slow down (that is, if the load is sustained rather than a burst), the Kafka sink will have to buffer an ever-growing backlog, which can cause it to run out of memory, in which case data may be lost.
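One way to keep that buffer from growing without limit is to give it a fixed capacity and treat "buffer full" as the signal to stop pulling from Kafka. The following is a minimal sketch under that assumption; `BoundedSinkBuffer` is a hypothetical helper, not part of the connector or of Kafka Connect.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-capacity buffer between the Kafka consumer side and the Zeebe client side. */
final class BoundedSinkBuffer<T> {
    private final BlockingQueue<T> queue;

    BoundedSinkBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to enqueue a record, waiting up to timeoutMillis for free space.
     * Returns false if the buffer stayed full, which the caller should treat
     * as backpressure and stop consuming until space frees up.
     */
    boolean offer(T record, long timeoutMillis) {
        try {
            return queue.offer(record, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Removes and returns the oldest buffered record, or null if the buffer is empty. */
    T poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

With a bound in place, records that do not fit stay in Kafka, where they are durable, instead of piling up in the sink's heap. How the sink then stops consuming (for example by pausing its partitions or asking the framework to redeliver the batch later) depends on the connector and is left out here.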
So the options are to implement backoff retry in the Kafka sink (buffering while the broker is signalling backpressure) to deal with bursts, or to provision the broker cluster for the highest level of load.