How do I reduce broker latency to get faster end-to-end cycle time for a process instance?

Rafael Pili: Hello, I implemented a very simple workflow with just 2 steps. However, I noticed a significant delay between steps, even though the system is not under stress. After I request an instance creation, the first job takes about 100ms to start executing, and the second job takes roughly the same amount of time. So we are losing 200ms just on latency between the broker and the worker. Is this normal? Is there a way to improve this?

Josh Wulf: Yeah, it’s a thing. I haven’t checked this in a couple of years - see here: https://github.com/jwulf/zeebe-performance-tests/tree/0.26

I’ll update this to the latest version and run a quick test to see what numbers it produces now.

Josh Wulf: I updated it for 8.x and tested it.

On a local dockerised 8.0.2 broker with no exporter, I can get it down to 20-30ms broker latency per flow node.

On 0.26, that was 50ms-60ms.

So you can get it down.

Things to check are:

• Measure the performance of your workers against a local dockerised broker with no exporter. This will isolate the worker tuning, and give you a theoretical maximum.
• Run the Node perf test code against the same broker to see what the theoretical max is on your hardware. This will help you to identify whether the worker tuning is the problem.
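
To compare your own workers with the perf test on equal terms, it helps to compute the per-step latency the same way in both cases. The following is a minimal sketch, assuming you record timestamps yourself in client code; `InstanceTimings`, `latenciesPerStep` and `totalLatency` are hypothetical names for illustration and are not part of the linked perf test or of any Zeebe client.

```typescript
// Hypothetical record of timestamps (ms since epoch) captured in your own client code.
interface InstanceTimings {
  createRequestedAt: number; // taken just before the create-instance request is sent
  jobStartedAt: number[];    // when each job handler began, in flow order
  jobCompletedAt: number[];  // when each job handler sent its completion, in flow order
}

// Latency before each job: time from the previous event (the create request,
// or the previous job's completion) until this job's handler started.
function latenciesPerStep(t: InstanceTimings): number[] {
  return t.jobStartedAt.map((started, i) => {
    const previous = i === 0 ? t.createRequestedAt : t.jobCompletedAt[i - 1];
    return started - previous;
  });
}

// Sum of all waiting time, excluding the time spent inside the handlers.
function totalLatency(t: InstanceTimings): number {
  return latenciesPerStep(t).reduce((sum, ms) => sum + ms, 0);
}
```

With the numbers from the question (two jobs, each starting about 100ms after the previous event), `totalLatency` would return 200.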

Josh Wulf: Things that will make it slower:

• Network roundtrip.
• Exporter backpressure (try with the exporter loaded and not loaded - the Elasticsearch exporter adds a network roundtrip, and the ES cluster needs hardware as well).
• Hardware constraints (CPU, RAM, cache).

Note: This post was generated by Slack Archivist from a conversation in the Camunda Platform 8 Slack, a source of valuable discussions on Camunda 8. Someone in the Slack thought this was worth sharing!

Rafael Pili: Thanks! Are there plans to update the official image to have this exporter option? 🙂

Josh Wulf: How are you starting the broker?

Rafael Pili: I’m using the helm template, in a k8s environment

Josh Wulf: "Is there a way to improve this?" Run it on faster hardware with more RAM.

Josh Wulf: And run it closer to the workers.

Josh Wulf: Ultimately - on your local machine.

Rafael Pili: Nice. Does having more partitions usually impact performance?

Josh Wulf: The way to check what impact the partition count has on performance is to make a spreadsheet and run the exact same test with various partition counts, then look at the results.
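
One possible way to build that spreadsheet is to collect the latencies from each run and emit a CSV with one row per partition count. This is only a sketch: `RunResult`, `toCsv` and the column names are made up for illustration, the percentile uses the simple nearest-rank method, and each run is assumed to contain at least one sample.

```typescript
// One entry per test run; all field names are hypothetical.
interface RunResult {
  partitions: number;    // partition count the broker was started with
  latenciesMs: number[]; // per-flow-node latencies measured in that run (non-empty)
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Nearest-rank percentile computed on a sorted copy; p is in (0, 100].
function percentile(xs: number[], p: number): number {
  const sorted = xs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Produces CSV text that can be pasted or imported into a spreadsheet.
function toCsv(runs: RunResult[]): string {
  const header = "partitions,samples,mean_ms,p95_ms";
  const rows = runs
    .slice()
    .sort((a, b) => a.partitions - b.partitions)
    .map((r) =>
      [
        r.partitions,
        r.latenciesMs.length,
        mean(r.latenciesMs).toFixed(1),
        percentile(r.latenciesMs, 95),
      ].join(",")
    );
  return [header].concat(rows).join("\n");
}
```

Keeping everything except the partition count identical between runs (same hardware, same process model, same worker settings) is what makes the rows comparable.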

Josh Wulf: Again, your hardware resources will impact this. The replication count matters too: more replicas, more overhead. Zero replicas (like the perf test I posted) gives the fastest performance.

Josh Wulf: And no redundancy. So it depends on what is important to your use case how you tune it.

Use parallel processing. Zeebe is really good at starting many parallel activities with no latency, or about 1 ms of latency. Sequential latency has never been a strong side of the Zeebe/Camunda 8 platform.
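
As a back-of-envelope illustration of why that matters, here is a rough model of the waiting time for a sequence versus a parallel fan-out. It is a simplification built on the figures mentioned in this thread, not a measurement: the function names are hypothetical, and the default of 1 ms per additional parallel branch is an assumption taken from the comment above.

```typescript
// Rough model only; every parameter is an assumption you should replace
// with latencies measured on your own setup.

// In a sequence, each step waits for the previous one, so latency adds up.
function sequentialOverheadMs(steps: number, perNodeLatencyMs: number): number {
  return steps * perNodeLatencyMs;
}

// In a parallel fan-out, assume the first branch pays the full per-node
// latency and each further branch adds only a small start cost.
function parallelOverheadMs(
  branches: number,
  perNodeLatencyMs: number,
  perBranchStartMs: number = 1
): number {
  if (branches <= 0) return 0;
  return perNodeLatencyMs + (branches - 1) * perBranchStartMs;
}
```

Under this model, the two sequential steps from the question at 100ms each give `sequentialOverheadMs(2, 100)`, which is 200, matching what was observed. Whether the parallel estimate holds depends on your process model and hardware, so treat it as a starting point for a test rather than a prediction.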