Low Performance?

christian.achenbach · April 26, 2019, 10:59am

Hello everyone,

While experimenting with Zeebe (a single broker and a few workers on my notebook), I noticed that an idle broker with no active instances produces quite a high CPU load of around 5-20% (probably depending on my zeebe.cfg). When I start many parallel workflow instances, the load on the worker nodes is very low, but the broker produces all the load. I only see around 200 workflow instances per second on my i7, which is less than I expected.
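A throughput figure like the 200 instances per second above can be cross-checked on the client side by counting completions and sampling the rate periodically. The following is a minimal sketch; `ThroughputMeter`, `recordCompletion` and `sampleRatePerSecond` are hypothetical names, not part of the Zeebe client, and for a workflow with a single service task, counting completed jobs only approximates completed instances.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical helper: counts completions and reports the average rate
// (completions per second) since the previous sample.
final class ThroughputMeter {
    private final LongAdder completed = new LongAdder();
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
    private long lastCount;
    private long lastNanos;

    ThroughputMeter(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.lastNanos = nanoClock.getAsLong();
    }

    // Thread-safe; call once per completed job/instance, e.g. from a handler.
    void recordCompletion() {
        completed.increment();
    }

    // Rate since the previous call (or since construction); 0 if no time passed.
    synchronized double sampleRatePerSecond() {
        long nowNanos = nanoClock.getAsLong();
        long count = completed.sum();
        double seconds = (nowNanos - lastNanos) / 1_000_000_000.0;
        double rate = seconds > 0 ? (count - lastCount) / seconds : 0.0;
        lastCount = count;
        lastNanos = nowNanos;
        return rate;
    }
}
```

Calling `sampleRatePerSecond()` from a scheduled task every few seconds gives a running rate; the clock is injected (for example `System::nanoTime`) so the class can be tested without waiting.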

My settings:

    partitionsCount = 4
    cpuThreadCount = 2
    ioThreadCount = 2
    reportingInterval = "5s"

Can I “tune” Zeebe for more throughput, or is clustering the only option? Are there any known typical bottlenecks I can avoid?

Here are some screenshots from Grafana (clean broker without load):

Even with a “no-operation workflow”, I get at most 1700 completed workflows per second:

Thanks!

Greetings
Christian

jwulf · April 27, 2019, 4:46pm

Hi Christian, do you have a GitHub repo with your workers and test workflow in it? I’d like to try this.

I ran a clustering test on AWS with four nodes on Zeebe 0.15, but haven’t tried it since.

christian.achenbach · April 28, 2019, 2:12pm

Hi Josh,

thanks for your reply!

I have the simplest imaginable setup:

The JobHandler just sends the complete command:

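    // Completes every activated job immediately, without doing any work.
    private static class NoOpHandler implements JobHandler {
        @Override
        public void handle(final JobClient client, final ActivatedJob job) {
            // send() is asynchronous; the returned future is not awaited here.
            client.newCompleteCommand(job.getKey()).send();
        }
    }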

What throughput did you achieve on AWS?

Greetings
Christian

P.S. You can find my complete worker and workflow here:


https://gist.github.com/cachenbach/84f75a575882eddeea2f203ea0def6e0

noop.bpmn

<?xml version="1.0" encoding="UTF-8"?>
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:zeebe="http://camunda.org/schema/zeebe/1.0" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" id="Definitions_11jisqm" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Zeebe Modeler" exporterVersion="0.6.2">
  <bpmn:process id="Process_1rqmvpl" isExecutable="true">
    <bpmn:startEvent id="StartEvent_1">
      <bpmn:outgoing>SequenceFlow_0qdrx47</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:endEvent id="EndEvent_0l6mow5">
      <bpmn:incoming>SequenceFlow_0vs826q</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:serviceTask id="ServiceTask_0q8pbnj" name="NoOpTask">

(This file has been truncated; see the gist for the full version.)


https://gist.github.com/cachenbach/1ead4ef752a2c566bc1fbcede35e3927

Benchmark.java

import io.zeebe.client.ZeebeClient;
import io.zeebe.client.api.clients.JobClient;
import io.zeebe.client.api.response.ActivatedJob;
import io.zeebe.client.api.subscription.JobHandler;
import io.zeebe.client.api.subscription.JobWorker;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;

(This file has been truncated; see the gist for the full version.)

Philipp_Ossler · April 30, 2019, 7:18am

Hi @christian.achenbach,

I can confirm your observation. 200 completed workflow instances per second on a single broker is in line with our performance tests with Zeebe 0.17.0. You could tune your benchmark a bit (e.g. partition count, CPU threads, job polling), but you can’t increase the throughput significantly (e.g. by a factor of 10). However, you can build a cluster of brokers to balance the load. This is how Zeebe scales.
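Scaling out works because workflow instances are spread over partitions, and a cluster spreads those partitions over several brokers, so each broker processes only a share of the load. The sketch below illustrates that effect under an assumed round-robin placement; it is not Zeebe's actual distribution code, and `PartitionLayout.assign` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of spreading partitions over brokers round-robin.
// Not Zeebe's actual placement code; partition indices are 0-based here.
final class PartitionLayout {
    // Returns, for each broker index, the partition indices it hosts a replica of.
    static List<List<Integer>> assign(int partitionsCount, int brokerCount, int replicationFactor) {
        if (brokerCount < 1 || replicationFactor < 1 || replicationFactor > brokerCount) {
            throw new IllegalArgumentException("need 1 <= replicationFactor <= brokerCount");
        }
        List<List<Integer>> hostedBy = new ArrayList<>();
        for (int b = 0; b < brokerCount; b++) {
            hostedBy.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionsCount; p++) {
            // Replica r of partition p lands on broker (p + r) % brokerCount, so
            // no broker holds two replicas of the same partition.
            for (int r = 0; r < replicationFactor; r++) {
                hostedBy.get((p + r) % brokerCount).add(p);
            }
        }
        return hostedBy;
    }
}
```

With `partitionsCount = 4` on a single broker, that one broker hosts all four partitions; with four brokers and a replication factor of 1, each broker hosts one.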

Regarding the constant CPU load: it is caused by the job workers. We are aware of the problem and want to work on it in the future.

Do you have any specific throughput you need to reach?
Can you use a cluster to reach your goals?

Best regards,
Philipp

jwulf · April 30, 2019, 8:34am

Regarding the constant CPU load: it is caused by the job workers. We are aware of the problem and want to work on it in the future.

@philipp.ossler, is this caused by the gRPC polling?

christian.achenbach · April 30, 2019, 9:15am

Hi @philipp.ossler,

thanks for your clarification.

Do you have any specific throughput you need to reach?

No, nothing very specific right now. I was hoping to reach the 32,000 workflows per second you achieved in the benchmark last year. To be fair: even at 200 “transitions” per second, Zeebe would be ~100x cheaper than AWS Step Functions. But 32,000/s would be very tempting.
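As a rough check of the cost comparison, the Step Functions bill for a sustained transition rate can be computed directly. This sketch assumes the Standard Workflows list price of $0.025 per 1,000 state transitions (25 micro-dollars each), ignores the free tier, and uses a 30-day month; current AWS pricing should be checked before relying on it. `StepFunctionsCost` is a hypothetical helper.

```java
// Back-of-the-envelope AWS Step Functions cost for a sustained transition rate.
// ASSUMPTION: Standard Workflows list price of $0.025 per 1,000 state
// transitions (= 25 micro-dollars per transition); free tier ignored.
final class StepFunctionsCost {
    static final long MICRO_DOLLARS_PER_TRANSITION = 25;
    static final long SECONDS_PER_30_DAYS = 30L * 24 * 60 * 60; // 2,592,000

    static long transitionsPer30Days(long transitionsPerSecond) {
        return transitionsPerSecond * SECONDS_PER_30_DAYS;
    }

    // Integer arithmetic in micro-dollars avoids floating-point rounding;
    // the result is truncated to whole dollars.
    static long dollarsPer30Days(long transitionsPerSecond) {
        return transitionsPer30Days(transitionsPerSecond) * MICRO_DOLLARS_PER_TRANSITION / 1_000_000;
    }
}
```

At 200 transitions per second this gives 518,400,000 transitions and about $12,960 per 30 days; at 32,000 per second, about $2,073,600.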

Can you use a cluster to reach your goals?

Do you have any recommendations for a cluster? Many very small instances, like t3.small, or a few larger ones, like t3.xlarge?

Thank you!
Greetings,
Christian

Philipp_Ossler · April 30, 2019, 12:26pm

The benchmark measures only the creation of workflow instances. You can create many more instances than you can complete: to complete an instance, the workers have to poll and complete its jobs, and the broker has to process the instance until it reaches the end event. So there is a lot more to do.

I don’t have any experience with that. I would assume that it also works on smaller machines. However, more powerful machines can process more.

Philipp_Ossler · April 30, 2019, 12:34pm

Yes, the load is related to the (gRPC) job polling. Currently, the workers poll for jobs constantly, even when there are none. This could be improved, for example, with long polling, backoff, or job subscriptions.
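Of the options mentioned, backoff can be done entirely on the worker side. The sketch below is a hypothetical illustration, not the Zeebe client's implementation: after an empty poll the worker doubles its wait up to a cap, and resets to the minimum as soon as jobs arrive. `BackoffPoller`, its methods and its parameters are assumed names.

```java
// Hypothetical sketch of worker-side exponential backoff for job polling.
// Not the Zeebe client's implementation; names and values are assumptions.
final class BackoffPoller {
    private final long minDelayMillis;
    private final long maxDelayMillis;
    private long currentDelayMillis;

    BackoffPoller(long minDelayMillis, long maxDelayMillis) {
        if (minDelayMillis <= 0 || maxDelayMillis < minDelayMillis) {
            throw new IllegalArgumentException("need 0 < min <= max");
        }
        this.minDelayMillis = minDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
        this.currentDelayMillis = minDelayMillis;
    }

    // Given how many jobs the last poll returned, returns how long to wait
    // before polling again.
    long nextDelayMillis(int jobsReturned) {
        if (jobsReturned > 0) {
            currentDelayMillis = minDelayMillis; // work is flowing: poll again soon
        } else {
            currentDelayMillis = Math.min(currentDelayMillis * 2, maxDelayMillis); // idle: back off
        }
        return currentDelayMillis;
    }

    // Simulates an idle worker and counts the polls it sends within the budget.
    static int pollsWhileIdle(BackoffPoller poller, long budgetMillis) {
        long elapsed = 0;
        int polls = 0;
        while (elapsed < budgetMillis) {
            polls++;                              // one (empty) poll request
            elapsed += poller.nextDelayMillis(0); // then wait before the next one
        }
        return polls;
    }
}
```

With a 100 ms minimum and a 5 s cap, an idle worker sends 16 polls per minute instead of 600 at a fixed 100 ms interval. Long polling would go further by keeping one request open on the broker until a job is available, but that needs broker support and is not shown here.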

system · January 31, 2024, 10:10am