May I ask why, with a timeout of 6 seconds configured, the job has not given up its resources and is still blocking, so that other requests cannot come in?

package com.demo.adapt;

import java.time.Duration;

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.client.api.worker.JobClient;
import io.camunda.zeebe.client.api.worker.JobHandler;
import io.camunda.zeebe.client.api.worker.JobWorker;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Component
public class TestWorker implements JobHandler {

  @Autowired private ZeebeClient client;

  private JobWorker worker;

  @PostConstruct
  public void register() {
    worker = client
        .newWorker()
        .jobType("test001")
        .handler(this)
        .name("test001")
        .maxJobsActive(1)
        .timeout(Duration.ofSeconds(6))
        .open();
    log.info("Job worker test001 opened and receiving jobs");
  }

  @Override
  public void handle(JobClient client, ActivatedJob job) throws Exception {
    log.info("currentTime: {}", System.currentTimeMillis());
    // Simulate long-running work; this blocks the handler thread far past the 6s timeout
    Thread.sleep(50000);
    client.newCompleteCommand(job.getKey()).send();
  }

  @PreDestroy
  public void unregister() {
    if (!worker.isClosed()) {
      worker.close();
      log.info("test001 Job worker closed");
    }
  }
}


Hello @bugbugmaker ,

the reason is that, by default, the Zeebe client runs its job handlers on a single-threaded thread pool.

This can be configured, but in general it should encourage you to write non-blocking handlers using Java's async facilities.
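For illustration, a client-configuration sketch (the gateway address is a placeholder; the builder methods are from the `io.camunda.zeebe.client` API) that raises the number of job-worker execution threads:

```java
import io.camunda.zeebe.client.ZeebeClient;

// Sketch only: building a client whose workers can run handlers in parallel.
// numJobWorkerExecutionThreads defaults to 1, which is why a single blocking
// handler stalls all other jobs of this client.
ZeebeClient client = ZeebeClient.newClientBuilder()
    .gatewayAddress("localhost:26500") // placeholder address
    .usePlaintext()
    .numJobWorkerExecutionThreads(4)   // allow up to 4 handlers at once
    .build();
```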

I hope this helps

Jonathan

Can you give me more information? I don’t know how to achieve it

Also, I would like to know the effect of the timeout field

Hello @bugbugmaker ,

you could use an ExecutorService: submit the long-running work to it and use the returned Future instead of blocking the handler.

For more information, you can read here: https://www.baeldung.com/java-asynchronous-programming

The timeout relates to the job on the Zeebe side. When the timeout elapses, the job becomes available again for activation by a worker.
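To make that concrete, here is a minimal, self-contained sketch of the pattern using only the JDK. The `handle` method stands in for `JobHandler.handle` and returns immediately; the long-running work runs on an `ExecutorService`, and a comment marks where a real worker would send the complete command (the names here are illustrative, not Zeebe API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHandlerSketch {

  // A dedicated pool for long-running work, so the client's (single)
  // job-execution thread is never blocked by Thread.sleep-style code.
  static final ExecutorService pool = Executors.newFixedThreadPool(4);

  // Stands in for JobHandler.handle(...): it returns immediately and
  // the heavy work continues on the pool.
  static CompletableFuture<String> handle(long jobKey) {
    return CompletableFuture.supplyAsync(() -> {
      // long-running work goes here (Thread.sleep(50000) in the original worker)
      return "done-" + jobKey;
    }, pool).thenApply(result -> {
      // in a real worker you would now send the complete command, e.g.
      // client.newCompleteCommand(jobKey).send()
      return result;
    });
  }

  public static void main(String[] args) throws Exception {
    // Block only here, for the demo; a real handler would not wait.
    System.out.println(handle(42L).get()); // prints "done-42"
    pool.shutdown();
  }
}
```

With this shape, the client's execution thread is free to activate further jobs while earlier ones are still being processed.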

Jonathan

Hello, I understand the meaning of asynchronous invocation. What I want to clarify is the actual function of the worker timeout. I thought that when the timeout was reached and the job was not yet completed, the worker's resources would be released so it could execute other jobs.

Hello @bugbugmaker ,

in this case, the timeout explanation above is your answer: the timeout only makes the job available again on the broker side; it does not interrupt or free the worker thread that is still executing the handler.

However, your suggestion could make sense; it would just have to be implemented on your side.

Jonathan

Okay, thank you. May I ask what is causing the following? It makes Zeebe consume a high amount of CPU:

"http-nio-0.0.0.0-9600-Acceptor" #41 daemon prio=5 os_prio=0 cpu=0.42ms elapsed=52292.65s tid=0x00007fe7a535b9a0 nid=0x4f runnable  [0x00007fe7151f4000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.Net.accept(java.base@17.0.4.1/Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.implAccept(java.base@17.0.4.1/Unknown Source)
	at sun.nio.ch.ServerSocketChannelImpl.accept(java.base@17.0.4.1/Unknown Source)
	at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:546)
	at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:79)
	at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:129)
	at java.lang.Thread.run(java.base@17.0.4.1/Unknown Source)

"Broker-0-zb-actors-0" #42 prio=5 os_prio=0 cpu=43198018.36ms elapsed=52292.44s tid=0x00007fe7a531ed10 nid=0x50 runnable  [0x00007fe7150f2000]
   java.lang.Thread.State: RUNNABLE
	at io.camunda.zeebe.logstreams.impl.log.LogStorageAppender.appendBlock(LogStorageAppender.java:121)
	at io.camunda.zeebe.logstreams.impl.log.LogStorageAppender.onWriteBufferAvailable(LogStorageAppender.java:196)
	at io.camunda.zeebe.logstreams.impl.log.LogStorageAppender$$Lambda$1556/0x00000008014253e8.run(Unknown Source)
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:74)
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:42)
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:125)
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"Broker-0-zb-actors-1" #43 prio=5 os_prio=0 cpu=36769234.13ms elapsed=52292.44s tid=0x00007fe7a531f880 nid=0x51 waiting on condition  [0x00007fe714ff2000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	- parking to wait for  <0x00000007364b8400> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.CompletableFuture.join(java.base@17.0.4.1/Unknown Source)
	at io.camunda.zeebe.broker.logstreams.LogDeletionService.delegateDeletion(LogDeletionService.java:67)
	at io.camunda.zeebe.broker.logstreams.LogDeletionService.lambda$onNewSnapshot$2(LogDeletionService.java:59)
	at io.camunda.zeebe.broker.logstreams.LogDeletionService$$Lambda$2091/0x00000008016b5758.run(Unknown Source)
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:72)
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:42)
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:125)
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"Broker-0-zb-fs-workers-0" #44 prio=5 os_prio=0 cpu=700565.30ms elapsed=52292.44s tid=0x00007fe7a5321de0 nid=0x52 runnable  [0x00007fe714ef1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.4.1/Unknown Source)
	at org.agrona.concurrent.BackoffIdleStrategy.idle(BackoffIdleStrategy.java:214)
	at io.camunda.zeebe.util.sched.ActorThread$ActorTaskRunnerIdleStrategy.onIdle(ActorThread.java:267)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:85)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"Broker-0-zb-fs-workers-1" #45 prio=5 os_prio=0 cpu=700968.45ms elapsed=52292.44s tid=0x00007fe7a53229b0 nid=0x53 runnable  [0x00007fe714df0000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.4.1/Unknown Source)
	at org.agrona.concurrent.BackoffIdleStrategy.idle(BackoffIdleStrategy.java:214)
	at io.camunda.zeebe.util.sched.ActorThread$ActorTaskRunnerIdleStrategy.onIdle(ActorThread.java:267)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:85)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"atomix-cluster-0" #46 prio=5 os_prio=0 cpu=35.80ms elapsed=52290.69s tid=0x00007fe6980ad740 nid=0x59 waiting on condition  [0x00007fe7756d8000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	- parking to wait for  <0x00000007013a20f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.4.1/Unknown Source)
	at java.lang.Thread.run(java.base@17.0.4.1/Unknown Source)

"netty-messaging-event-epoll-server-0" #47 prio=5 os_prio=0 cpu=8.88ms elapsed=52290.19s tid=0x00007fe698509750 nid=0x5a runnable  [0x00007fe7758da000]
   java.lang.Thread.State: RUNNABLE
	at io.netty.channel.epoll.Native.epollWait(Native Method)
	at io.netty.channel.epoll.Native.epollWait(Native.java:209)
	at io.netty.channel.epoll.Native.epollWait(Native.java:202)
	at io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange(EpollEventLoop.java:294)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:351)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(java.base@17.0.4.1/Unknown Source)

Hello @bugbugmaker ,

I am not sure where the provided logs come from.

Jonathan

zeebe server

Hello @bugbugmaker ,

and what caused this high load?

Jonathan

I couldn’t find the specific cause, so I posted Zeebe's stack (thread dump) information. Can you help me see what caused it?

Hello @bugbugmaker ,

could it be related to your processes? Do you have a huge multi-instance activity, or do you kick off many process instances at once (basically, anything that puts an extremely high intermittent load on the engine)?

Jonathan

Even after disconnecting all Zeebe clients, CPU usage stays high. The only way to lower it is to delete the data in the Zeebe data directory. Can you tell what caused it from the stack information?

Hello @bugbugmaker ,

I am afraid I cannot tell you what caused it. Could you please share the process model that was executed?

Jonathan

It is very difficult to locate which process is causing it. I have many processes, and high CPU usage may appear after they have been running for a period of time. Moreover, restarting Zeebe and disconnecting the clients does not solve the problem. The Zeebe service stack information is as shown above.

Hello @bugbugmaker ,

in the end, the process models are like the code that is running: if a model contains a problematic pattern, it will probably cause issues. Some of the main causes could be live-locks (infinite loops), large multi-instances, or recursion.

If your processes have any of these patterns in place, they would potentially be the cause. Then we could help you to optimize them. What do you think?

Jonathan

The high CPU appears after Zeebe has been running for a period of time, and I suspect some data is causing the issue; deleting the query file in the data directory restores normal operation. What I want to ask is: how can I find out what is causing the problem, how can I avoid it, and, if it occurs, how can I restore normal operation without deleting the data in the data directory?

Hello @bugbugmaker ,

this would be easier to diagnose with the BPMN process models that are running. Usually, this does not happen.

Also, it would be helpful to know how much disk space you gave Zeebe.

Jonathan

There is still a lot of disk space left