May I ask why, with a timeout of 6 seconds configured, the job has not given up its resources and is still blocking, so that other requests cannot come in?

package com.demo.adapt;

import java.time.Duration;

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.client.api.worker.JobClient;
import io.camunda.zeebe.client.api.worker.JobHandler;
import io.camunda.zeebe.client.api.worker.JobWorker;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Component
public class TestWorker implements JobHandler {

  @Autowired private ZeebeClient client;

  private JobWorker worker;

  @PostConstruct
  public void register() {
    worker = client
        .newWorker()
        .jobType("test001")
        .handler(this)
        .name("test001")
        .maxJobsActive(1)
        .timeout(Duration.ofSeconds(6))
        .open();
    log.info("Job worker test001 opened and receiving jobs");
  }

  @Override
  public void handle(JobClient client, ActivatedJob job) throws Exception {
    log.info("currentTime: {}", System.currentTimeMillis());
    // Simulate long-running work; this blocks the handler thread far past the 6s timeout
    Thread.sleep(50000);
    client.newCompleteCommand(job.getKey()).send();
  }

  @PreDestroy
  public void unregister() {
    if (!worker.isClosed()) {
      worker.close();
      log.info("test001 Job worker closed");
    }
  }
}


Hello @bugbugmaker ,

the reason is that, by default, the Zeebe client runs its job handlers on a single-threaded thread pool.

This can be configured, but in general it should encourage you to write non-blocking handlers using Java's async facilities.
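For illustration, a client-configuration sketch (the gateway address is a placeholder; the builder methods are from the `io.camunda.zeebe.client` API) that raises the number of job-worker execution threads:

```java
import io.camunda.zeebe.client.ZeebeClient;

// Sketch only: building a client whose workers can run handlers in parallel.
// numJobWorkerExecutionThreads defaults to 1, which is why a single blocking
// handler stalls all other jobs of this client.
ZeebeClient client = ZeebeClient.newClientBuilder()
    .gatewayAddress("localhost:26500") // placeholder address
    .usePlaintext()
    .numJobWorkerExecutionThreads(4)   // allow up to 4 handlers at once
    .build();
```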

I hope this helps

Jonathan

Can you give me more information? I don’t know how to achieve it

Also, I would like to know the effect of the timeout field

Hello @bugbugmaker ,

you could use an ExecutorService: submit the long-running work to it and use the returned Future instead of blocking the handler.

For more information, you can read here: https://www.baeldung.com/java-asynchronous-programming

The timeout relates to the job on the Zeebe side. When the timeout elapses, the job becomes available again for activation by a worker.
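To make that concrete, here is a minimal, self-contained sketch of the pattern using only the JDK. The `handle` method stands in for `JobHandler.handle` and returns immediately; the long-running work runs on an `ExecutorService`, and a comment marks where a real worker would send the complete command (the names here are illustrative, not Zeebe API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHandlerSketch {

  // A dedicated pool for long-running work, so the client's (single)
  // job-execution thread is never blocked by Thread.sleep-style code.
  static final ExecutorService pool = Executors.newFixedThreadPool(4);

  // Stands in for JobHandler.handle(...): it returns immediately and
  // the heavy work continues on the pool.
  static CompletableFuture<String> handle(long jobKey) {
    return CompletableFuture.supplyAsync(() -> {
      // long-running work goes here (Thread.sleep(50000) in the original worker)
      return "done-" + jobKey;
    }, pool).thenApply(result -> {
      // in a real worker you would now send the complete command, e.g.
      // client.newCompleteCommand(jobKey).send()
      return result;
    });
  }

  public static void main(String[] args) throws Exception {
    // Block only here, for the demo; a real handler would not wait.
    System.out.println(handle(42L).get()); // prints "done-42"
    pool.shutdown();
  }
}
```

With this shape, the client's execution thread is free to activate further jobs while earlier ones are still being processed.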

Jonathan

Hello, I understand the meaning of asynchronous invocation. What I want to clarify is the actual function of the worker timeout. I thought that when the timeout was reached and the job was not yet completed, the worker's resources would be released so it could execute other jobs.

Hello @bugbugmaker ,

in this case, the timeout explanation above is your answer: the timeout only makes the job available again on the broker side; it does not interrupt or free the worker thread that is still executing the handler.

However, your suggestion could make sense; it would just have to be implemented on your side.

Jonathan

Okay, thank you. May I ask what is causing the following? It makes Zeebe consume a high amount of CPU:

"http-nio-0.0.0.0-9600-Acceptor" #41 daemon prio=5 os_prio=0 cpu=0.42ms elapsed=52292.65s tid=0x00007fe7a535b9a0 nid=0x4f runnable  [0x00007fe7151f4000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.Net.accept(java.base@17.0.4.1/Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.implAccept(java.base@17.0.4.1/Unknown Source)
	at sun.nio.ch.ServerSocketChannelImpl.accept(java.base@17.0.4.1/Unknown Source)
	at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:546)
	at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:79)
	at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:129)
	at java.lang.Thread.run(java.base@17.0.4.1/Unknown Source)

"Broker-0-zb-actors-0" #42 prio=5 os_prio=0 cpu=43198018.36ms elapsed=52292.44s tid=0x00007fe7a531ed10 nid=0x50 runnable  [0x00007fe7150f2000]
   java.lang.Thread.State: RUNNABLE
	at io.camunda.zeebe.logstreams.impl.log.LogStorageAppender.appendBlock(LogStorageAppender.java:121)
	at io.camunda.zeebe.logstreams.impl.log.LogStorageAppender.onWriteBufferAvailable(LogStorageAppender.java:196)
	at io.camunda.zeebe.logstreams.impl.log.LogStorageAppender$$Lambda$1556/0x00000008014253e8.run(Unknown Source)
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:74)
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:42)
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:125)
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"Broker-0-zb-actors-1" #43 prio=5 os_prio=0 cpu=36769234.13ms elapsed=52292.44s tid=0x00007fe7a531f880 nid=0x51 waiting on condition  [0x00007fe714ff2000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	- parking to wait for  <0x00000007364b8400> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.CompletableFuture.join(java.base@17.0.4.1/Unknown Source)
	at io.camunda.zeebe.broker.logstreams.LogDeletionService.delegateDeletion(LogDeletionService.java:67)
	at io.camunda.zeebe.broker.logstreams.LogDeletionService.lambda$onNewSnapshot$2(LogDeletionService.java:59)
	at io.camunda.zeebe.broker.logstreams.LogDeletionService$$Lambda$2091/0x00000008016b5758.run(Unknown Source)
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:72)
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:42)
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:125)
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"Broker-0-zb-fs-workers-0" #44 prio=5 os_prio=0 cpu=700565.30ms elapsed=52292.44s tid=0x00007fe7a5321de0 nid=0x52 runnable  [0x00007fe714ef1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.4.1/Unknown Source)
	at org.agrona.concurrent.BackoffIdleStrategy.idle(BackoffIdleStrategy.java:214)
	at io.camunda.zeebe.util.sched.ActorThread$ActorTaskRunnerIdleStrategy.onIdle(ActorThread.java:267)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:85)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"Broker-0-zb-fs-workers-1" #45 prio=5 os_prio=0 cpu=700968.45ms elapsed=52292.44s tid=0x00007fe7a53229b0 nid=0x53 runnable  [0x00007fe714df0000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.4.1/Unknown Source)
	at org.agrona.concurrent.BackoffIdleStrategy.idle(BackoffIdleStrategy.java:214)
	at io.camunda.zeebe.util.sched.ActorThread$ActorTaskRunnerIdleStrategy.onIdle(ActorThread.java:267)
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:85)
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189)

"atomix-cluster-0" #46 prio=5 os_prio=0 cpu=35.80ms elapsed=52290.69s tid=0x00007fe6980ad740 nid=0x59 waiting on condition  [0x00007fe7756d8000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
	- parking to wait for  <0x00000007013a20f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.4.1/Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.4.1/Unknown Source)
	at java.lang.Thread.run(java.base@17.0.4.1/Unknown Source)

"netty-messaging-event-epoll-server-0" #47 prio=5 os_prio=0 cpu=8.88ms elapsed=52290.19s tid=0x00007fe698509750 nid=0x5a runnable  [0x00007fe7758da000]
   java.lang.Thread.State: RUNNABLE
	at io.netty.channel.epoll.Native.epollWait(Native Method)
	at io.netty.channel.epoll.Native.epollWait(Native.java:209)
	at io.netty.channel.epoll.Native.epollWait(Native.java:202)
	at io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange(EpollEventLoop.java:294)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:351)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(java.base@17.0.4.1/Unknown Source)

Hello @bugbugmaker ,

I am not sure where the provided logs come from.

Jonathan

zeebe server

Hello @bugbugmaker ,

and what caused this high load?

Jonathan

I couldn’t find the specific cause, so I posted Zeebe's stack (thread dump) information. Can you help me see what caused it?

Hello @bugbugmaker ,

could it be related to your processes? Do you have a huge multi-instance activity, or do you kick off many process instances at once (basically, anything that puts an extremely high intermittent load on the engine)?

Jonathan

Even after disconnecting all Zeebe clients, CPU usage stays high. The only way to lower it is to delete the data in the Zeebe data directory. Can you tell what caused it from the stack information?

Hello @bugbugmaker ,

I am afraid I cannot tell you what caused it. Could you please share the process model that was executed?

Jonathan

It is very difficult to locate which process is causing it. I have many processes, and high CPU usage may appear after they have been running for a period of time. Moreover, restarting Zeebe and disconnecting the clients does not solve the problem. The Zeebe service stack information is as shown above.

Hello @bugbugmaker ,

in the end, the process models are like the code that is running: if a model contains a problematic pattern, it will probably cause issues. Some of the main causes could be live-locks (infinite loops), large multi-instances, or recursion.

If your processes have any of these patterns in place, they would potentially be the cause. Then we could help you to optimize them. What do you think?

Jonathan

The high CPU appears after Zeebe has been running for a period of time, and I suspect some data is causing the issue; deleting the query file in the data directory restores normal operation. What I want to ask is: how can I find out what is causing the problem, how can I avoid it, and, if it occurs, how can I restore normal operation without deleting the data in the data directory?

Hello @bugbugmaker ,

this would be easier to diagnose with the BPMN process models that are running. Usually, this does not happen.

Also, it would be helpful to know how much disk space you gave Zeebe.

Jonathan

There is still a lot of disk space left