Hi,
I am experiencing weird behaviour on my test cluster. It used to work just fine, but now my job workers miss job activations most of the time, and it can take minutes before a job is actually picked up. It also happens that an older process instance is still waiting for its job worker while a newer instance has already completed that same worker. Jobs are sometimes executed immediately, and sometimes it takes minutes or even hours. The TRACE log shows no pending activations, while Operate shows the job is still waiting to be handled.
While jobs are waiting in Operate, this is the logging of the application that should pick them up:
2023-11-20T23:56:25.834+01:00 TRACE 6960 --- [ault-executor-0] io.camunda.zeebe.client.job.poller : No jobs activated for worker sampleWorker#handleRequest and job type mySampleWorker.v1
2023-11-20T23:56:25.936+01:00 TRACE 6960 --- [pool-2-thread-1] io.camunda.zeebe.client.job.poller : Polling at max 32 jobs for worker sampleWorker#handleRequest and job type mySampleWorker.v1
So somehow the engine does not hand them to the worker, although they may still be picked up after a couple of minutes or even hours.
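For context, the worker behind those log lines is a plain Spring Zeebe job worker. A minimal sketch of how it is registered (class and method names are taken from the log; the body and annotation options are simplified and not the real implementation):

```java
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.spring.client.annotation.JobWorker;
import org.springframework.stereotype.Component;

// Simplified sketch of the worker behind the poller log above.
@Component
public class SampleWorker {

    // Job type as seen in the log; maxJobsActive defaults to 32,
    // which matches the "Polling at max 32 jobs" line.
    @JobWorker(type = "mySampleWorker.v1")
    public void handleRequest(final ActivatedJob job) {
        // actual business logic omitted
    }
}
```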
In a similar post I saw the suggestion to display the cluster topology. This is the result of that:
2023-11-20T22:52:17.186+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : Display topology ...
2023-11-20T22:52:17.188+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] OUTBOUND HEADERS: streamId=113 headers=GrpcHttp2OutboundHeaders[:authority: 24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io:443, :path: /gateway_protocol.Gateway/Topology, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: zeebe-client-java/8.3.0 grpc-java-netty/1.58.0, grpc-accept-encoding: gzip, authorization: Bearer ey....., grpc-timeout: 9998615u] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2023-11-20T22:52:17.188+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] OUTBOUND DATA: streamId=113 padding=0 endStream=true length=5 bytes=0000000000
2023-11-20T22:52:17.469+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] INBOUND HEADERS: streamId=113 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Mon, 20 Nov 2023 21:52:17 GMT, content-type: application/grpc, grpc-encoding: identity, grpc-accept-encoding: gzip, strict-transport-security: max-age=63072000; includeSubDomains] padding=0 endStream=false
2023-11-20T22:52:17.469+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] INBOUND DATA: streamId=113 padding=0 endStream=false length=382 bytes=00000001790a78080212597a656562652d322e7a656562652d62726f6b65722d736572766963652e32346535666332312d656435372d343739322d393036332d...
2023-11-20T22:52:17.469+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] OUTBOUND PING: ack=false bytes=1234
2023-11-20T22:52:17.470+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : zeebe-2.zeebe-broker-service.24e5fc21-ed57-4792-9063-028b1c60d1f3-zeebe.svc.cluster.local:26501
2023-11-20T22:52:17.472+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] INBOUND HEADERS: streamId=113 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 2 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 3 - LEADER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 1 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : zeebe-1.zeebe-broker-service.24e5fc21-ed57-4792-9063-028b1c60d1f3-zeebe.svc.cluster.local:26501
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 2 - LEADER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 3 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 1 - LEADER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : zeebe-0.zeebe-broker-service.24e5fc21-ed57-4792-9063-028b1c60d1f3-zeebe.svc.cluster.local:26501
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 2 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 3 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 1 - FOLLOWER ( HEALTHY)
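For reference, the TopologyDebugEndpoint behind that output does nothing special; it is essentially the standard topology request of the Java client. A minimal sketch (the method shape and logging are mine, only the newTopologyRequest() call matters):

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.BrokerInfo;
import io.camunda.zeebe.client.api.response.PartitionInfo;
import io.camunda.zeebe.client.api.response.Topology;

// Request the topology via the autowired ZeebeClient and print
// each broker's address plus its partition roles and health.
public static void logTopology(final ZeebeClient zeebeClient) {
    final Topology topology = zeebeClient.newTopologyRequest().send().join();
    for (final BrokerInfo broker : topology.getBrokers()) {
        System.out.println(broker.getAddress());
        for (final PartitionInfo partition : broker.getPartitions()) {
            System.out.printf("%d - %s ( %s)%n",
                partition.getPartitionId(), partition.getRole(), partition.getHealth());
        }
    }
}
```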
That output looks good to me. Everything worked before my test cluster was deleted and recreated. The cluster is running version 8.3.3 and my Spring Boot Camunda starter is version 8.3.1, so both are the latest stable versions.
To my knowledge nothing else changed.
As a side note: I am also trying to use this setup from a cloud environment. There I am getting messages like this on the same connection:
2023-11-20T22:07:17.028Z WARN 1 --- [ault-executor-5] io.camunda.zeebe.client.job.poller : Failed to activate jobs for worker sampleWorker#handleRequest and job type mySampleWorker.v1
io.grpc.StatusRuntimeException: UNKNOWN
at io.grpc.Status.asRuntimeException(Status.java:537)
.....
Caused by: java.lang.UnsupportedOperationException: null
at io.grpc.netty.AbstractHttp2Headers.isEmpty(AbstractHttp2Headers.java:40)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:419)
...
I am not sure whether that is related; it seems to be a separate issue. I have also seen 502 response statuses when connecting from that environment.
And to make it really strange: in that environment the job workers do not activate any jobs, but I can start processes using the autowired zeebeClient, and the topology can also be displayed from the same application.
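For completeness, starting a process from that environment works fine with something like this (a minimal sketch; the process id "sampleProcess" is just an example, not the real model):

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ProcessInstanceEvent;

// Start a process instance via the autowired ZeebeClient; this succeeds
// from the cloud environment even though job activation does not.
public static long startInstance(final ZeebeClient zeebeClient) {
    final ProcessInstanceEvent instance = zeebeClient
        .newCreateInstanceCommand()
        .bpmnProcessId("sampleProcess")
        .latestVersion()
        .send()
        .join();
    return instance.getProcessInstanceKey();
}
```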
Hopefully somebody can help me out here. Something changed in the behaviour, and I cannot get my head around it yet.
Thanks.
Patrick