Hi,
I am experiencing weird behaviour on my test cluster. It used to work just fine, but now my job workers miss job activations most of the time, and it can take minutes before a job is actually picked up. It also happens that an older process instance is still waiting for its job worker while a newer instance has already completed that same worker. Jobs are sometimes executed immediately, and sometimes it takes minutes or even hours. The TRACE log shows no pending activations, while Operate shows the job is still waiting to be handled.
While jobs are waiting in Operate, this is the logging of the application that should pick them up:
2023-11-20T23:56:25.834+01:00 TRACE 6960 --- [ault-executor-0] io.camunda.zeebe.client.job.poller : No jobs activated for worker sampleWorker#handleRequest and job type mySampleWorker.v1
2023-11-20T23:56:25.936+01:00 TRACE 6960 --- [pool-2-thread-1] io.camunda.zeebe.client.job.poller : Polling at max 32 jobs for worker sampleWorker#handleRequest and job type mySampleWorker.v1
So somehow the engine does not hand them to the worker, although they may still be picked up after a couple of minutes or even hours.
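For context, the worker behind those log lines is a plain Spring Zeebe job worker. A minimal sketch of how it is registered (class and method names are taken from the log; the body and annotation options are simplified and not the real implementation):

```java
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.spring.client.annotation.JobWorker;
import org.springframework.stereotype.Component;

// Simplified sketch of the worker behind the poller log above.
@Component
public class SampleWorker {

    // Job type as seen in the log; maxJobsActive defaults to 32,
    // which matches the "Polling at max 32 jobs" line.
    @JobWorker(type = "mySampleWorker.v1")
    public void handleRequest(final ActivatedJob job) {
        // actual business logic omitted
    }
}
```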
In a similar post I saw the suggestion to display the cluster topology. This is the result of that:
2023-11-20T22:52:17.186+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : Display topology ...
2023-11-20T22:52:17.188+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] OUTBOUND HEADERS: streamId=113 headers=GrpcHttp2OutboundHeaders[:authority: 24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io:443, :path: /gateway_protocol.Gateway/Topology, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: zeebe-client-java/8.3.0 grpc-java-netty/1.58.0, grpc-accept-encoding: gzip, authorization: Bearer ey....., grpc-timeout: 9998615u] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2023-11-20T22:52:17.188+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] OUTBOUND DATA: streamId=113 padding=0 endStream=true length=5 bytes=0000000000
2023-11-20T22:52:17.469+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] INBOUND HEADERS: streamId=113 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Mon, 20 Nov 2023 21:52:17 GMT, content-type: application/grpc, grpc-encoding: identity, grpc-accept-encoding: gzip, strict-transport-security: max-age=63072000; includeSubDomains] padding=0 endStream=false
2023-11-20T22:52:17.469+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] INBOUND DATA: streamId=113 padding=0 endStream=false length=382 bytes=00000001790a78080212597a656562652d322e7a656562652d62726f6b65722d736572766963652e32346535666332312d656435372d343739322d393036332d...
2023-11-20T22:52:17.469+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] OUTBOUND PING: ack=false bytes=1234
2023-11-20T22:52:17.470+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : zeebe-2.zeebe-broker-service.24e5fc21-ed57-4792-9063-028b1c60d1f3-zeebe.svc.cluster.local:26501
2023-11-20T22:52:17.472+01:00 DEBUG 5181 --- [worker-ELG-2-15] io.grpc.netty.NettyClientHandler : [id: 0x83fdf429, L:/192.168.68.107:61539 - R:24e5fc21-ed57-4792-9063-028b1c60d1f3.syd-1.zeebe.camunda.io/34.98.126.217:443] INBOUND HEADERS: streamId=113 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 2 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 3 - LEADER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 1 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : zeebe-1.zeebe-broker-service.24e5fc21-ed57-4792-9063-028b1c60d1f3-zeebe.svc.cluster.local:26501
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 2 - LEADER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 3 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 1 - LEADER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : zeebe-0.zeebe-broker-service.24e5fc21-ed57-4792-9063-028b1c60d1f3-zeebe.svc.cluster.local:26501
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 2 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 3 - FOLLOWER ( HEALTHY)
2023-11-20T22:52:17.472+01:00 INFO 5181 --- [nio-8080-exec-1] n.r.l.r.h.a.c8.TopologyDebugEndpoint : 1 - FOLLOWER ( HEALTHY)
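For reference, the TopologyDebugEndpoint behind that output does nothing special; it is essentially the standard topology request of the Java client. A minimal sketch (the method shape and logging are mine, only the newTopologyRequest() call matters):

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.BrokerInfo;
import io.camunda.zeebe.client.api.response.PartitionInfo;
import io.camunda.zeebe.client.api.response.Topology;

// Request the topology via the autowired ZeebeClient and print
// each broker's address plus its partition roles and health.
public static void logTopology(final ZeebeClient zeebeClient) {
    final Topology topology = zeebeClient.newTopologyRequest().send().join();
    for (final BrokerInfo broker : topology.getBrokers()) {
        System.out.println(broker.getAddress());
        for (final PartitionInfo partition : broker.getPartitions()) {
            System.out.printf("%d - %s ( %s)%n",
                partition.getPartitionId(), partition.getRole(), partition.getHealth());
        }
    }
}
```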
That output looks good to me. Everything worked before my test cluster was deleted and recreated. The cluster is running version 8.3.3 and my Spring Boot Camunda starter is version 8.3.1, so both are the latest stable versions.
To my knowledge nothing else changed.
As a side note: I am also trying to use this setup from a cloud environment. There I am getting messages like this on the same connection:
2023-11-20T22:07:17.028Z WARN 1 --- [ault-executor-5] io.camunda.zeebe.client.job.poller : Failed to activate jobs for worker sampleWorker#handleRequest and job type mySampleWorker.v1
io.grpc.StatusRuntimeException: UNKNOWN
at io.grpc.Status.asRuntimeException(Status.java:537)
.....
Caused by: java.lang.UnsupportedOperationException: null
at io.grpc.netty.AbstractHttp2Headers.isEmpty(AbstractHttp2Headers.java:40)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:419)
...
I am not sure whether that is related; it seems to be a separate issue. I have also seen 502 response statuses when connecting from that environment.
And to make it really strange: in that environment the job workers do not activate any jobs, but I can start processes using the autowired zeebeClient, and the topology can also be displayed from the same application.
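For completeness, starting a process from that environment works fine with something like this (a minimal sketch; the process id "sampleProcess" is just an example, not the real model):

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ProcessInstanceEvent;

// Start a process instance via the autowired ZeebeClient; this succeeds
// from the cloud environment even though job activation does not.
public static long startInstance(final ZeebeClient zeebeClient) {
    final ProcessInstanceEvent instance = zeebeClient
        .newCreateInstanceCommand()
        .bpmnProcessId("sampleProcess")
        .latestVersion()
        .send()
        .join();
    return instance.getProcessInstanceKey();
}
```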
Hopefully somebody can help me out here. Something changed in the behaviour, and I cannot get my head around it yet.
Thanks.
Patrick