Zeebe 0.23.4 in cluster: Long delay for starting jobs with zeebe-node clients


Recently, we have moved our deployment from docker-compose to k8s cluster.
Since then, we observe a long delay (could be up to 30 seconds) until a worker starts to process a job.

Cluster configuration is:
$ zbctl --insecure status
Cluster size: 3
Partitions count: 3
Replication factor: 3
Gateway version: 0.23.4
Broker 0 - zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26501
Version: 0.23.4
Partition 1 : Follower
Partition 2 : Follower
Partition 3 : Follower
Broker 1 - zeebe-zeebe-1.zeebe-zeebe.default.svc.cluster.local:26501
Version: 0.23.4
Partition 1 : Follower
Partition 2 : Leader
Partition 3 : Leader
Broker 2 - zeebe-zeebe-2.zeebe-zeebe.default.svc.cluster.local:26501
Version: 0.23.4
Partition 1 : Leader
Partition 2 : Follower
Partition 3 : Follower

We were using Zeebe 0.23.2 and just moved to 0.23.4, hoping this issue was the cause: Long Polling is blocked even though jobs are available on the broker · Issue #4396 · camunda-cloud/zeebe · GitHub
Still, the problem persists.

Our clients are based on zeebe-node (same version)

This happens for the first job of the workflow instance and also before each subsequent job starts.

@yoavsal are you using the Helm charts from helm.zeebe.io? Can you share which version?


Hi @salaboy,

We started off with Zeebe 0.23.1 and used the charts available at the time.
Since then we may have edited the deployment files both manually and through Helm (we should align these).
Our current configuration is:
$ kubectl describe deploy zeebe-zeebe-gateway

$ kubectl describe statefulset zeebe-zeebe

Hi @yoavsal, what happens when you run the same code against a local broker?

This will help isolate if this is a K8s issue, a Node client issue, or a combination of the two.

Also, can you run against K8s with the env var ZEEBE_NODE_LOGLEVEL=DEBUG? This will give you more insight into what is happening.

I am interested in your zeebe-zeebe configmap; would you mind sharing it? I cannot get my zeebe statefulset working for 0.23.4.

Here is the configmap:

$ kubectl describe cm zeebe-zeebe

On docker-compose setup the jobs are starting instantly.

Will try to get the DEBUG logs from the client as well.

Hi @jwulf ,

The debug log outputs tons of messages.
I captured the logs from the broker and worker here:

The first log line comes from the broker (from our custom exporter, actually).
Then, after ~8 seconds, the first debug log from zeebe-client appears.

For our worker's first log line, search for “REST2”.

The last line of the log is again from our exporter, when the workflow is completed.

BTW, the environment variable is ZEEBE_NODE_LOG_LEVEL (not ZEEBE_NODE_LOGLEVEL)


Hmmm… ok, so an 8-second delay when using K8s, and no delay on a local broker.

Can you paste the worker logs from before your exporter sees the job?

The other thing to do is to use a zbctl worker to isolate whether it is a broker issue, or an interaction with the client. See the Register a Worker section here: https://docs.cloud.camunda.io/docs/cli-zbctl
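A minimal zbctl worker for this kind of test can be sketched as follows (the job type `my-task` is a placeholder; substitute the actual job type your worker subscribes to):

```shell
# Register a worker for jobs of type "my-task" (hypothetical; substitute your
# actual job type). The handler receives the job variables on stdin and must
# write the variables to complete the job with on stdout, so `cat` simply
# completes each job with its variables unchanged.
zbctl --insecure create worker my-task --handler cat
```

If this worker also sees the delay, the client library is out of the picture and the problem is on the broker/gateway side.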

With a zbctl worker I also observed a delay. In some cases it was only 4 seconds, but it could go up to 10 or 20 seconds as well.

Dump of worker logs before job starts can be found here:

It starts when the worker service is started and then follows the execution of 2 workflow instances.
The instance IDs are: 2251799813723035 and 4503599627408321

The first one shows a delay of ~4 seconds, but the second shows 14 seconds.


OK, great. It’s a broker-on-K8s issue, then. Could it be CPU/memory starvation?

I don’t see any indication of starvation.
We actually increased the requested CPU and memory and have enough resources on the node.

Notice that the job processing seems to happen on some time cycle.
E.g. if I create several instances, it takes the full delay until the first job is processed, but once processing starts, all the instances are handled immediately.
Could it be that long polling is not working as expected in gateway/cluster mode?

It could be. In the broker/gateway logs, if you set it to DEBUG level, you should see the job activation requests.
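As a sketch, assuming the resource names shown earlier in this thread and the standard Zeebe environment variables (`ZEEBE_LOG_LEVEL`, and `ZEEBE_GATEWAY_LONGPOLLING_ENABLED` as the Spring-style mapping of `zeebe.gateway.longPolling.enabled` in 0.23):

```shell
# Raise the log level on the gateway and brokers so job activation
# (ActivateJobs) requests become visible in the logs.
kubectl set env deployment/zeebe-zeebe-gateway ZEEBE_LOG_LEVEL=debug
kubectl set env statefulset/zeebe-zeebe ZEEBE_LOG_LEVEL=debug

# Follow the gateway logs and look for job activation traffic.
kubectl logs -f deployment/zeebe-zeebe-gateway | grep -i activate

# While you are at it, check whether long polling has been disabled on the
# gateway (it is enabled by default in 0.23).
kubectl describe deploy zeebe-zeebe-gateway | grep -i longpolling
```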

@jwulf, @salaboy
I have upgraded my Helm charts from v0.0.110 to v0.0.128, and the delay seems to be gone.
I’m not sure what exactly fixed it, but at least now it is gone.
Thanks for the support!
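For anyone hitting the same thing, an upgrade like that can be sketched as below (the release name `zeebe` and chart name `zeebe/zeebe-full` are assumptions; run `helm ls` to find your actual release and chart):

```shell
# Refresh the chart index from helm.zeebe.io and upgrade the release to the
# chart version mentioned above.
helm repo add zeebe https://helm.zeebe.io
helm repo update
helm upgrade zeebe zeebe/zeebe-full --version 0.0.128
```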

Quick update:
The delays with 0.23.4 were better than before, but after a day of use it is getting worse again.
At first, I suspected the JavaOpts that had the below list of options removed in v128 compared to v100, but I’m not sure…


The reason we removed those lines is based on findings described here: https://github.com/zeebe-io/zeebe/issues/4664
