Zeebe Broker fails to activate job

MrBorges · October 29, 2021, 12:01am

Guys,
I have a Zeebe broker with only one node and one partition.
It was working fine, but last week it started returning a lot of errors, disconnecting the workers, and marking the partition as unhealthy.

I think it is overloaded and has a lot of instances queued to be processed. Here is the error:

Oct 28 20:51:51 ip-10-0-3-30 broker[31105]: 2021-10-28 20:51:51.379 [io.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-1] WARN  io.zeebe.gateway - Failed to activate jobs for type setar-email-enviado from partition 1
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]: java.util.concurrent.TimeoutException: Request type command-api-1 timed out in 14999 milliseconds
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at io.atomix.cluster.messaging.impl.AbstractClientConnection$Callback.timeout(AbstractClientConnection.java:163) ~[atomix-cluster-0.25.3.jar:0.25.3]
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
Oct 28 20:51:51 ip-10-0-3-30 broker[31105]:         at java.lang.Thread.run(Thread.java:829) ~[?:?]
Oct 28 20:52:03 ip-10-0-3-30 broker[31105]: 2021-10-28 20:52:03.277 [Broker-0-HealthCheckService] [Broker-0-zb-actors-1] INFO  io.zeebe.broker.system - Partition-1 recovered, marking it as healthy
Oct 28 20:53:04 ip-10-0-3-30 broker[31105]: 2021-10-28 20:53:04.683 [Broker-0-HealthCheckService] [Broker-0-zb-actors-1] ERROR io.zeebe.broker.system - Partition-1 failed, marking it as unhealthy
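
To get a feel for how often the activation timeouts and the healthy/unhealthy flips occur, log lines like the ones above could be tallied with a small script. This is only a rough sketch: the regular expressions are derived from the excerpt in this post (other Zeebe versions may word these messages differently), and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Patterns copied from the wording of the log excerpt above (Zeebe 0.25.3).
# Assumption: other versions may phrase these messages differently.
ACTIVATION_FAILED = re.compile(
    r"Failed to activate jobs for type (\S+) from partition (\d+)"
)
HEALTH_CHANGE = re.compile(
    r"Partition-(\d+) (?:recovered|failed), marking it as (healthy|unhealthy)"
)


def summarize(lines):
    """Count activation failures per (job type, partition) and
    health transitions per (partition, new state)."""
    failures = Counter()
    health = Counter()
    for line in lines:
        match = ACTIVATION_FAILED.search(line)
        if match:
            failures[(match.group(1), int(match.group(2)))] += 1
            continue
        match = HEALTH_CHANGE.search(line)
        if match:
            health[(int(match.group(1)), match.group(2))] += 1
    return failures, health
```

Calling `summarize(open(path))` on a saved copy of the broker log (where `path` is wherever that log was written) returns two counters. A high number of `unhealthy` transitions next to many activation failures would be consistent with the overload theory, although it does not prove it.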

The Zeebe version is 0.25.3.

Thanks

korthout · October 29, 2021, 8:30am

Hi @MrBorges,

Zeebe 0.25.3 is a rather old version. Currently, we’re at 1.2.2. Between these versions, we’ve made many stability improvements (under the hood these versions differ quite a bit), so it’s hard to troubleshoot this now. The logged error also does not provide much detail about what went wrong or why the partition is now unhealthy.

Would it be possible for you to update to the latest release (currently Zeebe 1.2.2, see the release page of camunda-cloud/zeebe on GitHub)? Note that there are some breaking changes between 0.25.3 and 1.0 (most notably, we changed the term workflow to process), and the data of a pre-1.0 Zeebe is incompatible with 1.0+.

I hope this helps.

Best,

Nico

MrBorges · October 29, 2021, 9:19am

Hi @korthout,
thanks for the answer.

I just want to keep this server working while I set up another Zeebe broker on a different server with the new version and adjust the clients that we are running.

I read the "Update 0.26 to 1.0" guide in the Camunda Cloud Docs, and it doesn’t explain the process for upgrading the broker. Do I have to download the new version, stop the old one, and start the new one?

So the processes that are running on the old version will not be migrated to the new one?

Thanks again!

korthout · October 29, 2021, 9:56am

Yes, as I mentioned, the data is not compatible. It is therefore not possible to update your existing cluster. You’ll need to create a new cluster. The guide you referenced mentions this at the top:

Be aware that the major version update from 0.26 to 1.0 is not backwards compatible. Therefore, data cannot be migrated from 0.26 to 1.0 and client applications must be adjusted to the new API versions.