Hello, we have deployed the latest Zeebe Docker image to AWS Fargate, and it keeps restarting endlessly because of the following error, which occurs about 2 minutes after the container restarts:
2020-11-04 06:07:48.463 [Broker-0-HealthCheckService] [Broker-0-zb-actors-1] ERROR io.zeebe.broker.system - Partition-1 failed, marking it as unhealthy
After that the broker shuts down. This repeats indefinitely because Fargate restarts it after each failure. So far the setup is just one standalone broker with the gateway enabled and the default configuration.
I haven’t used Fargate, but here are some speculations:
Insufficient resources? Disk? Memory?
The partition log directory not mounted correctly?
It will be some kind of environmental condition in Fargate. You can verify this by deploying the exact same config locally to see if the problem presents outside the Fargate env.
Thanks @jwulf. I have deployed locally and everything works, so it is definitely a problem with Fargate. I can't work out what exactly is causing it. How can I check whether the partition log directory is mounted correctly?
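One way to check the directory from inside the task is a small probe that reports whether the path exists, whether it is a mount point, how much free space it has, and whether a file can actually be written and synced there. This is only a sketch: it assumes Python is available in some container that mounts the same volume, and the path `/usr/local/zeebe/data` is an assumption about where the broker keeps its data, so adjust it to your setup.

```python
import os
import shutil
import tempfile


def check_data_dir(path):
    """Return a dict describing whether `path` is usable as a data directory."""
    report = {"path": path, "exists": os.path.isdir(path)}
    if not report["exists"]:
        return report
    # Free space on the filesystem that holds the directory, in MiB.
    report["free_mib"] = shutil.disk_usage(path).free // (1024 * 1024)
    # True when the path is the root of a mounted filesystem.
    report["is_mount_point"] = os.path.ismount(path)
    # Try a real write and fsync, since permission bits alone can mislead.
    try:
        with tempfile.NamedTemporaryFile(dir=path) as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())
        report["writable"] = True
    except OSError as err:
        report["writable"] = False
        report["error"] = str(err)
    return report


if __name__ == "__main__":
    # Assumed path; change it to the broker's actual data directory.
    print(check_data_dir("/usr/local/zeebe/data"))
```

If `exists` is false or `writable` is false, the volume is probably not attached where the broker expects it. A very small `free_mib` would point at the disk-space theory instead.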
I have made some “progress”: I attached EFS storage and mounted the data dir on it. Now the error has changed:
2020-11-11T04:27:53.753+01:00 2020-11-11 03:27:53.740 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] ERROR io.zeebe.logstreams - Actor Broker-0-StreamProcessor-1 failed in phase STARTED.
2020-11-11T04:27:53.753+01:00 io.zeebe.db.ZeebeDbException: Unexpected error occurred during RocksDB transaction.
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.DefaultDbContext.runInTransaction(DefaultDbContext.java:142) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.ensureInOpenTransaction(ZeebeTransactionDb.java:173) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.whileTrue(ZeebeTransactionDb.java:304) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.TransactionalColumnFamily.whileTrue(TransactionalColumnFamily.java:93) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.TransactionalColumnFamily.whileTrue(TransactionalColumnFamily.java:147) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.TransactionalColumnFamily.whileTrue(TransactionalColumnFamily.java:84) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.engine.state.message.MessageState.visitMessagesWithDeadlineBefore(MessageState.java:271) ~[zeebe-workflow-engine-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.engine.processing.message.MessageTimeToLiveChecker.run(MessageTimeToLiveChecker.java:32) ~[zeebe-workflow-engine-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) ~[zeebe-util-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:107) [zeebe-util-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:91) [zeebe-util-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.util.sched.ActorThread.run(ActorThread.java:204) [zeebe-util-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 Caused by: org.rocksdb.RocksDBException: IOError(StaleFile)
2020-11-11T04:27:53.753+01:00 at org.rocksdb.Transaction.commit(Native Method) ~[rocksdbjni-6.13.3.jar:?]
2020-11-11T04:27:53.753+01:00 at org.rocksdb.Transaction.commit(Transaction.java:206) ~[rocksdbjni-6.13.3.jar:?]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.commitInternal(ZeebeTransaction.java:117) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.DefaultDbContext.runInNewTransaction(DefaultDbContext.java:164) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 at io.zeebe.db.impl.rocksdb.transaction.DefaultDbContext.runInTransaction(DefaultDbContext.java:135) ~[zeebe-db-0.25.1.jar:0.25.1]
2020-11-11T04:27:53.753+01:00 ... 13 more
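A note on the `IOError(StaleFile)` cause above: this looks like a stale file handle reported by the underlying filesystem, which is a typical symptom of a network filesystem, and EFS is mounted over NFS. To confirm which filesystem type actually backs the data directory, something like the following can help. It is a sketch that assumes a Linux container where `/proc/mounts` is readable. The helper name `filesystem_type` and the data path are made up for illustration.

```python
import os


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts. The longest matching
    mount point wins, so nested mounts resolve to the innermost one.
    """
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if wanted.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        # Assumed path; change it to the broker's actual data directory.
        print(filesystem_type("/usr/local/zeebe/data", handle.read()))
```

If the reported type is `nfs` or `nfs4`, the RocksDB state is living on a network filesystem, which would fit the stale-handle error better than a disk-space or memory problem.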
Could you please share the steps you followed to fix this issue? I am stuck on the same problem and have no clue what to do, so the steps you took would be a great help.
Hi, I could not solve the problem and gave up trying to run a cluster on Fargate. Instead, I set up a Kubernetes cluster in EKS and used the available Helm charts as a starting point to further customize my setup. This has been working perfectly so far.