Zeebe broker doesn't work

Hello! We have an issue: the Zeebe brokers stop working and no longer send any data to Elasticsearch.

I see the following in the Zeebe broker logs:

{"severity":"ERROR","logging.googleapis.com/sourceLocation":{"function":"lambda$writeEvent$7","file":"ProcessingStateMachine.java","line":357},"message":"Expected to write one or more follow up events for event 'LoggedEvent [type=0, version=0, streamId=2, position=886981310, key=4503600069508649, timestamp=1665189248831, sourceEventPosition=-1] RecordMetadata{recordType=COMMAND, intentValue=255, intent=COMPLETE, requestStreamId=2, requestId=298804784, protocolVersion=3, valueType=JOB, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=1.2.9}' without errors, but exception was thrown.","serviceContext":{"service":"zeebe","version":"development"},"context":{"threadId":26,"partitionId":"2","threadPriority":5,"loggerName":"io.camunda.zeebe.processor","threadName":"Broker-0-zb-actors-2","actor-name":"Broker-0-StreamProcessor-2"},"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","exception":"java.lang.IllegalArgumentException: Expected to claim segment of size 789136, but can't claim more than 786432 bytes.\n\tat io.camunda.zeebe.dispatcher.Dispatcher.offer(Dispatcher.java:194) ~[zeebe-dispatcher-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.dispatcher.Dispatcher.claimFragmentBatch(Dispatcher.java:164) ~[zeebe-dispatcher-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.claimBatchForEvents(LogStreamBatchWriterImpl.java:222) ~[zeebe-logstreams-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.tryWrite(LogStreamBatchWriterImpl.java:199) ~[zeebe-logstreams-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.engine.processing.streamprocessor.writers.TypedStreamWriterImpl.flush(TypedStreamWriterImpl.java:106) ~[zeebe-workflow-engine-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.engine.processing.bpmn.behavior.TypedStreamWriterProxy.flush(TypedStreamWriterProxy.java:59) ~[zeebe-workflow-engine-1.2.9.jar:1.2.9]\n\tat 
io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.lambda$writeEvent$6(ProcessingStateMachine.java:342) ~[zeebe-workflow-engine-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.retry.AbortableRetryStrategy.run(AbortableRetryStrategy.java:44) ~[zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:95) [zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-1.2.9.jar:1.2.9]\n\tat io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:192) [zeebe-util-1.2.9.jar:1.2.9]\n","timestampSeconds":1667560469,"timestampNanos":581868000}

Here is the cluster status:

zbctl status
Cluster size: 3
Partitions count: 3
Replication factor: 3
Gateway version: 1.2.9
Brokers:
  Broker 0 - 10.0.230.51:26501
    Version: 1.2.9
    Partition 1 : Leader, Healthy
    Partition 2 : Leader, Healthy
    Partition 3 : Leader, Healthy
  Broker 1 - 10.0.230.52:26501
    Version: 1.2.9
    Partition 1 : Follower, Healthy
    Partition 2 : Follower, Healthy
    Partition 3 : Follower, Healthy
  Broker 2 - 10.0.230.54:26501
    Version: 1.2.9
    Partition 1 : Follower, Healthy
    Partition 2 : Follower, Healthy
    Partition 3 : Follower, Healthy

Hi @elazig ,

the issue you are facing is the error "Expected to claim segment of size 789136, but can't claim more than 786432 bytes." This error occurs during the claim operation of the dispatcher, which is responsible for sending and receiving messages between different threads. The claim can fail if the Zeebe publisher limit or the buffer partition size is reached. Can you please elaborate on what kind of operations trigger this error?
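As a rough illustration (this is a sketch, not the broker's actual implementation), the dispatcher's claim check behaves like the snippet below. The 786432-byte limit (768 KiB) and the 789136-byte batch size are taken from the exception message in your log:

```python
# Illustrative sketch only -- NOT Zeebe's actual dispatcher code.
# The dispatcher rejects a claim whose event batch exceeds the
# maximum fragment size it can allocate in its ring buffer.

MAX_CLAIM_BYTES = 786_432  # 768 KiB, from the exception message


def try_claim(batch_size_bytes: int, max_claim: int = MAX_CLAIM_BYTES) -> bool:
    """Return True if the follow-up event batch fits into one claimable segment."""
    if batch_size_bytes > max_claim:
        raise ValueError(
            f"Expected to claim segment of size {batch_size_bytes}, "
            f"but can't claim more than {max_claim} bytes."
        )
    return True


# The batch from the log (789136 bytes) overflows the limit by:
print(789_136 - MAX_CLAIM_BYTES)  # → 2704
```

In other words, the follow-up records for that single job completion were only about 2.7 KB too large to fit, which is why a slightly smaller variables payload would have slipped through.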

From your log, it appears to happen on job completion. This often occurs when the job is completed with a variables payload that exceeds an internal size limit while still being within the advertised threshold. It is hard for us to determine exactly how large this threshold should be, and sometimes payloads slip through.

Please reduce the size of the variables sent to Zeebe. We recommend storing large variables in a separate data store and passing only references around in your processes.
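A minimal sketch of that claim-check pattern, using a hypothetical in-memory `blob_store` in place of a real external store (S3, a database, etc. -- the store and function names here are illustrative, not part of any Zeebe API). The process variables then carry only a small reference key instead of the payload itself:

```python
import json
import uuid

# Hypothetical external data store. In practice this would be S3,
# a database, or similar -- it is NOT part of Zeebe.
blob_store: dict[str, bytes] = {}


def store_payload(payload: bytes) -> str:
    """Persist a large payload externally and return a small reference key."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    return key


def build_job_variables(payload: bytes) -> str:
    """Build the variables JSON for job completion: a reference, not the blob."""
    return json.dumps({"payloadRef": store_payload(payload)})


large_payload = b"x" * 1_000_000  # ~1 MB: risky to push through the broker
variables = build_job_variables(large_payload)
print(len(variables) < 100)  # → True: only a tiny reference travels through Zeebe
```

A downstream worker would resolve `payloadRef` against the external store when it needs the data, so the broker's log stream never has to hold the full blob.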

If no variables (or only a small payload) were sent along with the job completion, then please share your BPMN process model so we can analyze what may have caused this.

Best, Nico
Zeebe Dev