The Zeebe broker works fine, but at some point it starts filling the logs with warnings:
io.camunda.zeebe.gateway: …Failed to activate jobs…
The client isn't generating much of a workload, though. After that, the warnings in the broker log change to: io.camunda.zeebe.broker.transport: …Failed to write command…
While all this happens, there is a lot of disk pressure on the broker storage mounts and almost no network traffic. CPU usage also climbs to its limits.
Logs
2023-01-25 12:37:34.397 [ActivateJobsHandler] [Broker-0-zb-actors-0] WARN
    io.camunda.zeebe.gateway - Failed to activate jobs for type employee-module-save-interview-results-task-z from partition 1
java.util.concurrent.TimeoutException: Request ProtocolRequest{id=1025445, subject=command-api-1, sender=0.0.0.0:26502, payload=byte[]{length=217, hash=-1017308190}} to 0.0.0.0:26501 timed out in PT15S
    at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:230) ~[zeebe-atomix-cluster-8.1.5.jar:8.1.5]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.82.Final.jar:4.1.82.Final]
    at java.lang.Thread.run(Unknown Source) ~[?:?]
2023-01-25 12:38:23.686 [Broker-0-SnapshotDirector-1] [Broker-0-zb-actors-0] INFO
    io.camunda.zeebe.logstreams.snapshot - Finished taking temporary snapshot, need to wait until last written event position 10727942 is committed, current commit position is 10081276. After that snapshot will be committed.
2023-01-25 12:39:44.173 [Broker-0-InterPartitionCommandReceiverActor-1] [Broker-0-zb-actors-0] WARN
    io.camunda.zeebe.broker.transport - Failed to write command MESSAGE_SUBSCRIPTION DELETE from 0 to logstream
2023-01-25 12:39:44.174 [Broker-0-InterPartitionCommandReceiverActor-1] [Broker-0-zb-actors-0] WARN
    io.camunda.zeebe.broker.transport - Failed to write command MESSAGE_SUBSCRIPTION DELETE from 0 to logstream
2023-01-25 12:39:44.174 [Broker-0-InterPartitionCommandReceiverActor-1] [Broker-0-zb-actors-0] WARN
    io.camunda.zeebe.broker.transport - Failed to write command MESSAGE_SUBSCRIPTION DELETE from 0 to logstream
2023-01-25 12:39:44.174 [Broker-0-InterPartitionCommandReceiverActor-1] [Broker-0-zb-actors-0] WARN
This is the second time in a week (after the first time we dropped all broker data), and we are out of ideas on how to solve this.
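The SnapshotDirector message in the logs above reports two positions, and the difference between them gives a rough idea of how far the commit position trails the last written position. Below is a minimal Python sketch for pulling that gap out of such a line. It assumes the message wording is exactly as in the log above; the regular expression and the `commit_lag` helper are made up for illustration and are not part of Zeebe.

```python
import re

# Hypothetical helper: the pattern assumes the exact wording of the
# SnapshotDirector message shown in the logs above.
PATTERN = re.compile(
    r"last written event position (\d+) is committed, "
    r"current commit position is (\d+)"
)


def commit_lag(log_line: str) -> int:
    """Return last written position minus current commit position."""
    match = PATTERN.search(log_line)
    if match is None:
        raise ValueError("line does not look like a SnapshotDirector message")
    written, committed = (int(group) for group in match.groups())
    return written - committed


line = (
    "Finished taking temporary snapshot, need to wait until last written "
    "event position 10727942 is committed, current commit position is "
    "10081276. After that snapshot will be committed."
)
print(commit_lag(line))  # 646666
```

For the line logged at 12:38:23 the gap is 646666 positions.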
Docker image camunda/zeebe:8.1.5
Zeebe config
zeebe:
  broker:
    stepTimeout: 5m
    gateway:
      enable: true
      network:
        host: 0.0.0.0
        port: 26500
        minKeepAliveInterval: 30s
      cluster:
        requestTimeout: 15s
      threads:
        managementThreads: 1
      monitoring:
        enabled: true
      security:
        enabled: false
    network:
      host: 0.0.0.0
      advertisedHost: 0.0.0.0
      portOffset: 0
      maxMessageSize: 4MB
      commandApi:
        host: 0.0.0.0
        port: 26501
      monitoringApi:
        host: 0.0.0.0
        port: 9600
    data:
      directories: [ data ]
      logSegmentSize: 512MB
      snapshotPeriod: 15m
    backpressure:
      algorithm: "fixed"
      fixed:
        limit: 1000
    exporters:
      elasticsearch:
        className: io.camunda.zeebe.exporter.ElasticsearchExporter
        args:
          url: http://elasticsearch-svc:9200
          bulk:
            delay: 5
            size: 1000
          index:
            prefix: zeebe-record
            createTemplate: true
            command: false
            event: true
            rejection: false
            deployment: true
            incident: true
            job: true
            message: false
            messageSubscription: false
            raft: false
            workflowInstance: true
            workflowInstanceSubscription: false