Hi,
It looks like the broker is facing serious issues exporting towards Hazelcast, but we can't pinpoint the exact issue and root cause.
What we see is that the broker is up:
./zbctl --address 0.0.0.0:26500 --insecure status
Cluster size: 1
Partitions count: 1
Replication factor: 1
Gateway version: 0.25.3
Brokers:
  Broker 0 - 10.0.0.111:26501
    Version: 0.25.3
    Partition 1 : Leader
We also see that we can start a new workflow instance:
./zbctl --address 0.0.0.0:26500 --insecure create instance demoProcess
{
  "workflowKey": 2251799813685877,
  "bpmnProcessId": "demoProcess",
  "version": 1,
  "workflowInstanceKey": 2251799818453600
}
The broker is continuously throwing this error:
2021-02-12 11:24:47.521 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] ERROR io.zeebe.util.retry.EndlessRetryStrategy - Catched exception class java.lang.IllegalArgumentException with message invalid offset: -17568, will retry...
java.lang.IllegalArgumentException: invalid offset: -17568
	at org.agrona.concurrent.UnsafeBuffer.boundsCheckWrap(UnsafeBuffer.java:1702) ~[agrona-1.8.0.jar:1.8.0]
	at org.agrona.concurrent.UnsafeBuffer.wrap(UnsafeBuffer.java:256) ~[agrona-1.8.0.jar:1.8.0]
	at io.zeebe.msgpack.spec.MsgPackReader.wrap(MsgPackReader.java:49) ~[zeebe-msgpack-core-0.25.3.jar:0.25.3]
	at io.zeebe.msgpack.UnpackedObject.wrap(UnpackedObject.java:29) ~[zeebe-msgpack-value-0.25.3.jar:0.25.3]
	at io.zeebe.logstreams.impl.log.LoggedEventImpl.readValue(LoggedEventImpl.java:135) ~[zeebe-logstreams-0.25.3.jar:0.25.3]
	at io.zeebe.engine.processing.streamprocessor.RecordValues.readRecordValue(RecordValues.java:35) ~[zeebe-workflow-engine-0.25.3.jar:0.25.3]
	at io.zeebe.broker.exporter.stream.ExporterDirector$RecordExporter.wrap(ExporterDirector.java:328) ~[zeebe-broker-0.25.3.jar:0.25.3]
	at io.zeebe.broker.exporter.stream.ExporterDirector.lambda$exportEvent$6(ExporterDirector.java:253) ~[zeebe-broker-0.25.3.jar:0.25.3]
	at io.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.retry.EndlessRetryStrategy.run(EndlessRetryStrategy.java:50) ~[zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-0.25.3.jar:0.25.3]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-0.25.3.jar:0.25.3]
The Hazelcast logs are not reporting any issues:
2021-02-12 12:04:45,474 [ INFO] [hz.happy_fermi.HealthMonitor] [c.h.i.d.HealthMonitor]: [hazelcast1]:5701 [dev] [4.1] processors=4, physical.memory.total=15.6G, physical.memory.free=346.5M, swap.space.total=0, swap.space.free=0, heap.memory.used=60.9M, heap.memory.free=242.1M, heap.memory.total=303.0M, heap.memory.max=11.1G, heap.memory.used/total=20.10%, heap.memory.used/max=0.54%, minor.gc.count=5, minor.gc.time=67ms, major.gc.count=2, major.gc.time=110ms, load.process=0.00%, load.system=0.00%, load.systemAverage=4.37, thread.count=52, thread.peakCount=56, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=125, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=1, clientEndpoint.count=2, connection.active.count=2, client.connection.count=0, connection.count=0
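The broker-side error happens while the exporter reads the record from the log (LoggedEventImpl.readValue), i.e. before anything is handed to Hazelcast, so we would also like to confirm from the Hazelcast side whether records are still arriving at all. Below is a minimal sketch with a plain Hazelcast 4.1 client; the member address and the ringbuffer name "zeebe" are assumptions based on our setup and the zeebe-hazelcast-exporter defaults:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.ringbuffer.Ringbuffer;

public class CheckZeebeRingbuffer {
  public static void main(String[] args) {
    // Connect to the Hazelcast member the exporter writes to (placeholder address).
    ClientConfig config = new ClientConfig();
    config.getNetworkConfig().addAddress("hazelcast1:5701");

    HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
    try {
      // "zeebe" is assumed to be the ringbuffer name used by the Hazelcast exporter.
      Ringbuffer<byte[]> ringbuffer = client.getRingbuffer("zeebe");
      // If the exporter is stuck, the tail sequence should stop advancing
      // while new workflow instances are created.
      System.out.println("head=" + ringbuffer.headSequence()
          + " tail=" + ringbuffer.tailSequence());
    } finally {
      client.shutdown();
    }
  }
}

If the tail sequence stops advancing while the broker keeps logging the IllegalArgumentException, that would point to the broker/log side rather than to Hazelcast itself.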
How do we move forward from here?
Deleting the whole raft partition data would probably fix the issue for now, but it would cause problems for the active workflow instances.
Broker version: 0.25.3
Any suggestions on how to troubleshoot this further?