Zeebe broker cannot export data after restart

Hi team,
I'm running Zeebe Self-Managed for testing. When the Zeebe worker and the Zeebe gateway are restarted unexpectedly, the Zeebe broker can no longer export data to its exporters (OpenSearch, Kafka).
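For context, the worker is an ordinary Zeebe Java client job worker, roughly as in the sketch below; the gateway address and job type are placeholders, not my real values.

import io.camunda.zeebe.client.ZeebeClient;

public final class Worker {
  public static void main(String[] args) throws Exception {
    // Placeholder gateway address and job type; the real values differ.
    try (final ZeebeClient client =
        ZeebeClient.newClientBuilder()
            .gatewayAddress("camunda-8-zeebe-gateway:26500")
            .usePlaintext()
            .build()) {
      client
          .newWorker()
          .jobType("example-task")
          .handler((jobClient, job) ->
              jobClient.newCompleteCommand(job.getKey()).send().join())
          .open();
      // Keep the worker running until the pod is stopped.
      Thread.currentThread().join();
    }
  }
}

Here are the gateway logs and the broker health after the restart: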

  2024-08-21 07:15:33.688 [Gateway-camunda-8-zeebe-gateway-7846459446-gcdwd] [zb-actors-0] [ClientStreamServiceImpl] WARN 

       io.camunda.zeebe.transport.stream.impl.ClientStreamRequestManager - Failed to add stream 6d89c569-86d0-4e53-bd04-362af74db767 on 0; will retry in PT1S

 java.util.concurrent.CompletionException: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host 10.233.90.48:26502 is not a known cluster member

 	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]

 	at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$executeOnPooledConnection$25(NettyMessagingService.java:626) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-33.1.0-jre.jar:?]

 	at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$executeOnPooledConnection$26(NettyMessagingService.java:624) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]

 	at io.atomix.cluster.messaging.impl.AbstractClientConnection.dispatch(AbstractClientConnection.java:48) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at io.atomix.cluster.messaging.impl.AbstractClientConnection.dispatch(AbstractClientConnection.java:29) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at io.atomix.cluster.messaging.impl.NettyMessagingService$MessageDispatcher.channelRead0(NettyMessagingService.java:1109) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) ~[netty-transport-classes-epoll-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[netty-transport-classes-epoll-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[netty-transport-classes-epoll-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]

 	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]

 	at java.base/java.lang.Thread.run(Unknown Source) ~[?:?]

 Caused by: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host 10.233.90.48:26502 is not a known cluster member

 	... 22 more

 2024-08-21 07:15:39.793 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0

 2024-08-21 07:15:40.284 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0

 2024-08-21 07:15:40.421 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim - camunda-8-zeebe-gateway-7846459446-gcdwd - Member unreachable Member{id=0, address=camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502, properties={brokerInfo=EADJAAAABAAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk8AAAAY2FtdW5kYS04LXplZWJlLTAuY2FtdW5kYS04LXplZWJlLnNiLWJwbS1mcmFtZXdvcmsuc3ZjOjI2NTAxBQADAQAAAAECAAAAAQMAAAABDAAABQAAADguNS41BQADAQAAAAECAAAAAQMAAAAB}}

 2024-08-21 07:15:42.611 [] [atomix-cluster-heartbeat-sender] [] WARN 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0

 java.util.concurrent.TimeoutException: Request atomix-membership-probe to camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502 timed out in PT0.1S

 	at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]

 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]

 	at java.base/java.lang.Thread.run(Unknown Source) [?:?]

 2024-08-21 07:15:42.813 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed all probes of Member{id=0, address=camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502, properties={brokerInfo=EADJAAAABAAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk8AAAAY2FtdW5kYS04LXplZWJlLTAuY2FtdW5kYS04LXplZWJlLnNiLWJwbS1mcmFtZXdvcmsuc3ZjOjI2NTAxBQADAQAAAAECAAAAAQMAAAABDAAABQAAADguNS41BQADAQAAAAECAAAAAQMAAAAB}}. Marking as suspect.

 2024-08-21 07:15:45.804 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0

 2024-08-21 07:15:46.304 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0

 2024-08-21 07:15:46.727 [] [atomix-cluster-heartbeat-sender] [] WARN 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0

 java.util.concurrent.TimeoutException: Request atomix-membership-probe to camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502 timed out in PT0.1S

 	at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]

 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]

 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]

 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]

 	at java.base/java.lang.Thread.run(Unknown Source) [?:?]

 2024-08-21 07:15:46.931 [] [atomix-cluster-heartbeat-sender] [] INFO 

       io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed all probes of Member{id=0, address=camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502, properties={brokerInfo=EADJAAAABAAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk8AAAAY2FtdW5kYS04LXplZWJlLTAuY2FtdW5kYS04LXplZWJlLnNiLWJwbS1mcmFtZXdvcmsuc3ZjOjI2NTAxBQADAQAAAAECAAAAAQMAAAABDAAABQAAADguNS41BQADAQAAAAECAAAAAQMAAAAB}}. Marking as suspect.
 

  • gRPC Topology API:

{
  "brokers": [
    {
      "partitions": [
        {
          "partitionId": 2,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 1,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 3,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        }
      ],
      "nodeId": 2,
      "host": "camunda-8-zeebe-2.camunda-8-zeebe.svc",
      "port": 26501,
      "version": "8.5.5"
    },
    {
      "partitions": [
        {
          "partitionId": 2,
          "role": "LEADER",
          "health": "UNHEALTHY"
        },
        {
          "partitionId": 1,
          "role": "LEADER",
          "health": "UNHEALTHY"
        },
        {
          "partitionId": 3,
          "role": "LEADER",
          "health": "UNHEALTHY"
        }
      ],
      "nodeId": 0,
      "host": "camunda-8-zeebe-0.camunda-8-zeebe.svc",
      "port": 26501,
      "version": "8.5.5"
    },
    {
      "partitions": [
        {
          "partitionId": 2,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 1,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 3,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        }
      ],
      "nodeId": 1,
      "host": "camunda-8-zeebe-1.camunda-8-zeebe.svc",
      "port": 26501,
      "version": "8.5.5"
    }
  ],
  "clusterSize": 3,
  "partitionsCount": 3,
  "replicationFactor": 3,
  "gatewayVersion": "8.5.5"
}
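For reference, this is roughly how I pull the topology view above from the Java client (the gateway address is a placeholder; `zbctl status` should print the equivalent):

import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.Topology;

public final class TopologyCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder gateway address; the real service name differs.
    try (final ZeebeClient client =
        ZeebeClient.newClientBuilder()
            .gatewayAddress("camunda-8-zeebe-gateway:26500")
            .usePlaintext()
            .build()) {
      final Topology topology = client.newTopologyRequest().send().join();
      topology.getBrokers().forEach(broker -> {
        System.out.printf("node %d (%s:%d)%n",
            broker.getNodeId(), broker.getHost(), broker.getPort());
        broker.getPartitions().forEach(partition ->
            System.out.printf("  partition %d: role=%s health=%s%n",
                partition.getPartitionId(), partition.getRole(), partition.getHealth()));
      });
    }
  }
}

In the output above, node 0 is the leader for all three partitions and reports UNHEALTHY, while the followers on nodes 1 and 2 are HEALTHY.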

Can anyone help resolve this issue? Thanks.
