Hi team,
I'm running self-managed Zeebe for testing. After the Zeebe worker and the Zeebe gateway were restarted suddenly, the Zeebe broker can no longer export data to exporters such as OpenSearch and Kafka. Here are the Zeebe gateway log and the broker health:
2024-08-21 07:15:33.688 [Gateway-camunda-8-zeebe-gateway-7846459446-gcdwd] [zb-actors-0] [ClientStreamServiceImpl] WARN
io.camunda.zeebe.transport.stream.impl.ClientStreamRequestManager - Failed to add stream 6d89c569-86d0-4e53-bd04-362af74db767 on 0; will retry in PT1S
java.util.concurrent.CompletionException: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host 10.233.90.48:26502 is not a known cluster member
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$executeOnPooledConnection$25(NettyMessagingService.java:626) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-33.1.0-jre.jar:?]
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$executeOnPooledConnection$26(NettyMessagingService.java:624) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
at io.atomix.cluster.messaging.impl.AbstractClientConnection.dispatch(AbstractClientConnection.java:48) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at io.atomix.cluster.messaging.impl.AbstractClientConnection.dispatch(AbstractClientConnection.java:29) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at io.atomix.cluster.messaging.impl.NettyMessagingService$MessageDispatcher.channelRead0(NettyMessagingService.java:1109) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.110.Final.jar:4.1.110.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) ~[netty-transport-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) ~[netty-transport-classes-epoll-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[netty-transport-classes-epoll-4.1.110.Final.jar:4.1.110.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[netty-transport-classes-epoll-4.1.110.Final.jar:4.1.110.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]
at java.base/java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host 10.233.90.48:26502 is not a known cluster member
... 22 more
2024-08-21 07:15:39.793 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0
2024-08-21 07:15:40.284 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0
2024-08-21 07:15:40.421 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim - camunda-8-zeebe-gateway-7846459446-gcdwd - Member unreachable Member{id=0, address=camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502, properties={brokerInfo=EADJAAAABAAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk8AAAAY2FtdW5kYS04LXplZWJlLTAuY2FtdW5kYS04LXplZWJlLnNiLWJwbS1mcmFtZXdvcmsuc3ZjOjI2NTAxBQADAQAAAAECAAAAAQMAAAABDAAABQAAADguNS41BQADAQAAAAECAAAAAQMAAAAB}}
2024-08-21 07:15:42.611 [] [atomix-cluster-heartbeat-sender] [] WARN
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0
java.util.concurrent.TimeoutException: Request atomix-membership-probe to camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502 timed out in PT0.1S
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-08-21 07:15:42.813 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed all probes of Member{id=0, address=camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502, properties={brokerInfo=EADJAAAABAAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk8AAAAY2FtdW5kYS04LXplZWJlLTAuY2FtdW5kYS04LXplZWJlLnNiLWJwbS1mcmFtZXdvcmsuc3ZjOjI2NTAxBQADAQAAAAECAAAAAQMAAAABDAAABQAAADguNS41BQADAQAAAAECAAAAAQMAAAAB}}. Marking as suspect.
2024-08-21 07:15:45.804 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0
2024-08-21 07:15:46.304 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0
2024-08-21 07:15:46.727 [] [atomix-cluster-heartbeat-sender] [] WARN
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed to probe 0
java.util.concurrent.TimeoutException: Request atomix-membership-probe to camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502 timed out in PT0.1S
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.5.5.jar:8.5.5]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.110.Final.jar:4.1.110.Final]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-08-21 07:15:46.931 [] [atomix-cluster-heartbeat-sender] [] INFO
io.atomix.cluster.protocol.swim.probe - camunda-8-zeebe-gateway-7846459446-gcdwd - Failed all probes of Member{id=0, address=camunda-8-zeebe-0.camunda-8-zeebe.sb-bpm-framework.svc:26502, properties={brokerInfo=EADJAAAABAAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk8AAAAY2FtdW5kYS04LXplZWJlLTAuY2FtdW5kYS04LXplZWJlLnNiLWJwbS1mcmFtZXdvcmsuc3ZjOjI2NTAxBQADAQAAAAECAAAAAQMAAAABDAAABQAAADguNS41BQADAQAAAAECAAAAAQMAAAAB}}. Marking as suspect.
- gRPC Topology API:
{
  "brokers": [
    {
      "partitions": [
        {
          "partitionId": 2,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 1,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 3,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        }
      ],
      "nodeId": 2,
      "host": "camunda-8-zeebe-2.camunda-8-zeebe.svc",
      "port": 26501,
      "version": "8.5.5"
    },
    {
      "partitions": [
        {
          "partitionId": 2,
          "role": "LEADER",
          "health": "UNHEALTHY"
        },
        {
          "partitionId": 1,
          "role": "LEADER",
          "health": "UNHEALTHY"
        },
        {
          "partitionId": 3,
          "role": "LEADER",
          "health": "UNHEALTHY"
        }
      ],
      "nodeId": 0,
      "host": "camunda-8-zeebe-0.camunda-8-zeebe.svc",
      "port": 26501,
      "version": "8.5.5"
    },
    {
      "partitions": [
        {
          "partitionId": 2,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 1,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        },
        {
          "partitionId": 3,
          "role": "FOLLOWER",
          "health": "HEALTHY"
        }
      ],
      "nodeId": 1,
      "host": "camunda-8-zeebe-1.camunda-8-zeebe.svc",
      "port": 26501,
      "version": "8.5.5"
    }
  ],
  "clusterSize": 3,
  "partitionsCount": 3,
  "replicationFactor": 3,
  "gatewayVersion": "8.5.5"
}
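To summarize the topology response: broker 0 is the leader of all three partitions and reports every one of them as UNHEALTHY, and it is the same member (id=0) that the gateway fails to probe in the log. Below is a minimal Python sketch that lists the unhealthy partitions from such a response. It assumes the response is available as standard JSON; the helper name `unhealthy_partitions` is my own and not part of any Zeebe client, and the embedded sample is trimmed to two brokers.

```python
import json


def unhealthy_partitions(topology):
    """Return (nodeId, partitionId, role) for every partition not reported as HEALTHY."""
    return [
        (broker["nodeId"], partition["partitionId"], partition["role"])
        for broker in topology["brokers"]
        for partition in broker["partitions"]
        if partition["health"] != "HEALTHY"
    ]


# Trimmed copy of the topology response (brokers 2 and 0 only, host/port omitted).
topology = json.loads("""
{
  "brokers": [
    {
      "partitions": [
        {"partitionId": 2, "role": "FOLLOWER", "health": "HEALTHY"},
        {"partitionId": 1, "role": "FOLLOWER", "health": "HEALTHY"},
        {"partitionId": 3, "role": "FOLLOWER", "health": "HEALTHY"}
      ],
      "nodeId": 2
    },
    {
      "partitions": [
        {"partitionId": 2, "role": "LEADER", "health": "UNHEALTHY"},
        {"partitionId": 1, "role": "LEADER", "health": "UNHEALTHY"},
        {"partitionId": 3, "role": "LEADER", "health": "UNHEALTHY"}
      ],
      "nodeId": 0
    }
  ],
  "clusterSize": 3,
  "partitionsCount": 3,
  "replicationFactor": 3
}
""")

for node_id, partition_id, role in unhealthy_partitions(topology):
    print(f"broker {node_id}: partition {partition_id} ({role}) is not healthy")
```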
Can anyone help me resolve this issue? Thanks.