How do I fix these cluster errors? My cluster is not working correctly

I am seeing failed probes in my zeebe:8.4.0 cluster. What does this mean?
2024-02-06 08:11:56.682 [Broker-0] [zb-actors-0] [SnapshotStore-2] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 394286-23-412472-412465
2024-02-06 08:18:00.686 [atomix-cluster-heartbeat-sender] WARN
io.atomix.cluster.protocol.swim.probe - 0 - Failed to probe 2
java.util.concurrent.TimeoutException: Request atomix-membership-probe to camunda-zeebe-2.camunda-zeebe.baur-zeebe.svc:26502 timed out in PT0.1S
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.104.Final.jar:4.1.104.Final]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-02-06 08:21:49.220 [Broker-0] [zb-actors-0] [SnapshotStore-3] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 395725-196-414552-414506
2024-02-06 08:23:48.884 [Broker-0] [zb-actors-1] [SnapshotStore-1] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 395853-168-414148-414110
2024-02-06 08:24:20.914 [atomix-cluster-heartbeat-sender] WARN
io.atomix.cluster.protocol.swim.probe - 0 - Failed to probe camunda-zeebe-gateway-58b8dcfc79-9m47h
java.util.concurrent.TimeoutException: Request atomix-membership-probe to 10.233.105.149:26502 timed out in PT0.1S
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.104.Final.jar:4.1.104.Final]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-02-06 08:24:21.033 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim.probe - 0 - Failed all probes of Member{id=camunda-zeebe-gateway-58b8dcfc79-9m47h, address=10.233.105.149:26502, properties={event-service-topics-subscribed=KIIDAGpvYnNBdmFpbGFibOU=}}. Marking as suspect.
2024-02-06 08:24:21.034 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim - 0 - Member unreachable Member{id=camunda-zeebe-gateway-58b8dcfc79-9m47h, address=10.233.105.149:26502, properties={event-service-topics-subscribed=KIIDAGpvYnNBdmFpbGFibOU=}}
2024-02-06 08:24:21.461 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim.probe - 0 - Failed to probe camunda-zeebe-gateway-58b8dcfc79-9m47h
2024-02-06 08:24:25.120 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim - 0 - Member reachable Member{id=camunda-zeebe-gateway-58b8dcfc79-9m47h, address=10.233.105.149:26502, properties={event-service-topics-subscribed=KIIDAGpvYnNBdmFpbGFibOU=}}
2024-02-06 08:26:57.307 [Broker-0] [zb-actors-0] [SnapshotStore-2] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 397374-23-415592-415553
2024-02-06 08:30:34.964 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim.probe - 0 - Failed to probe camunda-zeebe-gateway-58b8dcfc79-9m47h
2024-02-06 08:30:35.298 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim - 0 - Member unreachable Member{id=camunda-zeebe-gateway-58b8dcfc79-9m47h, address=10.233.105.149:26502, properties={event-service-topics-subscribed=KIIDAGpvYnNBdmFpbGFibOU=}}
2024-02-06 08:30:36.991 [atomix-cluster-heartbeat-sender] INFO
io.atomix.cluster.protocol.swim - 0 - Member reachable Member{id=camunda-zeebe-gateway-58b8dcfc79-9m47h, address=10.233.105.149:26502, properties={event-service-topics-subscribed=KIIDAGpvYnNBdmFpbGFibOU=}}

Hi @baurzhan
These are warning and info messages.
It looks very similar to this issue: Zeebe gateway and brokers spamming atomix-cluster-heartbeat-sender logs, failed to probe · Issue #14845 · camunda/zeebe · GitHub
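
That issue is mainly about noisy logs rather than broken functionality. If the probes are genuinely timing out because the network or the nodes are under load, one mitigation is to relax the SWIM membership probe timeout, which is the PT0.1S (100 ms) you see in the stack traces. A minimal sketch, assuming the standard mapping of the zeebe.broker.cluster.membership.* and zeebe.gateway.cluster.membership.* settings to environment variables (please verify the exact names against the 8.4 configuration reference before applying):

# Illustrative values only - set on the broker StatefulSet and the standalone gateway Deployment
ZEEBE_BROKER_CLUSTER_MEMBERSHIP_PROBETIMEOUT=1s    # the 100ms default is the PT0.1S in the logs
ZEEBE_GATEWAY_CLUSTER_MEMBERSHIP_PROBETIMEOUT=1s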

Could you describe what works incorrectly in the cluster?

Regards,
Alex

Thanks for the reply.
About 50% of business processes disappear.
When I call GET https://state-machine-service.kz/api/Tasks/byProcessId/2251799814012644, this error appears:
process instance is null

Hi @baurzhan

When I call GET https://state-machine-service.kz/api/Tasks/byProcessId/2251799814012644

Could you please share a link to the documentation describing this call?

You can get a list of tasks related to a process with the Task API (Task API | Camunda 8 Docs), using a TaskSearchRequest for the search.

For example, a POST request to the task search endpoint:

curl --location 'https://<hostname>/v1/tasks/search' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
    "processInstanceKey": "4503599654639256"
}'

Replace <hostname>, <token>, and the processInstanceKey value with your actual values.

About 50% of business processes disappear

Could you please elaborate on what you mean by this?

Regards,
Alex

Thanks.

Here are logs from the Zeebe pods:
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-02-07 09:53:37.542 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=721146, metadata=}}
2024-02-07 09:54:38.872 [Broker-0] [zb-fs-workers-0] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=721354, metadata=}}
2024-02-07 09:55:40.327 [Broker-0] [zb-fs-workers-0] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=721566, metadata=}}
2024-02-07 09:56:41.520 [Broker-0] [zb-fs-workers-1] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=722040, metadata=}}
2024-02-07 09:57:42.342 [Broker-0] [zb-fs-workers-1] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=723404, metadata=}}
2024-02-07 09:58:43.692 [Broker-0] [zb-fs-workers-0] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=724364, metadata=}}
2024-02-07 09:59:45.619 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=724580, metadata=}}
2024-02-07 10:00:47.463 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=724796, metadata=}}
2024-02-07 10:01:48.183 [Broker-0] [zb-fs-workers-1] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=724997, metadata=}}
2024-02-07 10:02:49.162 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=725204, metadata=}}
2024-02-07 10:03:50.675 [Broker-0] [zb-fs-workers-1] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=725412, metadata=}}
2024-02-07 10:04:08.121 [Broker-0] [zb-actors-2] [SnapshotStore-2] DEBUG
io.camunda.zeebe.logstreams.snapshot - Taking temporary snapshot into /usr/local/zeebe/data/raft-partition/partitions/2/snapshots/699948-55-726436-726438.
2024-02-07 10:04:08.860 [Broker-0] [zb-actors-2] [SnapshotStore-2] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 699948-55-726436-726438
2024-02-07 10:04:08.864 [Broker-0] [zb-actors-2] [SnapshotStore-2] DEBUG
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Deleting previous snapshot 696296-55-720357-720348
2024-02-07 10:04:08.889 [Broker-0] [raft-server-0-2] [raft-server-2] DEBUG
io.camunda.zeebe.journal.file.SegmentsManager - No segments can be deleted with index < 699848 (first log index: 657524)
2024-02-07 10:04:52.346 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=725628, metadata=}}
2024-02-07 10:05:53.626 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=725836, metadata=}}
2024-02-07 10:06:07.835 [Broker-0] [zb-actors-0] [SnapshotStore-1] DEBUG
io.camunda.zeebe.logstreams.snapshot - Taking temporary snapshot into /usr/local/zeebe/data/raft-partition/partitions/1/snapshots/699764-201-726807-726773.
2024-02-07 10:06:08.092 [Broker-0] [zb-actors-0] [SnapshotStore-1] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 699764-201-726807-726773
2024-02-07 10:06:08.096 [Broker-0] [zb-actors-0] [SnapshotStore-1] DEBUG
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Deleting previous snapshot 696132-201-720806-720785
2024-02-07 10:06:08.124 [Broker-0] [raft-server-0-1] [raft-server-1] DEBUG
io.camunda.zeebe.journal.file.SegmentsManager - No segments can be deleted with index < 699664 (first log index: 635086)
2024-02-07 10:06:54.444 [Broker-0] [zb-fs-workers-2] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=726052, metadata=}}
2024-02-07 10:07:16.369 [atomix-cluster-heartbeat-sender] WARN
io.atomix.cluster.protocol.swim.probe - 0 - Failed to probe 2
java.util.concurrent.TimeoutException: Request atomix-membership-probe to camunda-zeebe-2.camunda-zeebe.baur-zeebe.svc:26502 timed out in PT0.1S
at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:261) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.104.Final.jar:4.1.104.Final]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2024-02-07 10:07:55.790 [Broker-0] [zb-fs-workers-1] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=726270, metadata=}}
2024-02-07 10:08:07.829 [Broker-0] [zb-actors-1] [SnapshotStore-3] DEBUG
io.camunda.zeebe.logstreams.snapshot - Taking temporary snapshot into /usr/local/zeebe/data/raft-partition/partitions/3/snapshots/700157-228-726308-726270.
2024-02-07 10:08:08.105 [Broker-0] [zb-actors-1] [SnapshotStore-3] INFO
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Committed new snapshot 700157-228-726308-726270
2024-02-07 10:08:08.109 [Broker-0] [zb-actors-1] [SnapshotStore-3] DEBUG
io.camunda.zeebe.snapshots.impl.FileBasedSnapshotStore - Deleting previous snapshot 696670-228-721039-721042
2024-02-07 10:08:08.119 [Broker-0] [raft-server-0-3] [raft-server-3] DEBUG
io.camunda.zeebe.journal.file.SegmentsManager - No segments can be deleted with index < 700057 (first log index: 663809)
2024-02-07 10:08:56.701 [Broker-0] [zb-fs-workers-1] [Exporter-3] DEBUG
io.camunda.zeebe.broker.exporter - Current exporter state {hazelcast={position=726478, metadata=}}

  1. I have a Zeebe cluster:
    clusterSize: 3
    partitionsCount: 3
    replicationFactor: 3
    gatewayVersion: "8.4.1"
  2. and a Hazelcast cluster:
    Members {size:3, ver:49} [
    Member [10.233.105.136]:5701 - ab99e13e-320d-4510-948a-60a31c29d804
    Member [10.233.83.14]:5701 - 742648fd-759f-4896-aeb2-1c9107fb16ad
    Member [10.233.125.12]:5701 - fad2ec9b-6953-4ceb-9627-723f04f449ea this
    ]
  3. ZeeQS with an external PostgreSQL database
  4. All pods:
    camunda-zeebe-0 1/1 Running 0 14h
    camunda-zeebe-1 1/1 Running 0 14h
    camunda-zeebe-2 1/1 Running 0 14h
    camunda-zeebe-gateway-f85c98b4f-wpdq8 1/1 Running 0 16h
    hz-hazelcast-0 1/1 Running 0 16h
    hz-hazelcast-1 1/1 Running 0 16h
    hz-hazelcast-2 1/1 Running 0 16h
    zeeqs-postgres-6744ff467c-gtjqq 1/1 Running 0 14h

The business process (BP) is created correctly, but when I try to fetch it with ZeeQS, I sometimes can't get it on the first try.
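For illustration, this is the kind of ZeeQS call that sometimes comes back empty on the first attempt and succeeds on a retry (a sketch only; the endpoint path and GraphQL field names are assumptions about the ZeeQS GraphQL API and may differ in your version):

curl --location 'http://<zeeqs-host>:<port>/graphql' \
--header 'Content-Type: application/json' \
--data '{"query": "{ processInstance(key: \"2251799814012644\") { key state } }"}'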

Hi @baurzhan - the call to https://state-machine-service.kz/api/Tasks/byProcessId/2251799814012644 is not a Camunda API, and the screenshot is not of a Camunda application. Please share more details about what software you are working with: what is it called, who provides it, is there a GitHub repository, etc.

Your other thread appears to be a duplicate of this one, so I am going to close the other thread and we can continue the conversation here. Thanks!