Expected to handle gRPC request, but request timed out between gateway and broker

We sometimes get a timeout between the Zeebe gateway and the broker for no obvious reason.
Zeebe runs in an all-in-one configuration, so the gateway and the other Zeebe components run in the same container in a k8s cluster. The lion's share of requests work fine, but occasionally (about once a day) we get this error:

[io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-1] DEBUG io.camunda.zeebe.gateway - Expected to handle gRPC request, but request timed out between gateway and broker
java.util.concurrent.TimeoutException: Request ProtocolRequest{id=1896336, subject=command-api-1, sender=0.0.0.0:26502, payload=byte[]{length=673, hash=-148275053}} to 0.0.0.0:26501 timed out in PT1M
    at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:218) ~[zeebe-atomix-cluster-1.2.4.jar:1.2.4]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.68.Final.jar:4.1.68.Final]
    at java.lang.Thread.run(Unknown Source) ~[?:?]

There is no pressure on the CPU or other hardware resources.
The Zeebe configuration:
zeebe:
  broker:
    gateway:
      enable: true

      network:
        port: 26500

      security:
        enabled: false

    network:
      host: 0.0.0.0

    data:
      directories: [ data ]
      logSegmentSize: 512MB
      snapshotPeriod: 15m

    cluster:
      clusterSize: 1
      replicationFactor: 1
      partitionsCount: 1

    threads:
      cpuThreadCount: 2
      ioThreadCount: 2

    exporters:
      hazelcast:
        className: io.zeebe.hazelcast.exporter.HazelcastExporter
        jarPath: /tmp/zeebe-hazelcast-exporter-1.1.0-jar-with-dependencies.jar
      elasticsearch:
        className: io.camunda.zeebe.exporter.ElasticsearchExporter
        args:
          url: http://es-zeebe-dev.somedomain.local:9200
          bulk:
            delay: 5
            size: 1000
          index:
            prefix: qa-zeebe-record
            createTemplate: true
            command: true
            event: true
            rejection: true
            deployment: true
            error: true
            incident: true
            job: true
            jobBatch: true
            message: true
            messageSubscription: true
            variable: true
            variableDocument: true
            workflowInstance: true
            workflowInstanceCreation: true
            workflowInstanceSubscription: true

Hi @cyberpank

What version of Zeebe are you running?

Josh

Hi @jwulf, the Zeebe version is 1.2.4

This is quite an old version. Can you try a more recent one?

Actually, we’ve faced this issue on versions 0.24 and 0.26, and then on 1.2.4 (which was the latest version last year). OK, I’ll try the latest one.

Hi @cyberpank

Any luck with this?

Josh

Hi @jwulf
Version 8.0.5 has been installed for a while now, and no timeouts have appeared so far. Let’s watch it for another 2-3 days.

I got the same error message with v8.0.5, and I can’t increase the timeout with the ‘ZEEBE_GATEWAY_CLUSTER_REQUESTTIMEOUT’ environment variable. Is there a solution? I have two thousand instances running on 8 CPU cores and 32 GB of memory.
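
A possible lead, offered as an assumption rather than a confirmed fix: in an all-in-one setup like the one in the first post, the gateway is embedded in the broker, and its settings live under zeebe.broker.gateway rather than zeebe.gateway. If this deployment is embedded as well, the request timeout would presumably belong in the same place, and the matching environment variable would be ZEEBE_BROKER_GATEWAY_CLUSTER_REQUESTTIMEOUT rather than ZEEBE_GATEWAY_CLUSTER_REQUESTTIMEOUT. The key placement and the 30s value below are illustrative only and should be checked against the broker configuration template for the version in use:

```yaml
# Sketch only: assumes an embedded gateway (gateway.enable: true), as in the
# configuration from the first post. Key placement and value are assumptions.
zeebe:
  broker:
    gateway:
      enable: true
      cluster:
        # Assumed to control how long the gateway waits for a broker response.
        # 30s is an illustrative value, not a recommendation.
        requestTimeout: 30s
```

Raising the timeout would only hide slow broker responses, so it may still be worth looking at what the broker is doing at the moment a request times out.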