Expected to handle gRPC request, but request timed out between gateway and broker

cyberpank · August 9, 2022, 9:12am

We occasionally get a timeout between the Zeebe gateway and the broker for no obvious reason.
Zeebe is configured as all-in-one, so the gateway and the other Zeebe components run in the same container in a k8s cluster. The lion's share of requests work fine, but sometimes (about once a day) we get this error:

[io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-1] DEBUG io.camunda.zeebe.gateway - Expected to handle gRPC request, but request timed out between gateway and broker
java.util.concurrent.TimeoutException: Request ProtocolRequest{id=1896336, subject=command-api-1, sender=0.0.0.0:26502, payload=byte[]{length=673, hash=-148275053}} to 0.0.0.0:26501 timed out in PT1M
    at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:218) ~[zeebe-atomix-cluster-1.2.4.jar:1.2.4]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.68.Final.jar:4.1.68.Final]
    at java.lang.Thread.run(Unknown Source) ~[?:?]
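
The fields in the TimeoutException message above (request id, subject, target address, timeout) can be pulled out with a small script, which may help when counting how often the timeout occurs. This is a minimal Python sketch, assuming the message format matches the log line shown. `parse_timeout` and the regular expression are illustrative helpers, not part of Zeebe.

```python
import re
from datetime import timedelta

# Pattern for the TimeoutException message as it appears in the log above.
# Assumption: the timeout is an ISO-8601 duration with only minutes and/or
# seconds (e.g. PT1M, PT15S, PT1M30S); hours are not handled here.
TIMEOUT_RE = re.compile(
    r"Request ProtocolRequest\{id=(?P<id>\d+), subject=(?P<subject>[^,]+), "
    r"sender=(?P<sender>[^,]+), .*?\} to (?P<target>\S+) timed out in "
    r"PT(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
)


def parse_timeout(line):
    """Return the request fields from a timeout log line, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    timeout = timedelta(
        minutes=int(m["minutes"] or 0),
        seconds=float(m["seconds"] or 0),
    )
    return {
        "id": int(m["id"]),
        "subject": m["subject"],
        "sender": m["sender"],
        "target": m["target"],
        "timeout": timeout,
    }
```

Running `parse_timeout` over each line of the gateway log and collecting the non-`None` results gives a list of timed-out requests that can be lined up against timestamps or exporter activity.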

There is no pressure on the CPU or other hardware resources.
The Zeebe configuration:
zeebe:
  broker:
    gateway:
      enable: true

      network:
        port: 26500

      security:
        enabled: false

    network:
      host: 0.0.0.0

    data:
      directories: [ data ]
      logSegmentSize: 512MB
      snapshotPeriod: 15m

    cluster:
      clusterSize: 1
      replicationFactor: 1
      partitionsCount: 1

    threads:
      cpuThreadCount: 2
      ioThreadCount: 2

    exporters:
      hazelcast:
        className: io.zeebe.hazelcast.exporter.HazelcastExporter
        jarPath: /tmp/zeebe-hazelcast-exporter-1.1.0-jar-with-dependencies.jar
      elasticsearch:
        className: io.camunda.zeebe.exporter.ElasticsearchExporter
        args:
          url: http://es-zeebe-dev.somedomain.local:9200
          bulk:
            delay: 5
            size: 1000
          index:
            prefix: qa-zeebe-record
            createTemplate: true
            command: true
            event: true
            rejection: true
            deployment: true
            error: true
            incident: true
            job: true
            jobBatch: true
            message: true
            messageSubscription: true
            variable: true
            variableDocument: true
            workflowInstance: true
            workflowInstanceCreation: true
            workflowInstanceSubscription: true

jwulf · August 9, 2022, 11:43pm

Hi @cyberpank

What version of Zeebe are you running?

Josh

cyberpank · August 10, 2022, 6:04am

Hi @jwulf, the Zeebe version is 1.2.4.

jwulf · August 10, 2022, 8:57am

This is quite an old version. Can you try a more recent one?

cyberpank · August 10, 2022, 9:04am

Actually, we’ve faced this issue on versions 0.24 and 0.26, and then on 1.2.4 (it was the latest version last year). OK, I’ll try the latest one.

jwulf · August 16, 2022, 3:45am

Hi @cyberpank

Any luck with this?

Josh

cyberpank · August 16, 2022, 8:37am

Hi @jwulf
Version 8.0.5 has been installed for a while now, and the timeouts haven’t appeared yet. Let’s watch it for 2-3 days.

Edward-Shaw · November 28, 2022, 2:30pm

I got the same error message with v8.0.5, and I can’t increase the timeout with the ‘ZEEBE_GATEWAY_CLUSTER_REQUESTTIMEOUT’ environment variable. Any solution? I have two thousand instances running with an 8-core CPU and 32 GB of memory.