Timeout on zeebe deploy command

kaikreuzer · August 23, 2019, 3:32pm

Hi,

I’m a totally newbie, currently trying out your quickstart tutorial - unfortunately, I am stuck pretty early on in step 3 already.

I am using zeebe 0.20.0 (also already tried 0.21.0-alpha1 with the same result) and the broker is up and running:

$ ./bin/zbctl status                    
Cluster size: 1
Partitions count: 1
Replication factor: 1
Brokers:
  Broker 0 - 0.0.0.0:26501
    Partition 1 : Leader

But when I try to deploy a workflow, I run into a timeout:

$ ./bin/zbctl deploy order-process.bpmn 
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded

The broker shows this in the log, so clearly the client reached it:

17:30:26.714 [io.zeebe.gateway.impl.broker.BrokerRequestManager] [0.0.0.0:26501-zb-actors-1] ERROR io.zeebe.gateway - Error handling gRPC request
io.grpc.StatusRuntimeException: INTERNAL: Unexpected error occurred during the request processing
	at io.grpc.Status.asRuntimeException(Status.java:524) ~[grpc-api-1.22.1.jar:1.22.1]
	at io.zeebe.gateway.EndpointManager.convertThrowable(EndpointManager.java:291) ~[zeebe-gateway-0.20.0.jar:0.20.0]
	at io.zeebe.gateway.EndpointManager.lambda$sendRequest$3(EndpointManager.java:269) ~[zeebe-gateway-0.20.0.jar:0.20.0]
	at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequest$3(BrokerRequestManager.java:130) ~[zeebe-gateway-0.20.0.jar:0.20.0]
	at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$5(BrokerRequestManager.java:161) ~[zeebe-gateway-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:32) [zeebe-util-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:80) [zeebe-util-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:43) [zeebe-util-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:169) [zeebe-util-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:134) [zeebe-util-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:118) [zeebe-util-0.20.0.jar:0.20.0]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:96) [zeebe-util-0.20.0.jar:0.20.0]
Caused by: io.zeebe.transport.RequestTimeoutException: Request timed out after PT15S
	at io.zeebe.transport.impl.sender.OutgoingRequest.timeout(OutgoingRequest.java:143) ~[zeebe-transport-0.20.0.jar:0.20.0]
	at io.zeebe.transport.impl.sender.Sender.onTimerExpiry(Sender.java:475) ~[zeebe-transport-0.20.0.jar:0.20.0]
	at org.agrona.DeadlineTimerWheel.poll(DeadlineTimerWheel.java:284) ~[agrona-1.0.3.jar:1.0.3]
	at io.zeebe.transport.impl.sender.Sender.processTimeouts(Sender.java:106) ~[zeebe-transport-0.20.0.jar:0.20.0]
	... 6 more

What am I doing wrong here? Any hint is highly appreciated!

Cheers,
Kai

jwulf · August 24, 2019, 4:41am

Hi @kaikreuzer, the client reaches the gateway, but the gateway cannot reach the broker.

It’s not possible for me to reproduce your problem with just the info you’ve given. Missing at the moment:

What operating system?
How did you start the broker?

Josh

kaikreuzer · August 24, 2019, 10:03pm

Hi Josh, I am running on macOS 10.14.6 and openJDK 11.0.2.
I started the broker simply with ./bin/broker.

Here’s the full startup log, if it helps (I cannot see any errors/warnings that would be relevant):

$ ./bin/broker                          
23:58:30.746 [] [main] INFO  io.zeebe.broker.system - Version: 0.20.0
23:58:30.752 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {
  "network": {
    "host": "0.0.0.0",
    "portOffset": 0,
    "commandApi": {
      "host": "0.0.0.0",
      "port": 26501,
      "sendBufferSize": "16M"
    },
    "internalApi": {
      "host": "0.0.0.0",
      "port": 26502,
      "sendBufferSize": "16M"
    },
    "monitoringApi": {
      "host": "0.0.0.0",
      "port": 9600,
      "sendBufferSize": "16M"
    }
  },
  "cluster": {
    "initialContactPoints": [],
    "partitionIds": [
      1
    ],
    "nodeId": 0,
    "partitionsCount": 1,
    "replicationFactor": 1,
    "clusterSize": 1,
    "clusterName": "zeebe-cluster"
  },
  "threads": {
    "cpuThreadCount": 2,
    "ioThreadCount": 2
  },
  "data": {
    "directories": [
      "/Users/kai/dev/zeebe/data"
    ],
    "logSegmentSize": "512M",
    "snapshotPeriod": "15m",
    "snapshotReplicationPeriod": "5m",
    "maxSnapshots": 3
  },
  "exporters": [
    {
      "id": "hazelcast",
      "className": "io.zeebe.hazelcast.exporter.HazelcastExporter",
      "args": {
        "enabledValueTypes": "JOB,WORKFLOW_INSTANCE,DEPLOYMENT,INCIDENT,TIMER,VARIABLE,MESSAGE,MESSAGE_SUBSCRIPTION,MESSAGE_START_EVENT_SUBSCRIPTION"
      }
    }
  ],
  "gateway": {
    "enable": true,
    "network": {
      "host": "0.0.0.0",
      "port": 26500
    },
    "cluster": {
      "contactPoint": "0.0.0.0:26502",
      "transportBuffer": "128M",
      "requestTimeout": "15s",
      "clusterName": "zeebe-cluster",
      "memberId": "gateway",
      "host": "0.0.0.0",
      "port": 26502
    },
    "threads": {
      "managementThreads": 1
    },
    "monitoring": {
      "enabled": false,
      "host": "0.0.0.0",
      "port": 9600
    }
  }
}
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.agrona.nio.TransportPoller (file:/Users/kai/dev/zeebe/lib/agrona-1.0.3.jar) to field sun.nio.ch.SelectorImpl.selectedKeys
WARNING: Please consider reporting this to the maintainers of org.agrona.nio.TransportPoller
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23:58:30.869 [service-controller] [0.0.0.0:26501-zb-actors-0] INFO  io.zeebe.transport - Bound commandApi.server to /0.0.0.0:26501
23:58:31.729 [] [zb-blocking-task-runner-1-0.0.0.0:26501] INFO  io.zeebe.gateway - Version: 0.20.0
23:58:31.731 [] [zb-blocking-task-runner-1-0.0.0.0:26501] INFO  io.zeebe.gateway - Starting gateway with configuration {
  "enable": true,
  "network": {
    "host": "0.0.0.0",
    "port": 26500
  },
  "cluster": {
    "contactPoint": "0.0.0.0:26502",
    "transportBuffer": "128M",
    "requestTimeout": "15s",
    "clusterName": "zeebe-cluster",
    "memberId": "gateway",
    "host": "0.0.0.0",
    "port": 26502
  },
  "threads": {
    "managementThreads": 1
  },
  "monitoring": {
    "enabled": false,
    "host": "0.0.0.0",
    "port": 9600
  }
}
23:58:31.829 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [0.0.0.0:26501-zb-actors-1] INFO  io.zeebe.transport.endpoint - Registering endpoint for node '0' with address '0.0.0.0:26501' on transport 'gateway-broker-client'
23:58:31.966 [] [raft-server-system-partition-1] WARN  io.atomix.utils.event.ListenerRegistry - Listener io.atomix.protocols.raft.roles.FollowerRole$$Lambda$422/0x0000000800475840@4a7f59d5 not registered
23:58:32.094 [] [raft-server-raft-atomix-partition-1] WARN  io.atomix.utils.event.ListenerRegistry - Listener io.atomix.protocols.raft.roles.FollowerRole$$Lambda$422/0x0000000800475840@545c3f8b not registered
23:58:33.022 [io.zeebe.broker.system.monitoring.BrokerHealthCheckService] [0.0.0.0:26501-zb-actors-0] INFO  io.zeebe.broker.system - Partition '1' is installed.
23:58:33.023 [io.zeebe.broker.system.monitoring.BrokerHealthCheckService] [0.0.0.0:26501-zb-actors-0] INFO  io.zeebe.broker.system - All partitions are installed. Broker is ready!
Aug. 24, 2019 11:58:33 NACHM. com.hazelcast.instance.HazelcastInstanceFactory
WARNING: Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used:
 --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
23:58:33.063 [stream-processor] [0.0.0.0:26501-zb-actors-1] INFO  io.zeebe.logstreams - Recovered state of partition 1 from snapshot at position -1
23:58:33.092 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.instance.AddressPicker - [LOCAL] [dev] [3.11] Prefer IPv4 stack is true, prefer IPv6 addresses is false
23:58:33.095 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.instance.AddressPicker - [LOCAL] [dev] [3.11] Picked [192.168.99.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
23:58:33.109 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.system - [192.168.99.1]:5701 [dev] [3.11] Hazelcast 3.11 (20181023 - 1500bbb) starting at [192.168.99.1]:5701
23:58:33.109 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.system - [192.168.99.1]:5701 [dev] [3.11] Copyright (c) 2008-2018, Hazelcast, Inc. All Rights Reserved.
23:58:33.109 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.instance.Node - [192.168.99.1]:5701 [dev] [3.11] A non-empty group password is configured for the Hazelcast member. Starting with Hazelcast version 3.8.2, members with the same group name, but with different group passwords (that do not use authentication) form a cluster. The group password configuration will be removed completely in a future release.
23:58:33.245 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator - [192.168.99.1]:5701 [dev] [3.11] Backpressure is disabled
23:58:33.515 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.instance.Node - [192.168.99.1]:5701 [dev] [3.11] Creating MulticastJoiner
23:58:33.615 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl - [192.168.99.1]:5701 [dev] [3.11] Starting 16 partition threads and 9 generic threads (1 dedicated for priority tasks)
23:58:33.618 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.internal.diagnostics.Diagnostics - [192.168.99.1]:5701 [dev] [3.11] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
23:58:33.623 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.core.LifecycleService - [192.168.99.1]:5701 [dev] [3.11] [192.168.99.1]:5701 is STARTING
23:58:35.777 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.internal.cluster.ClusterService - [192.168.99.1]:5701 [dev] [3.11] 

Members {size:1, ver:1} [
	Member [192.168.99.1]:5701 - 2c137f73-9547-415c-b398-e9e5f6f1f8fb this
]

23:58:35.797 [exporter] [0.0.0.0:26501-zb-fs-workers-0] INFO  com.hazelcast.core.LifecycleService - [192.168.99.1]:5701 [dev] [3.11] [192.168.99.1]:5701 is STARTED

Zelldon · August 25, 2019, 7:41pm

Hey @kaikreuzer,

thanks for trying Zeebe and the provided logs.

I think the problem is that your gateway has the wrong contactpoint configured. It uses currently the internal API (26502) instead of the command API which is 26501.
Please change that configuration and try it again.

Greets,
Chris

kaikreuzer · August 26, 2019, 7:25am

Thanks @Zelldon, this made it work!
The remaining question is: Why was it set incorrectly? I didn’t do any changes to the default setup.
Should this be considered to be a bug in the defaults as well as in the cfg file?

Zelldon · August 26, 2019, 2:34pm

Hey @kaikreuzer,

sorry but it seems I was wrong.

I had a deeper look at the code and the gateway connects via the internal API (26502) to get knowledge about the current cluster. This is done via atomix. It uses the infornations which are distributed in the cluster to get the broker command API, which means it should be right that the contactpoint is 26502.

I’m currently wondering why my suggested change fixed your problem. What exactly do you changed and how?

We can see that this worked because of the following log statement:

23:58:31.829 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [0.0.0.0:26501-zb-actors-1] INFO  io.zeebe.transport.endpoint - Registering endpoint for node '0' with address '0.0.0.0:26501' on transport 'gateway-broker-client'

Sorry for the confusion.

Greets,
Chris

kaikreuzer · August 26, 2019, 4:35pm

All I did was changing line 46 in gateway.cfg.toml from

# contactPoint = "127.0.0.1:26502"

to

contactPoint = "127.0.0.1:26501"

Putting the line back in comments brings back the timeout.

Zelldon · August 27, 2019, 8:45am

Hey @kaikreuzer,

I tried it with the latest alpha version and was not able to reproduce your problem. Following steps I did:

cd /tmp/
mkdir test
cd test/
# get latest zeebe distro
wget https://github.com/zeebe-io/zeebe/releases/download/0.21.0-alpha1/zeebe-distribution-0.21.0-alpha1.tar.gz
# unpack zeebe distro
tar -xzvf zeebe-distribution-0.21.0-alpha1.tar.gz
# change directory to bin folder
cd zeebe-broker-0.21.0-alpha1/bin/
# start broker
./broker &
# deploy
./zbctl deploy ~/goPath/src/github.com/zeebe-io/zeebe/engine/src/test/resources/workflows/simple-workflow.yaml --insecure

This prints:

{
  "key": 2251799813685250,
  "workflows": [
    {
      "bpmnProcessId": "yaml-workflow",
      "version": 1,
      "workflowKey": 2251799813685249,
      "resourceName": "/home/zell/goPath/src/github.com/zeebe-io/zeebe/engine/src/test/resources/workflows/simple-workflow.yaml"
    }
  ]
}

Which steps are different then yours?

You wrote that you changed the gateway configuration.
This means you’re starting broker and gateway separately, is this correct? If so then you should disable the embedded gateway in the broker otherwise you will have an conflict there.

Greets
Chris

jwulf · August 28, 2019, 6:19am

I can emulate this by starting the broker with AdoptJDK8 on Mac (works with @Zelldon’s steps), and starting it with AdoptJDK11 on Mac (behaves like @kaikreuzer’s problem).

During startup with AdoptJDK11, the server logs this error:

java.lang.UnsatisfiedLinkError: /private/var/folders/n5/0nrf31cs0nbgkj75fd1r962r0000gn/T/librocksdbjni13208091460779373387.jnilib: dlopen(/private/var/folders/n5/0nrf31cs0nbgkj75fd1r962r0000gn/T/librocksdbjni13208091460779373387.jnilib, 1): no suitable image found.  Did find:
	/private/var/folders/n5/0nrf31cs0nbgkj75fd1r962r0000gn/T/librocksdbjni13208091460779373387.jnilib: code signature in (/private/var/folders/n5/0nrf31cs0nbgkj75fd1r962r0000gn/T/librocksdbjni13208091460779373387.jnilib) not valid for use in process using Library Validation: mapped file has no cdhash, completely unsigned? Code has to be at least ad-hoc signed.
	at java.lang.ClassLoader$NativeLibrary.load0(Native Method) ~[?:?]
	at java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2430) ~[?:?]
	at java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2487) ~[?:?]
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2684) ~[?:?]
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2617) ~[?:?]
	at java.lang.Runtime.load0(Runtime.java:767) ~[?:?]
	at java.lang.System.load(System.java:1831) ~[?:?]
	at org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(NativeLibraryLoader.java:78) ~[rocksdbjni-6.1.1.jar:?]
	at org.rocksdb.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:56) ~[rocksdbjni-6.1.1.jar:?]
	at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:63) ~[rocksdbjni-6.1.1.jar:?]
	at org.rocksdb.RocksDB.<clinit>(RocksDB.java:34) ~[rocksdbjni-6.1.1.jar:?]
	at io.zeebe.db.impl.rocksdb.ZeebeRocksDbFactory.<clinit>(ZeebeRocksDbFactory.java:28) ~[zeebe-db-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.engine.state.DefaultZeebeDbFactory.defaultFactory(DefaultZeebeDbFactory.java:33) ~[zeebe-workflow-engine-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.engine.state.DefaultZeebeDbFactory.<clinit>(DefaultZeebeDbFactory.java:20) ~[zeebe-workflow-engine-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.broker.clustering.base.partitions.Partition.createSnapshotController(Partition.java:168) ~[zeebe-broker-core-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.broker.clustering.base.partitions.Partition.start(Partition.java:77) ~[zeebe-broker-core-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.onDependenciesAvailable(ServiceController.java:260) ~[zeebe-service-container-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept(ServiceController.java:213) ~[zeebe-service-container-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept(ServiceController.java:207) ~[zeebe-service-container-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.servicecontainer.impl.ServiceController.onServiceEvent(ServiceController.java:105) ~[zeebe-service-container-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) ~[zeebe-util-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:128) [zeebe-util-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:108) [zeebe-util-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:92) [zeebe-util-0.21.0-alpha1.jar:0.21.0-alpha1]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:196) [zeebe-util-0.21.0-alpha1.jar:0.21.0-alpha1] ```

Zelldon · August 28, 2019, 7:01am

Hey @jwulf,

thanks for checking this.

This is weird that again the AdoptJDK11 makes problems on Mac OS, we had already a problem with that https://github.com/zeebe-io/zeebe/issues/2961 .

@kaikreuzer is it ok for you to use an different OpenJDK vendor or different version until we know what the problem is?

Greets
Chris

kaikreuzer · August 28, 2019, 9:54am

Thanks , guys!
Although I am not seeing the UnsatisfiedLinkError, the issue seems indeed to be related to the JDK build.
I have now installed the Oracle JDK 11.0.2 and the problem is gone. It is a bit worrying though that the JDK build makes a difference, but that’s not your fault .

jwulf · August 28, 2019, 11:11am

For this reason: Docker. It uses more CPU/RAM on Mac cause you have to run a virtualized Ubuntu OS, but man it saves time on debugging.