Zeebe 0.23.7 and 0.26.1 problems - Jobs intermittently pause execution and the associated worker is not invoked

Hi everybody,

we are using Zeebe in production, but we are experiencing problems that we cannot reproduce. Randomly, one workflow execution gets blocked and the worker for that jobType is not activated; after approximately 300 seconds, it is activated.

Our workers use the Java client, and we have workers deployed in a k8s cluster, but Zeebe is installed outside the cluster, on 3 MVs: one for a standalone gateway and Elastic, another with 3 brokers, and the last one with another 3 brokers. Our Zeebe cluster configuration is:

  • cluster size: 6
  • partitions: 4
  • replication: 3

We don’t know why this is happening (no cause in the gateway or broker logs), and last week we decided to move to a new Zeebe cluster on version 0.26.1, but we are seeing the same behavior.

We have found the environment variable ZEEBE_BROKER_STEPTIMEOUT, which defaults to 5 minutes, and we don’t know if that is the reason the workers activate 300 seconds later when the unknown error occurs.

Do you have any ideas about this random behavior?

Any idea is greatly appreciated, as doubts about Zeebe are being raised on our side.
Thank you in advance.

Hi @antoniodfr , what’s an MV?

I’m sorry @jwulf, I meant a VM, a Virtual Machine outside the k8s cluster where workers and other microservices are deployed.


This could be caused by a worker activating the job, then not completing the job. The broker would then time out the activation, and make it available to another worker.

Could this be the issue? What timeout are the workers specifying when they activate jobs?

Hi @jwulf, we have workers that don’t set requestTimeout (I understand the default is 20 seconds), and a requestTimeout of 6 seconds in some of them. We always use send().join() when sending commands from the workers.

But we don’t know why we get this random behavior of workflows pausing at a step (it can be at different steps of the same workflow) with the worker not activated, and then, after 300 seconds (we don’t know why exactly this period), the worker is activated and the workflow completes correctly.

Thank you in advance.

I don’t think it is 20 seconds. The timer sweep for job activation timeouts is every 30 seconds. So anything less than that cannot be guaranteed. See here: https://github.com/zeebe-io/zeebe/issues/5073.

You might find it is five minutes. I’d check the Java client source code.

Hi @jwulf, you are correct. I have reviewed the Java Client 0.26.1 source code and the default jobTimeout is five minutes.

But my question is: do you know why we are having this behavior of a job being launched but the corresponding worker not being activated? (We log the activation init in the workers, and randomly they don’t get activated, as there are no logs.) After those 5 minutes, the worker is activated and the execution of the workflow completes OK.

Thank you in advance.

I think what you are describing is “token enters the task, job is activated 5 minutes later by worker”.

You haven’t described it here, but I am imagining that you are looking in Operate and seeing the task activation startDate and endDate, like this:

[Screenshot from Operate showing the task activation startDate and endDate]

Am I correct so far?

If I look at this example from one of my workflows, I am left wondering: is the startDate the token entering this task, or the worker activating the job?

I’m not sure.

But anyway, when I roll my 20-sided Zeebe debugging dice, it says that your worker is not completing the job, and it is timing out and being reactivated.

The worker has a code branch in it that, under some combination of circumstances, does not throw an unhandled exception and does not call job.complete(). It activates the job, doesn’t throw, and doesn’t complete it. The job then times out on the broker and is reactivated. The second time round the stars do not align, because it is an edge case, so you do not see it happen twice in a row (which would mean a delay of 10 minutes for a job).

Look for that code.
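
To make that concrete, here is a minimal hedged sketch of the kind of branch I mean. Everything here is hypothetical (the class and the isDuplicate/process helpers are invented for illustration; package names assume the 0.26 Java client), but the shape is the point: one path returns without completing or failing the job.

import io.zeebe.client.api.response.ActivatedJob;
import io.zeebe.client.api.worker.JobClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CheckDuplicateCallbackWorker {

    private static final Logger log = LoggerFactory.getLogger(CheckDuplicateCallbackWorker.class);

    public void checkDuplicateCallback(final JobClient jobClient, final ActivatedJob job) {
        if (isDuplicate(job)) {                  // hypothetical helper
            log.warn("Duplicate callback, skipping job {}", job.getKey());
            return;                              // BUG: the job is neither completed nor failed,
                                                 // so it times out on the broker and is
                                                 // re-activated ~5 minutes later
        }
        process(job);                            // hypothetical business logic
        jobClient.newCompleteCommand(job.getKey()).send().join();
    }

    private boolean isDuplicate(final ActivatedJob job) { return false; }

    private void process(final ActivatedJob job) { }
}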

Exactly @jwulf, that is our case; we are seeing those executions in Operate, like this example:

[Screenshot from Operate of the stopped execution]

In this case, the execution is stopped after 205 seconds because this workflow is bounded by a timer event of that duration.

We are getting this behavior in different workers of this workflow, and in none of them do we see the first activation trace (we are using Java client workers with the Spring Zeebe Starter, deployed in K8s):

@NewSpan("check-duplicate-callback-worker")
@ZeebeWorker(name = "checkDuplicateCallback", type = WORKER_TYPE)
public void checkDuplicateCallback(final JobClient jobClient, final ActivatedJob job) {
    try {
		long jobKey = job.getKey();
		log.info("INIT {}. JobKey: {}", WORKER_TYPE, jobKey);

And we are not seeing any “INIT” execution in logs.

We have migrated our cluster from Zeebe 0.23.7 to Zeebe 0.26.1, and we get this behavior less frequently, but it still occurs.

If you have any idea it will be greatly appreciated!

Thank you in advance.


Hi @jwulf, reviewing our Grafana Zeebe monitoring, we have detected dropped requests in the Command API backpressure metrics.

We are using the gradient backpressure algorithm, but we don’t see a significant increase in system load at that moment.

If the job is not activating in the worker, then I would suspect the polling of the client.

Intermittent errors are challenging to debug. Trying to get a reliable reproducer is one approach. This exercise forces you to identify the exact set of circumstances under which it happens.

Using open source software, you have to consider the source code as your own source code. You didn’t write the Spring client or the Java client that it wraps, but it is your application code.

So one approach is to reason through the code and look for places where you can instrument it with debugging statements to say things like “I’m polling the gateway for more jobs”, “I got a response back”, etc.
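
For example, a rough probe like this (a sketch assuming the 0.26 Java client; the gateway address, job type, and log wording are placeholders) polls the gateway directly and logs each round trip, which lets you see whether the gateway ever hands the job out:

import io.zeebe.client.ZeebeClient;
import io.zeebe.client.api.response.ActivateJobsResponse;
import java.time.Duration;

public class GatewayPollingProbe {

    public static void main(String[] args) {
        try (ZeebeClient client = ZeebeClient.newClientBuilder()
                .brokerContactPoint("zeebe-gateway:26500")     // placeholder gateway address
                .usePlaintext()
                .build()) {

            while (true) {
                System.out.println("I'm polling the gateway for more jobs...");
                final ActivateJobsResponse response = client.newActivateJobsCommand()
                        .jobType("checkDuplicateCallback")      // the job type that goes missing
                        .maxJobsToActivate(1)
                        .timeout(Duration.ofMinutes(5))         // job activation timeout
                        .requestTimeout(Duration.ofSeconds(20)) // long-poll request timeout
                        .send()
                        .join();
                System.out.println("I got a response back with " + response.getJobs().size() + " job(s)");

                // Complete anything we activated so it does not time out on the broker.
                response.getJobs().forEach(job ->
                        client.newCompleteCommand(job.getKey()).send().join());
            }
        }
    }
}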

And you need to build a minimum reproducer that does just the things needed to demonstrate the problem. Many times, building this reveals the problem, as you are forced to introduce things one by one until you can reproduce it.

Given your description of it so far:

  • It’s intermittent
  • The broker enters the task
  • The worker does not report INIT of its handler
  • The job is re-activated after the duration of the worker’s job activation timeout

It’s most probably in one of these places:

  • Some problem in the broker that does not make the job available for activation (possibly, maybe not likely because not reported by others)
  • Some problem in the Spring client around polling and activation.
  • Some problem in the Java client around polling and activation.

I would change the activation timeout in the worker. If this changes the delay (from 300 seconds to the new value of the activation timeout), then this will demonstrate that the gateway has activated the job and streamed it to the worker, but the worker is either not invoking your handler, or not sending the complete message back to the broker gateway. And then I would inspect these two points in my reproducer.
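
As a hedged sketch of that experiment with the plain Java client (the gateway address, job type, and the 60-second value are assumptions; the Spring starter builds an equivalent worker underneath):

import io.zeebe.client.ZeebeClient;
import io.zeebe.client.api.worker.JobWorker;
import java.time.Duration;

public class ActivationTimeoutExperiment {

    public static void main(String[] args) {
        final ZeebeClient client = ZeebeClient.newClientBuilder()
                .brokerContactPoint("zeebe-gateway:26500")    // placeholder gateway address
                .usePlaintext()
                .build();

        // If the observed delay shrinks from ~300s to ~60s, the job was activated and
        // streamed towards a worker, but the handler never ran or never completed it.
        final JobWorker worker = client.newWorker()
                .jobType("checkDuplicateCallback")
                .handler((jobClient, job) -> {
                    System.out.println("INIT checkDuplicateCallback. JobKey: " + job.getKey());
                    jobClient.newCompleteCommand(job.getKey()).send().join();
                })
                .timeout(Duration.ofSeconds(60))              // activation (job) timeout under test
                .name("check-duplicate-callback-worker")
                .open();
    }
}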

This is the way.

Hello,

The symptoms described here look like an issue I’m having, which is already reported here:

It was initially discussed here: Documentation Element (1 or more) and Camunda Extension to "Type" the Documentation


Thanks @marcoplaut, that does look like it could be it. Totally consistent with this behaviour.

@antoniodfr - can you please add a comment to the GitHub issue stating that you are seeing it with the Spring client? That helps with the impact triage and will affect the prioritisation of a fix.

I’ll raise it in the stakeholder input meeting next week, if it hasn’t been fixed before then. It sounds like a timing edge case that isn’t caught in unit tests.

@jwulf what is particularly complicated about that issue is that it’s not always happening. It seems (in my case) to happen a lot when using a k8s cluster with multiple nodes, but it is really rare when I run the same test locally on Docker.

If I can help to test a fix, or help to reproduce, do not hesitate to contact me.

Running locally with Docker means a single broker?

In that case it may be a timeout between the gateway and the broker, i.e. the gateway gets the job from the broker and says “ACTIVATED”, but then the requestTimeout to the client expires just at that moment, so the gateway does not forward the job to the client.

This issue looks like a probable cause: Make activated jobs which were not send to clients re-activatable · Issue #3631 · camunda-cloud/zeebe · GitHub

Update: this issue also looks like it is the same thing: java client: jobs are getting activated but not coming back to client (intermittent issue) · Issue #3585 · camunda-cloud/zeebe · GitHub


I did some digging into the request-response lifecycle. You may be able to work around this (or at least mitigate it) by increasing the request timeout of the ActivateJobs call. See: spring-zeebe/ZeebeWorker.java at master · zeebe-io/spring-zeebe · GitHub

Or by reducing the maxJobsActive, or both.

This seems to occur when the requestTimeout limit is reached right as a broker streams a job to the gateway for the request.

If you tune it so that the gateway always fills the number of jobs for the request before the request times out (by expanding the requestTimeout or reducing the jobs, or both), then this condition can’t occur.

This might reduce / remove the scenario while we’re waiting on the root cause fix.

By default, there is a timeout of 15s between the Gateway and each broker node (partition leader) [See here]. So if you have 3 partitions, you have a worst-case total timeout exposure window of 45s for the job activation request in the cluster. So your request timeout for the client should be 45s plus 10s to be safe.
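
Putting that together, a hedged sketch of the tuned worker with the plain Java client (gateway address and job type are assumptions; the 55s value just follows the 45s + 10s reasoning above, and with the Spring starter you would set the equivalent properties in its worker configuration):

import io.zeebe.client.ZeebeClient;
import java.time.Duration;

public class TunedCallbackWorker {

    public static void main(String[] args) {
        final ZeebeClient client = ZeebeClient.newClientBuilder()
                .brokerContactPoint("zeebe-gateway:26500")     // placeholder gateway address
                .usePlaintext()
                .build();

        client.newWorker()
                .jobType("checkDuplicateCallback")
                .handler((jobClient, job) ->
                        jobClient.newCompleteCommand(job.getKey()).send().join())
                .requestTimeout(Duration.ofSeconds(55))        // 45s worst case across partitions + 10s margin
                .maxJobsActive(1)                              // small requests fill before the timeout hits
                .name("check-duplicate-callback-worker")
                .open();
    }
}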

Hi

As @marcoplaut and @antoniodfr have explained, we are in the same scenario with the Java Client (Spring): we are experiencing intermittent pauses in execution when multiple pods for the same worker are running in the kube cluster. It is being really difficult for us to reproduce the problem.

We are going to apply your workaround of reducing the maxJobsActive property in the client, and we will deploy the workflow in our kube cluster with the workers running as a single pod instance in order to check whether the problems disappear.

Best regards


Hi

Last Thursday, because of the impact of the problem, we decided to reduce our workers’ deployments in our Kubernetes cluster to a single pod instance. The aim of this action was to check whether the intermittent pauses in execution disappear or at least have a smaller impact on the environment/workflows.

It seems that with this single-worker deployment the problem has disappeared since last Thursday. Maybe the multi-instance worker deployment was the root cause of the problem, due to some race condition between the different worker instances.