Our Zeebe cluster shuts down automatically with no exception in the logs. Is there any way to avoid this, or to troubleshoot why it shuts down?
Without more information it is hard to help you here, so please answer the following questions:
- Which version are you using?
- What does your deployment look like? Are you using Docker, Kubernetes, or something else?
- What does your configuration look like? Show us how you configured Zeebe.
- Can you see any resource consumption problems, such as running out of memory? Maybe via metrics or OS-level logging.
1. I am using Zeebe 0.25.3.
2. Deployment is on VMs; each has 8 CPU cores and 16 GB of memory.
3. My broker configuration:

```yaml
zeebe:
  broker:
    gateway:
      # Enable the embedded gateway to start on broker startup.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_ENABLE.
      enable: true
      network:
        # Sets the port the embedded gateway binds to.
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_PORT.
        port: 26500
      commandApi:
      security:
        # Enables TLS authentication between clients and the gateway.
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_ENABLED.
        enabled: false
    network:
      # Controls the default host the broker should bind to. Can be overwritten on a
      # per binding basis for client, management and replication.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_HOST.
      host: 10.18.58.239
    data:
      # Specify a list of directories in which data is stored.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DIRECTORIES.
      directories: [ data ]
      # The size of data log segment files.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_LOGSEGMENTSIZE.
      logSegmentSize: 512MB
      # How often we take snapshots of streams (time unit).
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_SNAPSHOTPERIOD.
      snapshotPeriod: 15m
      useMmap: true
    cluster:
      nodeId: 3
      # Specifies the Zeebe cluster size.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERSIZE.
      clusterSize: 5
      # Controls the replication factor, which defines the count of replicas per partition.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR.
      replicationFactor: 2
      # Controls the number of partitions, which should exist in the cluster.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT.
      partitionsCount: 10
      initialContactPoints: [ 10.18.58.236:26502, 10.18.58.237:26502, 10.18.58.238:26502, 10.18.58.239:26502, 10.18.64.239:26502 ]
    backpressure:
      enabled: false
    threads:
      # Controls the number of non-blocking CPU threads to be used.
      # WARNING: You should never specify a value that is larger than the number of physical cores
      # available. Good practice is to leave 1-2 cores for ioThreads and the operating
      # system (it has to run somewhere). For example, when running Zeebe on a machine
      # which has 4 cores, a good value would be 2.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_CPUTHREADCOUNT.
      cpuThreadCount: 4
      # Controls the number of io threads to be used.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_IOTHREADCOUNT.
      ioThreadCount: 4
    exporters:
      elasticsearch:
        className: io.zeebe.exporter.ElasticsearchExporter
        args:
          url: http://10.17.43.113:19200
```
4. Monitoring is still under construction; no abnormalities are visible at the operating-system level.
5. Restarting the application restores service, but I don't know when it will stop again.
I found the reason the process is automatically killed: the Linux OOM killer terminates it due to high memory usage. But how much memory should be allocated to Zeebe?
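For anyone hitting the same problem: you can confirm it was the kernel's OOM killer (rather than a clean shutdown) by grepping the kernel log. On the broker host you would pipe `dmesg -T` or `journalctl -k` into the pattern below; the sample line here is illustrative, not taken from this system:

```shell
# Illustrative kernel-log line of the kind the OOM killer writes.
# On a real host, replace the echo with:  dmesg -T   or   journalctl -k
sample='Out of memory: Killed process 4321 (java) total-vm:16777216kB, anon-rss:15728640kB'

# Case-insensitive match for the usual OOM-killer phrases
echo "$sample" | grep -iE 'out of memory|oom-killer|killed process'
```

A match names the victim process and its resident set size (`anon-rss`), which tells you how much memory the broker had actually claimed when it was killed.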
The short answer is: more. Depending on what you are doing (volume, snapshot timing) the memory requirement will differ. Last I checked, rebuilding the broker state on restart required more memory than running it.
The best way to understand the memory requirement in your scenario is to run that scenario and profile the memory usage. This will quite possibly change between versions, so I would redo the profiling every time you consider upgrading.
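As a starting point for that profiling, it can help to write down an explicit budget for the machine: the JVM heap is only one consumer, since Zeebe's RocksDB state lives in native (off-heap) memory and the OS needs page-cache headroom for the log segments. The split below is an assumption for a 16 GB VM, not official sizing guidance:

```shell
# Rough memory budget for a 16 GB broker VM -- every number here is an
# assumption to be replaced by your own profiling, not a recommendation.
total_mb=16384
heap_mb=4096      # JVM heap, e.g. started with JAVA_OPTS="-Xms4g -Xmx4g"
native_mb=4096    # RocksDB state and direct buffers (off-heap)
os_mb=2048        # the OS itself plus other daemons on the VM

# Whatever is left serves as page cache for the 512MB log segments
page_cache_mb=$((total_mb - heap_mb - native_mb - os_mb))
echo "page cache headroom: ${page_cache_mb} MB"
```

Explicitly capping the heap (e.g. via the `JAVA_OPTS` environment variable read by the stock startup scripts; verify the variable name against your distribution) stops the JVM from growing until the OOM killer steps in; you get an OutOfMemoryError in the broker log instead, which at least leaves a trace you can act on.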