I’m using Camunda Zeebe 8.5, deployed on Kubernetes with the Helm charts. I have two questions:
1. How can I stop a frozen process, similar to kill -9 in Linux?
2. Under high load, the broker pods crashed with OOM, and the monitoring system showed 300 million unprocessed events, which left the partition not working. The only fix I found was to completely remove and re-create the brokers’ PVCs. I am unsure what caused this issue and whether there is a less drastic way to repair the brokers after such an event.
P.S. The queue was not processed, and the process where this occurred could not be stopped (hence the question of whether it is possible to kill the process).
To stop a frozen process in Zeebe, you can use the CancelProcessInstance command via the gRPC API or one of the client SDKs. For broker recovery after OOM crashes, try restarting with the existing PVCs first, since Zeebe should replay unprocessed events from the log. I found the following relevant resources:
- Cancel Process Instance RPC
- Process Instance Modification
- Zeebe Data Loss After PVC Deletion
- Corrupted Snapshot Recovery Experiment
- Backup and Restore for Zeebe
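
As a rough illustration of the two suggestions above, here is a minimal Python sketch that shells out to the `zbctl` and `kubectl` command-line tools. It assumes both are installed and on the PATH, that the Zeebe gateway is reachable at `localhost:26500` without TLS (for example through a port-forward), and that the broker StatefulSet is called `camunda-zeebe` in namespace `camunda`. The address, key, StatefulSet name, and namespace are all placeholders, not values from this thread. Keep in mind that a cancel is itself a command written to the partition's log, so on a partition with a large backlog it may not take effect right away.

```python
import subprocess


def build_cancel_command(process_instance_key, gateway_address="localhost:26500", insecure=True):
    """Build the zbctl argv that cancels a single process instance."""
    cmd = [
        "zbctl", "cancel", "instance", str(process_instance_key),
        "--address", gateway_address,
    ]
    if insecure:
        # Plaintext connection; drop this when the gateway uses TLS.
        cmd.append("--insecure")
    return cmd


def build_restart_command(statefulset, namespace):
    """Build the kubectl argv that restarts the broker pods.

    A rollout restart recreates the pods but leaves the PVCs in place,
    so the brokers start again from their existing data.
    """
    return [
        "kubectl", "rollout", "restart",
        f"statefulset/{statefulset}",
        "--namespace", namespace,
    ]


def run(cmd):
    """Run a command, raise on a non-zero exit code, and return its stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical values: replace with the real key (e.g. from Operate),
    # StatefulSet name, and namespace of your installation.
    print(run(build_cancel_command(2251799813685249)))
    print(run(build_restart_command("camunda-zeebe", "camunda")))
```

The command builders are kept separate from `run` so the exact command line can be inspected or logged before anything is executed against the cluster.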
Does this help? If not, can anyone from the community jump in?
Hints: Use the Ask AI feature in Camunda’s documentation to chat with AI and get fast help. Report bugs and feature requests in Camunda’s GitHub issue tracker. Trust the process.