I am new to the Camunda workflow engine. We have a self-managed Camunda 8 cluster deployed through docker run commands. We want to make sure that the cluster is always available for executing workflows. We have deployed the prebuilt Grafana dashboard that monitors the Camunda cluster.
What kind of alerts should we build to make sure that the cluster is ready for workflow executions? I assume just adding alerts for gateway/broker instance failures won’t be enough. There could be multiple reasons (e.g. disk failures or corruption, and more?) why the cluster cannot execute workflows even though the apps are up and running.
- Is my assumption above about failures correct?
- Would we need to add a periodic job (Jenkins?) that acts like a Zeebe client to deploy, trigger, and check whether the workflow completed as expected? Is there a simpler, standard way of achieving this, given that maintaining a periodic job adds another challenge?
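
To make the second question concrete, the periodic check I have in mind would look roughly like the sketch below. It is only an outline of the deploy, trigger, and verify steps: `deploy`, `start_instance`, and `is_completed` are hypothetical placeholders for whatever Zeebe client calls would do the real work, not actual Zeebe API names, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def run_synthetic_check(
    deploy: Callable[[], None],
    start_instance: Callable[[], str],
    is_completed: Callable[[str], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 1.0,
) -> bool:
    """Return True if a canary workflow deploys, starts, and completes in time.

    All three callables are hypothetical hooks; a real job would implement
    them with a Zeebe client of its choice.
    """
    try:
        deploy()                          # step 1: deploy the canary process
        instance_key = start_instance()   # step 2: trigger one instance
        deadline = time.monotonic() + timeout_s
        # step 3: poll until the instance completes or the deadline passes
        while time.monotonic() < deadline:
            if is_completed(instance_key):
                return True
            time.sleep(poll_interval_s)
        return False
    except Exception:
        # Any client or connection error counts as a failed check.
        return False
```

The job would then raise an alert whenever this returns False, which is the part I would prefer not to build and maintain myself if a standard alternative exists.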