Hi,
We are encountering a critical issue in our Camunda 8 / Zeebe cluster and would appreciate any guidance.
Error
We are receiving the following error when trying to execute commands against Zeebe:
errorCode: io.camunda.zeebe.client.api.command.ClientStatusException:
Expected to execute the command on one of the partitions, but all failed;
there are no more partitions available to retry. Please try again.
If the error persists contact your zeebe operator.
Environment Details
- Zeebe Disk Usage: ~94% across partitions
- Camunda Version: 8.8
- Number of Brokers: 3
Observations
- The error started occurring when disk usage on the Zeebe partitions reached approximately 94%.
- All partitions appear to be rejecting commands simultaneously.
- No single partition is available to accept and process commands.
What We’ve Tried
- Checked disk usage on all Zeebe broker nodes and confirmed usage is at ~94%.
- Reviewed Zeebe broker logs for additional errors or warnings.
- Attempted to restart brokers, but the issue persists.
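A minimal sketch of this kind of per-node disk check, in case it helps anyone reproduce the numbers. The function names, the example data directory path, and the 0.90 threshold are illustrative assumptions, not values from our configuration and not Zeebe defaults. Note that `shutil.disk_usage` can report a slightly different percentage than `df`, which accounts for reserved blocks.

```python
import shutil


def used_fraction(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def over_threshold(paths, threshold):
    """Return (path, fraction) pairs whose disk usage is at or above threshold."""
    results = []
    for path in paths:
        fraction = used_fraction(path)
        if fraction >= threshold:
            results.append((path, fraction))
    return results


# Example call (hypothetical data directory; adjust to the broker's actual path):
# over_threshold(["/usr/local/zeebe/data"], threshold=0.90)
```

Run on each broker node, this returns the data directories that have crossed the chosen alert level, which could feed a simple cron-based alert until proper monitoring is in place.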
Questions
- Is the 94% disk usage the root cause? Does Zeebe enforce a disk usage threshold beyond which it stops accepting commands on partitions? If so, what is the default threshold?
- What is the recommended way to recover? Can we safely free up disk space (e.g., by triggering compaction, deleting old snapshots, or increasing disk size) while the cluster is in this state?
- How can we prevent this in the future? Are there best practices for configuring disk usage alerts or setting Zeebe’s diskUsageCommandWatermark and diskUsageReplicationWatermark thresholds?
- Is there a way to force Zeebe to resume processing once disk space is freed, or does it recover automatically?
Any insights, similar experiences, or documentation pointers would be greatly appreciated. Thank you!