Zeebe nfs storage

Art · March 25, 2024, 6:40am

Hi,

Can anybody explain me please what does mean this fragment of documentation (what is the reason of such limitation)?
Zeebe does not support network file systems (NFS) other types of network storage volumes at this time. Usage of NFS may cause data corruption.

Can I use CIFS as a data storage? I have bare metal k8s cluster with CIFS mounted windows share as a persistent storage.

Regards.

nathan.loding · March 26, 2024, 4:01pm

Hi @Art - Zeebe’s internal brokers use embedded RocksDB instances to track local data for internal processing, before the data is exported via the exporter. Combined with partitioning, the consensus protocol, and the distributed nature of Zeebe, this makes networked storage slower, more prone to errors, and generally unreliable enough that we cannot support it.

Of course, you are welcome to try it, and depending on the workload on the engine and how large or distributed your Zeebe deployment is, it is possible you may not encounter any issues. However, it is unsupported and strongly not recommended.

We do have a newly opened feature request for NFS support, and you are welcome to add your thoughts and follow the GitHub issue: Support NFS · Issue #16686 · camunda/zeebe · GitHub

Ryan_Waszak · March 27, 2024, 2:36pm

First of all thank you @nathan.loding and team for the incredible work here. Zeebe is such an amazing tool to work with!

I have also been curious about the NFS statement as it says “other types of network storage volumes at this time”.

Is that saying that there is no persistent volume storage support or simply not for type NFS in particular?

I see in the helm charts for zeebe that it mounts the data directory here camunda-platform-helm/charts/camunda-platform/templates/zeebe/statefulset.yaml at 72db6990d91627488f0787b6d9463e9a2d33af09 · camunda/camunda-platform-helm · GitHub

And I believe it does it to disk by default: camunda-platform-helm/charts/camunda-platform/templates/zeebe/statefulset.yaml at 72db6990d91627488f0787b6d9463e9a2d33af09 · camunda/camunda-platform-helm · GitHub

Additionally the deployment docs Kubernetes deployment | Camunda 8 Docs state

Zeebe broker nodes need to be deployed as a StatefulSet to preserve the identity of cluster nodes. StatefulSets require persistent storage, which must be allocated in advance. Depending on your cloud provider, the persistent storage differs as it is provider-specific.

So my questions are:

Is my understanding correct that helm charts mount the data directory for each broker to persistent storage via stateful sets and that data persists even if the broker crashes (side effect of stateful sets)?
If yes to #1 does this harm performance as you mentioned above in your statement “this makes networked storage slower, more prone to errors, and generally unreliable enough that we cannot support it.”
If yes to #1, is that basically a fail safe in case all brokers for a partition go down and there has not been a backup to say S3 for some time?
In the quote from the documentation about stateful sets it states that they are needed to preserve the identity of cluster nodes. This almost sounds like the statement is saying stateful sets are only required for the identity preservation, and that persistent storage is only used because it is inherently required by stateful sets (which isn’t true). So do you guys use stateful sets because you need both id preservation AND storage persistence, or do you really only require id persistence since storage preservation is achieved with adequate replication and multiple clusters for fault tolerance?

#4 has me very curious because I am deploying a multi cluster self managed zeebe setup and am able to preserve the identity of the cluster nodes via k8 deployments rather than stateful sets so would really love to hear about your experiences with stateful sets and what troubles they may solve beyond just id preservation. If id preservation is the main reason that is great news and I will proceed with k8 deployments instead. If the persistent data storage is also required I would love to hear a bit more of the why there as I have been reading all these amazing blogs and forums for months and haven’t come across much on the data persistence necessity but have seen more about replication and external data store backups than the actual use of volume mounted persistent storage.

nathan.loding · March 27, 2024, 9:20pm

Hi @Ryan_Waszak - the warning is specific to NFS and network storage volumes. As mentioned in my previous reply, the Zeebe brokers use an embedded RocksDB to track process state during internal processing before it’s shipped off to the exporter. If you don’t use persistent volumes and the brokers restart or crash, you lose any information that was in the middle of that internal processing lifecycle. Which means that you can lose some data related to your currently running processes. These links should shed more light on the internals:

Ryan_Waszak · March 28, 2024, 2:50am

Thanks @nathan.loding. Have read all of these already and the videos and docs that they reference as well and they are great!

I noticed that when we deploy a new process definition that it commits to every partition. So lets say we have 2 partitions A and B. We then deploy a process definition and start two instances.

Instance one goes to partition A and instance two goes to partition B (round-robin by default)

Then A loses all its brokers to some crash before the instance finishes and did NOT have a persistent volume mounted for its data.

In this case:

Instance 1 from partition A is totally lost (due to no persistent volume)
We know instance 2 from partition B is fine because B brokers are still up
We can also still start new instances to this definition because the definition exists on all partitions and so exists on the stable partition B

Correct?

Thanks again for all you guys have done!

nathan.loding · March 28, 2024, 2:24pm

@Ryan_Waszak - I am not an expert on the internals, and there’s additional details to what you’ve outlined, but yes, I believe that’s accurate. Some data from/about the first instance would be exported via the exporter your data store, but the running process isn’t exactly recoverable from there.

system · June 26, 2024, 2:25pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.