Gateway Fails With Stream Write Error

Zelldon · February 20, 2023, 7:53am

What is the correlation of the stream write error seen in the gateway logs with persistent volumes, i.e. Zeebe broker data?
- I'm not sure whether this is really related. The stream error is very generic and can have multiple causes: client connection cancellations, interruptions, and other things.
Why do the volumes not get deleted on helm uninstall command?
- This is a design decision made by the Helm tooling, not something we have under control. See this open issue on the Helm project: Helm 'delete' doesn't delete PVCs · Issue #5156 · helm/helm · GitHub
We have observed that any change in the helm values configuration file requires us to delete all the persistent volumes before we start the benchmarking again. Could you please explain the technical reason behind this?
- Not for every change, but for some of them it is necessary, yes, especially when changing the partition count, replication factor, or cluster size. These properties are persisted, and Zeebe currently doesn’t support changing them dynamically. See Dynamic rebalancing of partition count and follower replicas · Issue #4405 · camunda/zeebe · GitHub and https://github.com/camunda/zeebe/issues/4391
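
To make the last point concrete, here is a minimal sketch of a pre-benchmark check that flags whether a values change touches one of the three persisted properties. The key names (`partitionCount`, `replicationFactor`, `clusterSize`) and the helper `changed_topology_keys` are illustrative assumptions, not the chart's actual schema, so adapt them to your own values file.

```python
# Settings that, per the answer above, are persisted in the broker data and
# cannot be changed on existing volumes. Key names are hypothetical.
STATIC_TOPOLOGY_KEYS = ("partitionCount", "replicationFactor", "clusterSize")


def changed_topology_keys(old_values: dict, new_values: dict) -> list:
    """Return the persisted topology keys whose values differ between two configs."""
    return [
        key
        for key in STATIC_TOPOLOGY_KEYS
        if old_values.get(key) != new_values.get(key)
    ]


# Example: two flattened value sets from consecutive benchmark runs.
old = {"clusterSize": 3, "partitionCount": 3, "replicationFactor": 2, "logLevel": "info"}
new = {"clusterSize": 5, "partitionCount": 5, "replicationFactor": 2, "logLevel": "debug"}

changed = changed_topology_keys(old, new)
if changed:
    print("Delete the PVCs before redeploying; changed:", ", ".join(changed))
else:
    print("No persisted topology setting changed.")
```

A change to something like the log level alone would not be flagged, which matches the "not for every change" part of the answer.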

I think I already asked you some time ago why you use replication factor 2. We recommend using an odd number: 1, 3, 5, etc. Because of the Raft consensus protocol we use, an even count doesn’t really make sense. See my related post Zeebe startup issue - #4 by Zelldon
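
As a rough illustration of why an even count does not help, the arithmetic can be sketched as follows, assuming the standard Raft majority rule (a quorum is more than half of the replicas). The function names are made up for this example and are not part of Zeebe.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be lost while a majority is still available."""
    return replication_factor - quorum(replication_factor)


for rf in range(1, 6):
    print(
        f"replication factor {rf}: "
        f"quorum {quorum(rf)}, tolerated failures {tolerated_failures(rf)}"
    )
```

Under this rule, replication factor 2 needs both replicas for a quorum, so it tolerates no more failures than replication factor 1, and 4 tolerates no more than 3. The extra replica adds replication cost without adding fault tolerance.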

Regarding your observation:

3 brokers, 3 partitions - No error

4 brokers, 4 partitions - No error

5 brokers, 5 partitions - Stream Error

6 brokers, 12 partitions - Stream Error

7 brokers, 11 partitions - Stream Error

Do you change the configuration without deleting the PVCs? Do you see any errors in the broker logs? Are there any exceptions in the gateway log that you haven’t posted? Did you restart the clients? Please post the complete stack trace so we can determine what is going on.

Greets
Chris