Patterns to resolve the disk fullness issue

Alexey Vinogradov: Hello Team!

Are there any safe patterns to resolve the disk fullness issue, with as little interruption as possible?

zell: Do you have exporters running? Can you increase the disk? I think I need to know more about your setup and disk size etc.

Alexey Vinogradov: Oops, sorry to bother you @zell, it seems that we cannot export events to our exporters (also due to a disk issue :sweat_smile:), so there is no problem with the broker.

zell: So you currently have problems with exporting to Elastic, do I get this right? :slightly_smiling_face:

Alexey Vinogradov: Yes! Because it has almost run out of disk space :slightly_smiling_face:

zell: If you use the newest Helm charts, you could check out the new retention policy: https://github.com/camunda/camunda-cloud-helm/blob/main/charts/ccsm-helm/values.yaml#L413-L428. This will add Curator to clean up Elastic indices :slightly_smiling_face:

zell: Here is how we use it in our benchmarks: https://github.com/camunda-cloud/zeebe/pull/8847

Alexey Vinogradov: Thanks, @zell, but we are not using a K8S environment :disappointed:

zell: A good reason to migrate :smile:
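
For setups without Kubernetes, the idea behind such a retention policy can be approximated by hand. The following is only a minimal sketch, not what Curator or the Helm chart actually runs: it picks out the indices that are older than a given number of days. It assumes that the exporter's index names start with `zeebe-record` and end in a `_YYYY-MM-DD` date suffix; the function name `indices_older_than` and its parameters are made up for illustration.

```python
from datetime import date, timedelta


def indices_older_than(index_names, max_age_days, today, prefix="zeebe-record"):
    """Return the indices whose trailing YYYY-MM-DD date is before the cutoff.

    Assumption: index names look like <prefix>_<valueType>_<version>_<date>.
    """
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # never touch indices that do not belong to the exporter
        try:
            day = date.fromisoformat(name.rsplit("_", 1)[-1])
        except ValueError:
            continue  # no parseable date suffix, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

The returned names could then be removed with Elasticsearch's delete index API. Only delete indices whose data has already been imported by the tools that read from them.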

Note: This post was generated by Slack Archivist from a conversation in the Zeebe Slack, a source of valuable discussions on Zeebe (get an invite). Someone in the Slack thought this was worth sharing!

Thomas Heinrichs: This sounds like another topic which might be beneficial for the Troubleshooting guide :thinking_face:

zell: Yes, for the sake of completeness:

If the Zeebe disk is full:

• Check whether you have exporters.
◦ Yes? Are they exporting?
◦ No? Why not? -> Maybe the Elastic disk is full as well (Zeebe cannot delete log data that has not been exported yet), so make sure Elastic has enough disk space. You can clean up indices via Curator: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/index.html
◦ Be aware that once Elastic has reached certain disk watermarks, it no longer accepts new data. This may need to be reset after the disk is cleaned up: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#disk-based-shard-allocation
• If you have no exporters, or the exporters are still exporting, verify how many running instances you have.
◦ A big state can also lead to running out of disk space. Either increase the disk or complete some instances. You can check the snapshot sizes to find out how much data you have in the internal state. If you want to inspect the data, check out https://github.com/Zelldon/zdb
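
Two of the checks above can be sketched in Python. This is a rough illustration under stated assumptions, not an official tool: `directory_size_bytes` sums the file sizes below a directory (for example the broker's snapshot directory, whose location depends on your installation), and `build_unblock_request` prepares the Elasticsearch settings call that clears the `index.blocks.read_only_allow_delete` block on all indices. Both function names and the base URL used below are made up for illustration.

```python
import json
import os
import urllib.request


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def build_unblock_request(base_url):
    """Build (but do not send) the request that clears the read-only block."""
    # Setting the value to null resets it to the default (no block).
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_all/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To actually send the request, pass it to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_unblock_request("http://localhost:9200"))`, where the address is an assumption about your cluster. Do this only after disk space has been freed, otherwise Elastic will hit the watermark again.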

Alexey Vinogradov: Sorry, but migrating to K8S is off the agenda for us right now :disappointed:

BTW Thanks @zell @Thomas Heinrichs for your help :slightly_smiling_face: