Elasticsearch shard count reaches the default maximum with Helm deployment

AndreasNehl · January 4, 2023, 3:37pm

We have a project where we ran into a problem with Elasticsearch shards in Camunda 8. Camunda is deployed using the Helm charts for a full-scale deployment in a Kubernetes environment:
https://docs.camunda.io/docs/self-managed/platform-deployment/helm-kubernetes/deploy/

We get the error message shown below:
io.camunda.zeebe.exporter.ElasticsearchExporterException: Failed to flush bulk request: [Failed to flush item(s) of bulk request [type: validation_exception, reason: Validation Failed: 1: this action would add [1] shards, but this cluster currently has [1000]/[1000] maximum normal shards open;]]

The problem was solved temporarily by raising the maximum number of shards (the shard limit) through the Elasticsearch API.
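The post does not say which API call was used. One common way is to raise `cluster.max_shards_per_node` through the cluster settings API. Elasticsearch applies that value per non-frozen data node, so a single-node cluster gets the 1000 limit from the error message by default. A minimal Python sketch, assuming an unsecured cluster at `http://localhost:9200` (hypothetical URL) and a new per-node limit of 2000 (arbitrary example value):

```python
import json
import urllib.request


def build_shard_limit_request(base_url: str, max_shards_per_node: int) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request that raises the shard limit."""
    body = {
        # "persistent" settings survive a full cluster restart.
        "persistent": {"cluster.max_shards_per_node": max_shards_per_node}
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical cluster URL and limit; adjust to your environment.
    req = build_shard_limit_request("http://localhost:9200", 2000)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply it against a reachable cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

This only buys headroom. If old indices are never deleted, the new limit will eventually be reached as well.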
The question is:
How should we approach this going forward, so that we don't run into the same problem again, both with Helm and with other deployment types? Are there any other settings we should check? How should we set the shard value before deploying with Helm?

Kind regards,
Apendo Dev Team

StephanHaarmann · January 4, 2023, 3:53pm

Hi Apendo Team and welcome to the forum,

My understanding is the following: the different components of Camunda create new indices (and therefore new shards) daily, which can add up to many shards and may even exceed the limit of 1000.

There are different parameters that you may want to consider:

Zeebe has a parameter numberOfShards which you can configure via an environment variable. Its default value is 3 (thus, 3 new shards are created daily by Zeebe).
Camunda 8 has a retention policy that configures when shards should be deleted. In the self-managed setup, this retention policy is not activated by default. In the values file, you can set the parameter retentionPolicy.enabled to true. Without any additional configuration, Zeebe's shards will be deleted daily. The shards for Operate and Tasklist will be deleted after 30 days.
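To see how these two settings interact with the limit, a rough back-of-the-envelope estimate helps. The sketch below is plain arithmetic under stated assumptions: the number of dated indices created per day and the replica count are hypothetical inputs you would read off your own cluster, and every replica copy is counted toward the limit along with the primaries.

```python
def estimate_open_shards(indices_per_day: int, primaries_per_index: int,
                         replicas: int, retention_days: int) -> int:
    """Rough steady-state count of open shards when indices are kept for retention_days."""
    # Every replica copy counts toward the limit, not just the primaries.
    shards_per_index = primaries_per_index * (1 + replicas)
    return indices_per_day * shards_per_index * retention_days


def shard_limit(max_shards_per_node: int, data_nodes: int) -> int:
    """Cluster-wide cap on open (non-frozen) shards: per-node setting times data nodes."""
    return max_shards_per_node * data_nodes


def days_until_limit(current_shards: int, new_shards_per_day: int, limit: int) -> int:
    """Full days of growth left before index creation fails, assuming nothing is deleted."""
    if new_shards_per_day <= 0:
        raise ValueError("new_shards_per_day must be positive")
    return max(0, (limit - current_shards) // new_shards_per_day)


if __name__ == "__main__":
    # Hypothetical figures: 10 dated indices per day, 3 primaries each, no replicas.
    daily = 10 * 3 * (1 + 0)
    limit = shard_limit(max_shards_per_node=1000, data_nodes=1)
    print("steady state with 30-day retention:", estimate_open_shards(10, 3, 0, 30))
    print("days left from 400 shards without retention:", days_until_limit(400, daily, limit))
```

Lowering numberOfShards shrinks the daily growth, while a retention policy turns unbounded growth into a fixed steady state. Together they decide whether the default limit is enough.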

I don’t consider myself an expert on this topic and invite others to comment as well.

Regards,
Stephan

nathanael · February 7, 2023, 2:47pm

Hi,

I have the same problem with a dev machine deployed with docker-compose.
How can the retentionPolicy be configured with docker-compose environment variables? I can't find any hints in the docs.

Regards,
Nathanael

StephanHaarmann · February 7, 2023, 3:09pm

Hi @nathanael,

It's true, this is currently not part of the documentation.
You should be able to activate the retention policy via the environment variable RETENTIONPOLICY_ENABLED. Set the value to true.

nathanael · February 8, 2023, 9:02am

Hi @StephanHaarmann,

thanks for your quick answer!
But I'm not sure which containers I need to configure the retention policy on. I put it on Zeebe, Operate, and Tasklist. Is this correct? I also configured RETENTIONPOLICY_SCHEDULE, but I do not see any log message, and there are still 1000 shards…
Do I need another Docker container, such as bitnami/elasticsearch-curator?

Best regards,
Nathanael

StephanHaarmann · February 8, 2023, 9:48am

Hi,
I just double-checked, and my assumption was wrong.
This is not a property of the components. Instead, Elasticsearch Curator is configured to do the cleanup. I think there is no equivalent in the docker-compose files.

nathanael · February 8, 2023, 12:49pm

OK, too bad, but I'll see if I can use the Curator container somehow via docker-compose.

jgeek1 · April 26, 2023, 11:08am

@nathanael - any update on this? We too are on the docker based setup and are hitting this error.

Also, would adding this configuration and restarting the Docker components help? Are there any other workarounds?

aravindhrs · November 14, 2023, 6:22am

@StephanHaarmann / @nathan.loding Any updates on this issue?

+1

StephanHaarmann · November 16, 2023, 6:28am

Recent versions of Camunda 8 use Elasticsearch's index lifecycle management (ILM), so Curator is no longer needed (e.g., with ES 8.x). You can also drop dated indices manually without affecting running instances.
Depending on your setup and volume, you may still need a higher shard limit than the default (1000).
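For the manual route, the main step is picking out which dated indices are past a chosen retention window. A minimal Python sketch, assuming dated indices end in `_YYYY-MM-DD` (an assumed naming convention; check the actual names in your cluster first). Names without such a suffix are never selected. Each returned name could then be removed with a `DELETE /<index-name>` request.

```python
import re
from datetime import date, timedelta

# Assumed naming convention: dated indices end in "_YYYY-MM-DD".
DATE_SUFFIX = re.compile(r"_(\d{4}-\d{2}-\d{2})$")


def indices_older_than(index_names, today: date, retention_days: int) -> list:
    """Return the dated indices whose date is strictly before today - retention_days."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # not a dated index: never select it
        try:
            index_date = date.fromisoformat(match.group(1))
        except ValueError:
            continue  # suffix looks like a date but is not a valid one
        if index_date < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    names = [
        "zeebe-record_job_8.3.0_2023-11-01",
        "zeebe-record_job_8.3.0_2023-11-15",
        "operate-import-position-8.3.0_",
    ]
    print(indices_older_than(names, today=date(2023, 11, 16), retention_days=7))
```

Working from an explicit list rather than a wildcard keeps the deletion reviewable before anything is removed.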

aravindhrs · November 17, 2023, 5:23am

@StephanHaarmann Thanks for your reply.

How can we retrieve the list of indices that the different Camunda components created in Elasticsearch in a self-managed cluster?
Is it documented which indices are created by which components (like Operate, Optimize, Tasklist, broker, gateway, etc.)?
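One way to get the raw list without extra tooling is Elasticsearch's `_cat/indices` API with `format=json&h=index`, then grouping the names by prefix. A minimal Python sketch; the prefix-to-component mapping below is an assumption based on default index prefixes, which are configurable, so verify it against your own configuration:

```python
import json
import urllib.request
from collections import defaultdict

# Assumed default index prefixes; they are configurable, so check your own settings.
COMPONENT_PREFIXES = {
    "zeebe-record": "Zeebe (Elasticsearch exporter)",
    "operate": "Operate",
    "tasklist": "Tasklist",
    "optimize": "Optimize",
}


def fetch_index_names(base_url: str) -> list:
    """List index names via the _cat/indices API (JSON output, 'index' column only)."""
    url = f"{base_url.rstrip('/')}/_cat/indices?format=json&h=index"
    with urllib.request.urlopen(url) as resp:
        return [row["index"] for row in json.load(resp)]


def group_by_component(index_names) -> dict:
    """Bucket index names by the first matching prefix; unmatched names go to 'other'."""
    groups = defaultdict(list)
    for name in sorted(index_names):
        component = next(
            (label for prefix, label in COMPONENT_PREFIXES.items() if name.startswith(prefix)),
            "other",
        )
        groups[component].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical, unsecured cluster URL:
    # print(group_by_component(fetch_index_names("http://localhost:9200")))
    print(group_by_component(["operate-list-view-8.3.0_", ".kibana_1"]))
```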

Ingo_Richtsmeier · November 17, 2023, 12:27pm

Hi @aravindhrs,

you can use Kibana to look into your Elasticsearch database.

The docker-compose installation includes a profile to enable the Kibana container: camunda/camunda-platform on GitHub (links to Camunda Platform 8 resources, releases, and local development config).

Hope this helps, Ingo