We have a project where we encountered a problem with Elasticsearch shards in Camunda 8. Camunda is deployed using the Helm charts for a full-scale deployment in a Kubernetes environment:
We get the error message shown below:
io.camunda.zeebe.exporter.ElasticsearchExporterException: Failed to flush bulk request: [Failed to flush item(s) of bulk request [type: validation_exception, reason: Validation Failed: 1: this action would add  shards, but this cluster currently has / maximum normal shards open;]]
The problem was solved temporarily by increasing the maximum number of shards through the Elasticsearch API.
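For context, the temporary fix mentioned above can be done with a single call to the Elasticsearch cluster-settings API. This is a sketch that assumes Elasticsearch is reachable on localhost:9200 and that the new limit of 2000 is just an example value:

```shell
# Raise the cluster-wide shard limit (default: 1000 per data node).
# 2000 is an example value; pick one that fits your cluster size.
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{"persistent": {"cluster.max_shards_per_node": 2000}}'
```

Note that this only buys time: the shard count keeps growing daily unless old indices are cleaned up.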
The question is:
How should we think going forward so that we don’t run into the same problem again, for both Helm and other deployment types? Are there any other settings we should check? How should we set the shard limit before deployment with Helm?
Apendo Dev Team
Hi Apendo Team and welcome to the forum,
My understanding is the following: The different components of Camunda create new shards daily, which can result in many shards and may even exceed the limit of 1000.
There are different parameters that you may want to consider:
- Zeebe has a parameter numberOfShards, which you can configure via an environment variable. Its default value is 3 (thus, Zeebe creates 3 new shards daily).
- Camunda 8 has a retention policy that configures when shards should be deleted. In the self-managed setup, this retention policy is not activated by default. Within the values file, you can set the parameter retentionPolicy.enabled to true. Without any additional configuration, Zeebe’s shards will be deleted daily; the shards for Operate and Tasklist will be deleted after 30 days.
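The two settings above could be combined in a values.yaml fragment roughly like the following. This is a sketch: the retentionPolicy key and the environment-variable name are assumptions based on the camunda-platform Helm chart and Zeebe’s exporter configuration, so verify them against your chart version:

```yaml
# Sketch of a values.yaml fragment for the camunda-platform Helm chart.
# Key names are assumptions; check your chart version's documentation.
retentionPolicy:
  enabled: true        # activate cleanup of dated indices

zeebe:
  env:
    # Maps to the exporter setting index.numberOfShards (default: 3).
    - name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_INDEX_NUMBEROFSHARDS
      value: "1"
```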
I don’t consider myself an expert on this topic and invite others to comment as well.
I have the same problem with a dev machine deployed with docker-compose.
How can the retentionPolicy be configured with docker-compose environment variables? I don’t find any hints in the docs.
It’s true this is currently not part of the documentation.
You should be able to activate the retention policy via the environment variable RETENTIONPOLICY_ENABLED. Set the value to true.
Thanks for your quick answer!
But I’m not sure on which containers I need to configure the retention policy. I put it on Zeebe, Operate, and Tasklist. Is this correct? I also configured RETENTIONPOLICY_SCHEDULE, but I do not see any log message and there are still 1000 shards …
Do I need another docker container?
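To see whether anything is actually being cleaned up, the current shard count and the configured limit can be checked directly against Elasticsearch. A sketch, assuming it listens on localhost:9200:

```shell
# Show cluster status and the number of currently active shards.
curl -s "http://localhost:9200/_cluster/health?filter_path=status,active_shards"

# Show the effective shard limit (including the default value).
curl -s "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node"
```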
I just double-checked, and my assumption was wrong.
This is not a property of the components. Instead, Elasticsearch’s Curator will be configured to clean up. I think there is no equivalent in the docker-compose files.
Ok, too bad, but I’ll see if I can use the Curator container somehow via docker-compose.
@nathanael - any update on this? We too are on the docker based setup and are hitting this error.
Also would adding this configuration and restarting the docker components help out? Are there any other workarounds?
@StephanHaarmann / @nathan.loding Any updates on this issue?
Recent versions of Camunda 8 use Elasticsearch’s index lifecycle management (ILM); the Curator is no longer needed (e.g., in ES 8.x). You can also drop dated indices manually without affecting running instances.
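Dropping dated indices by hand can be sketched as follows. The zeebe-record-* prefix matches the exporter’s default index prefix, and the date is purely hypothetical; adjust both to your setup:

```shell
# List dated Zeebe record indices, sorted by name, with size information.
curl -s "http://localhost:9200/_cat/indices/zeebe-record-*?h=index,docs.count,store.size&s=index"

# Delete all record indices of one (hypothetical) day.
# Note: ES 8 rejects wildcard deletes by default unless
# action.destructive_requires_name is set to false, so you may
# have to name the indices explicitly instead.
curl -X DELETE "http://localhost:9200/zeebe-record-*-2023-01-15"
```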
Depending on your setup and volume, you may still need a higher shard limit than the default configuration (1000).
@StephanHaarmann Thanks for your reply.
- How can we retrieve the list of indices in ES, in a self-managed cluster, that were created by the different Camunda components?
- Is it documented which index is created by which component (like Operate, Optimize, Tasklist, broker, gateway, etc.)?
You can use Kibana to look into your Elasticsearch database.
The docker-compose installation includes a profile to enable the Kibana container: GitHub - camunda/camunda-platform: Links to Camunda Platform 8 resources, releases, and local development config
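If you prefer the command line over Kibana, the indices can also be grouped by component prefix directly via the _cat API. A sketch, assuming the default prefixes (zeebe-record-, operate-, tasklist-, optimize-) and Elasticsearch on localhost:9200:

```shell
# List indices per Camunda component, assuming the default index prefixes.
for prefix in zeebe-record operate tasklist optimize; do
  echo "== ${prefix} =="
  curl -s "http://localhost:9200/_cat/indices/${prefix}-*?h=index&s=index"
done
```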
Hope this helps, Ingo