Unexpected behaviour in multi-tenant mode

Hi there,
I’ve encountered some unexpected issues while using Camunda 8.6 in multi-tenant mode (Self-Managed).

I’ve followed the docs: authentication via Identity is enabled, and the global multi-tenancy flag is set in the Helm chart. I have 3 tenants: default, tenant-1 and tenant-2 (not real names).

Issues:

  1. When I start a process instance from the Tasklist UI for a given tenant, the instance is not visible in Operate (the user has permission to see instances of every tenant). Job workers run fine, but user tasks are not created (no tasks appear in Tasklist).
  2. The same happens when I start the process programmatically (Spring Boot). The instance is created, but when it reaches the user task, that task is not visible in the Tasklist component. I’ve also called the task search endpoint manually, but it returns an empty list.
  3. Resources (BPMNs & DMNs) are not deployed on application startup. I’m using the @Deployment annotation and have configured the tenant ID in the YML configuration.
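For anyone wanting to reproduce the empty-result check in point 2: this is roughly how I query the Tasklist REST API directly. The hostname and token are placeholders for my environment, and I’m assuming the search request accepts a tenantIds filter as described in the Tasklist API docs:

```shell
# Search for open tasks scoped to tenant-1 via the Tasklist v1 REST API.
# $TOKEN and the hostname are placeholders for your environment.
curl -s -X POST "http://camunda-tasklist.local/tasklist/v1/tasks/search" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"state": "CREATED", "tenantIds": ["tenant-1"]}'
```

This returns `[]` in my setup even while the process instance is waiting at the user task.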

YML Config:

camunda:
  client:
    mode: self-managed
    tenant-ids:
      - tenant-1
    auth:
      client-id: app
      client-secret: some-secret
      issuer: http://keycloak.local:8080/auth/realms/camunda-platform
    zeebe:
      enabled: true
      rest-address: http://camunda-zeebe-gateway.local:8080
      prefer-rest-over-grpc: false
      audience: zeebe-api
      grpc-address: http://camunda-zeebe-gateway.local:26500
    operate:
      enabled: true
      base-url: http://camunda-operate.local/operate
      audience: operate-api
    tasklist:
      enabled: true
      base-url: http://camunda-tasklist.local/tasklist
      audience: tasklist-api
    optimize:
      enabled: true
      base-url: http://camunda.svc.cluster.local:80/optimize
      audience: optimize-api
    identity:
      enabled: true
      base-url: http://camunda.svc.cluster.local:80/identity
      audience: identity-api
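As a sanity check that instances actually reach Operate under the expected tenant, Operate's v1 search API can be queried directly. This is a sketch; the base URL and token are placeholders from the config above, and I’m assuming the `tenantId` filter field that was added alongside multi-tenancy support:

```shell
# Search Operate for process instances belonging to tenant-1.
# $TOKEN and the hostname are placeholders for your environment.
curl -s -X POST "http://camunda-operate.local/operate/v1/process-instances/search" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"filter": {"tenantId": "tenant-1"}, "size": 10}'
```

If this also comes back empty while Zeebe shows the instance as running, the problem is likely on the export/import path rather than in tenant permissions.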

Does anyone have a clue what could be wrong here?

@heril.muratovic06, to disable Tasklist user restrictions in a Camunda 8 environment deployed with Helm, you can set the following value in your Helm chart:

camunda:
  tasklist:
    userRestrictions: false
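Assuming that value path is correct for your chart version, it can be applied to an existing release with a standard Helm upgrade (release name, chart, and namespace below are placeholders):

```shell
# Apply the value to an existing release without touching other settings.
# Release name, chart reference, and namespace are placeholders.
helm upgrade camunda camunda/camunda-platform \
  -n camunda \
  --reuse-values \
  --set camunda.tasklist.userRestrictions=false
```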

Here are the logs from Operate - this might be the issue:

2025-04-18 21:15:45 2025-04-18 19:15:45.173 [] [archiver_1] [] ERROR
2025-04-18 21:15:45       io.camunda.operate.archiver.AbstractArchiverJob - Error occurred while archiving data. Will be retried.
2025-04-18 21:15:45 java.util.concurrent.CompletionException: io.camunda.operate.exceptions.OperateRuntimeException: Failures occurred when performing operation reindex on source index operate-batch-operation-1.0.0_. Check Elasticsearch logs.
2025-04-18 21:15:45     at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194) ~[?:?]
2025-04-18 21:15:45     at io.camunda.operate.util.ElasticsearchUtil$1.run(ElasticsearchUtil.java:196) ~[operate-schema-8.6.8.jar:8.6.8]
2025-04-18 21:15:45     at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-6.1.14.jar:6.1.14]
2025-04-18 21:15:45     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
2025-04-18 21:15:45     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
2025-04-18 21:15:45     at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2025-04-18 21:15:45 Caused by: io.camunda.operate.exceptions.OperateRuntimeException: Failures occurred when performing operation reindex on source index operate-batch-operation-1.0.0_. Check Elasticsearch logs.
2025-04-18 21:15:45     at io.camunda.operate.util.ElasticsearchUtil.getTotalAffectedFromTask(ElasticsearchUtil.java:218) ~[operate-schema-8.6.8.jar:8.6.8]
2025-04-18 21:15:45     at io.camunda.operate.util.ElasticsearchUtil$1.run(ElasticsearchUtil.java:188) ~[operate-schema-8.6.8.jar:8.6.8]

In Zeebe I see the following logs:

io.camunda.zeebe.broker.exporter.elasticsearch - Unexpected exception occurred on periodically flushing bulk, will retry later.
2025-04-18 21:25:45 io.camunda.zeebe.exporter.ElasticsearchExporterException: Failed to flush bulk request: [Failed to flush 1 item(s) of bulk request [type: validation_exception, reason: Validation Failed: 1: this action would add [1] shards, but this cluster currently has [1000]/[1000] maximum normal shards open; for more information, see https://www.elastic.co/guide/en/elasticsearch/reference/8.17/size-your-shards.html#troubleshooting-max-shards-open;]]

Additionally, when I query http://localhost:9200/_cat/nodes?v&h=name,ip,node.role,shards the output is the following:

name         ip         node.role   shards
71b00020d7ec 172.18.0.3 cdfhilmrstw    1000
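The validation_exception in the Zeebe log means the cluster has hit cluster.max_shards_per_node (default 1000 per non-frozen data node), which matches the _cat/nodes output above. A short-term workaround is raising the limit via the cluster settings API; the long-term fix is reducing the shard count (fewer indices, index lifecycle management / archiving). A sketch, assuming Elasticsearch is reachable on localhost:9200 without auth:

```shell
# Inspect current shard usage and the configured per-node limit.
curl -s "http://localhost:9200/_cluster/health?filter_path=active_shards"
curl -s "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node"

# Temporary workaround: raise the per-node shard limit.
# This only buys time; it does not fix the underlying oversharding.
curl -s -X PUT "http://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{"persistent": {"cluster.max_shards_per_node": 2000}}'
```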

Any idea how to resolve this?

In general, this is related to reaching the per-node shard limit in Elasticsearch.

What is the general best practice for shard count per node?

This might be more of an Elasticsearch topic. Elastic’s sizing guidance is, roughly, to keep individual shards in the hundreds-of-MB-to-tens-of-GB range and to avoid large numbers of tiny shards, since each open shard carries fixed overhead on the node.
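To see where the shards are going (the Zeebe exporter and the web apps create dated indices per record type, which adds up quickly on a single-node cluster), listing the Camunda indices with their shard counts is a good start. Index name patterns below are the usual defaults and may differ in your setup:

```shell
# List Camunda-related indices with primary/replica shard counts and size.
# Index patterns (zeebe-record-*, operate-*, tasklist-*) are the defaults.
curl -s "http://localhost:9200/_cat/indices/zeebe-record-*,operate-*,tasklist-*?v&h=index,pri,rep,docs.count,store.size&s=index"
```

Old dated zeebe-record-* indices that Operate/Tasklist have already imported are usually the main candidates for deletion or lifecycle policies.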