Hi, I am having trouble with Camunda Operate in my K3s cluster. Any help would be appreciated, as I am getting quite desperate at the moment.
Here is some information:
I have a K3s cluster on a server with Helm installed. I am installing the Camunda Self-Managed platform using the official Helm charts, with the default values and only small changes to resources and replica counts. Elasticsearch is up and running, and I can access and use it as needed. The Zeebe, Zeebe Gateway, and Connectors pods are also up and running, but the Operate pod keeps failing because of failed shards in Elasticsearch:
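For reference, my install is roughly the following (release and namespace names are mine; the values file only tweaks resources and replica counts relative to the chart defaults):

```shell
# Add the official Camunda Helm repository and install the platform chart
helm repo add camunda https://helm.camunda.io
helm repo update

# values.yaml: chart defaults with small changes to resources and replica counts
helm install camunda camunda/camunda-platform -n camunda -f values.yaml
```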
2024-05-20 20:36:36.028 WARN 7 --- [ main] o.e.c.RestClient : request [POST http://camunda-elasticsearch:9200/operate-migration-steps-repository-1.1.0_/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-8.12.2-48a287ab9497e852de30327444b0809e55d46466 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
2024-05-20 20:36:36.029 WARN 7 --- [ main] i.c.o.u.RetryOperation : Retry Operation Count search results failed: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
When I look into the Elasticsearch pod's logs, I see this:
Caused by: org.elasticsearch.action.NoShardAvailableActionException
at org.elasticsearch.server@8.12.2/org.elasticsearch.action.NoShardAvailableActionException.forOnShardFailureWrapper(NoShardAvailableActionException.java:28)
at org.elasticsearch.server@8.12.2/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:529)
at org.elasticsearch.server@8.12.2/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:476)
_cluster/health:
{
"cluster_name": "elastic",
"status": "red",
"timed_out": false,
"number_of_nodes": 2,
"number_of_data_nodes": 0,
"active_primary_shards": 0,
"active_shards": 0,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 22,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 0.0
}
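One thing that stands out to me in the output above is `"number_of_data_nodes": 0` even though there are 2 nodes, so I also checked the node roles (the `camunda-elasticsearch` hostname comes from my Helm release; adjust if yours differs):

```shell
# Query cluster health (this is how I got the JSON above)
curl -s "http://camunda-elasticsearch:9200/_cluster/health?pretty"

# List each node's roles -- with number_of_data_nodes at 0,
# I suspect no node is currently acting as a data node
curl -s "http://camunda-elasticsearch:9200/_cat/nodes?v&h=name,node.role,master"
```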
_cat/shards?v:
index shard prirep state docs store dataset ip node
operate-decision-requirements-8.3.0_ 0 p UNASSIGNED
operate-metric-8.3.0_ 0 p UNASSIGNED
operate-operation-8.4.0_ 0 p UNASSIGNED
operate-list-view-8.3.0_ 0 p UNASSIGNED
operate-user-1.2.0_ 0 p UNASSIGNED
operate-web-session-1.1.0_ 0 p UNASSIGNED
operate-decision-instance-8.3.0_ 0 p UNASSIGNED
operate-process-8.3.0_ 0 p UNASSIGNED
operate-decision-8.3.0_ 0 p UNASSIGNED
.ds-.logs-deprecation.elasticsearch-default-2024.05.20-000001 0 p UNASSIGNED
operate-flownode-instance-8.3.1_ 0 p UNASSIGNED
operate-post-importer-queue-8.3.0_ 0 p UNASSIGNED
operate-import-position-8.3.0_ 0 p UNASSIGNED
operate-event-8.3.0_ 0 p UNASSIGNED
operate-incident-8.3.1_ 0 p UNASSIGNED
operate-batch-operation-1.0.0_ 0 p UNASSIGNED
operate-sequence-flow-8.3.0_ 0 p UNASSIGNED
.ds-ilm-history-5-2024.05.20-000001 0 p UNASSIGNED
operate-migration-steps-repository-1.1.0_ 0 p UNASSIGNED
operate-user-task-8.5.0_ 0 p UNASSIGNED
operate-variable-8.3.0_ 0 p UNASSIGNED
operate-message-8.5.0_ 0 p UNASSIGNED
_cluster/allocation/explain?pretty:
{
"note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index": "operate-decision-requirements-8.3.0_",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "CLUSTER_RECOVERED",
"at": "2024-05-20T19:49:30.119Z",
"last_allocation_status": "no"
},
"can_allocate": "no",
"allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there."
}
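Since the note in the explain output says it picked a random unassigned shard, I can also target a specific shard like this (index name taken from my `_cat/shards` output above):

```shell
# Explain allocation for one specific unassigned primary shard
curl -s -X POST "http://camunda-elasticsearch:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "operate-decision-requirements-8.3.0_",
    "shard": 0,
    "primary": true
  }'
```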
I have no more information, and it seems there are no existing answers on this topic here. Is there something I can do about it?
Thank you very much
Vojtech