We are using Zeebe Operate version 1.1.2 and are seeing severe performance issues. We have a single-node Elasticsearch cluster dedicated to Zeebe (version 1.0.0) and Operate.
More than 120,000 (1.2 lakh) workflow instances are displayed in Operate.
Sometimes it takes a long time to:
- show all the deployed workflows
- open and load a workflow instance
- log in to Zeebe Operate.
The following are some of the exceptions we have seen in the Operate logs:
search_context_missing_exception in Operate Logs
2023-06-22 15:03:03.659 ERROR 7 --- [-8080-exec-1905] o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]; nested: ElasticsearchException[Elasticsearch exception [type=search_context_missing_exception, reason=No search context found for id [574710291]]];]
Operate Import Exception in Operate Logs
2023-06-23 00:23:35.013 ERROR 7 --- [ import_3] i.c.o.z.ImportJob : io.camunda.operate.exceptions.PersistenceException: Error when processing bulk request against Elasticsearch: 30,000 milliseconds timeout on connection http-outgoing-29 [ACTIVE]
io.camunda.operate.exceptions.PersistenceException: io.camunda.operate.exceptions.PersistenceException: Error when processing bulk request against Elasticsearch: 30,000 milliseconds timeout on connection http-outgoing-29 [ACTIVE]
at io.camunda.operate.zeebeimport.AbstractImportBatchProcessor.performImport(AbstractImportBatchProcessor.java:34) ~[operate-importer-common-1.1.2.jar!/:?]
at io.camunda.operate.zeebeimport.ImportJob.processOneIndexBatch(ImportJob.java:116) ~[operate-importer-1.1.2.jar!/:?]
at io.camunda.operate.zeebeimport.ImportJob.call(ImportJob.java:80) ~[operate-importer-1.1.2.jar!/:?]
at io.camunda.operate.zeebeimport.RecordsReader.lambda$scheduleImport$1(RecordsReader.java:217) ~[operate-importer-1.1.2.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.camunda.operate.exceptions.PersistenceException: Error when processing bulk request against Elasticsearch: 30,000 milliseconds timeout on connection http-outgoing-29 [ACTIVE]
at io.camunda.operate.util.ElasticsearchUtil.processBulkRequest(ElasticsearchUtil.java:254) ~[operate-els-schema-1.1.2.jar!/:?]
at io.camunda.operate.util.ElasticsearchUtil.processBulkRequest(ElasticsearchUtil.java:233) ~[operate-els-schema-1.1.2.jar!/:?]
Delete Scroll Exception in Operate logs
2023-06-23 08:12:21.089 ERROR 7 --- [-8080-exec-2742] o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is ElasticsearchStatusException[Unable to parse response body]; nested: ResponseException[method [DELETE], host [http://elasticsearch:9200], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
{"succeeded":true,"num_freed":0}];] with root cause
org.elasticsearch.client.ResponseException: method [DELETE], host [http://elasticsearch:9200], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
{"succeeded":true,"num_freed":0}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:326) ~[elasticsearch-rest-client-7.13.2.jar!/:7.13.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:296) ~[elasticsearch-rest-client-7.13.2.jar!/:7.13.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270) ~[elasticsearch-rest-client-7.13.2.jar!/:7.13.2]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1654) ~[elasticsearch-rest-high-level-client-7.13.2.jar!/:7.13.2]
es_rejected_execution_exception in Operate logs
org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@4cb9ae40 on QueueResizingEsThreadPoolExecutor[name = elasticsearch-0/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 7.8ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@57f4d46b[Running, pool size = 10, active threads = 10, queued tasks = 1000, completed tasks = 653786320]]]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:485) ~[elasticsearch-7.13.2.jar!/:7.13.2]
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:396) ~[elasticsearch-7.13.2.jar!/:7.13.2]
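As far as we can tell, the rejection message above says the Elasticsearch search thread pool had all 10 threads active and its queue of 1000 tasks full, which is the point at which new search requests get rejected. The following is a minimal sketch in Python (not part of Operate or Elasticsearch) for pulling those counters out of such a message when scanning logs. The helper names `parse_rejection` and `is_saturated` are our own illustrative choices, and `SAMPLE` is a shortened copy of the message above.

```python
import re

# Shortened copy of the rejection message from the Operate log above.
SAMPLE = (
    "Elasticsearch exception [type=es_rejected_execution_exception, "
    "reason=rejected execution of TimedRunnable on "
    "QueueResizingEsThreadPoolExecutor[name = elasticsearch-0/search, "
    "queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, "
    "[Running, pool size = 10, active threads = 10, queued tasks = 1000, "
    "completed tasks = 653786320]]]"
)

# Hypothetical field names mapped to the patterns seen in the message text.
PATTERNS = {
    "pool": r"name = ([^,\]]+)",
    "queue_capacity": r"queue capacity = (\d+)",
    "pool_size": r"pool size = (\d+)",
    "active_threads": r"active threads = (\d+)",
    "queued_tasks": r"queued tasks = (\d+)",
}


def parse_rejection(message):
    """Return the thread pool counters found in a rejection message."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            continue  # field not present in this message
        value = match.group(1)
        result[key] = int(value) if value.isdigit() else value
    return result


def is_saturated(stats):
    """True when every thread is busy and the queue is at capacity."""
    required = ("pool_size", "active_threads", "queued_tasks", "queue_capacity")
    if any(key not in stats for key in required):
        return False
    return (
        stats["active_threads"] >= stats["pool_size"]
        and stats["queued_tasks"] >= stats["queue_capacity"]
    )


if __name__ == "__main__":
    stats = parse_rejection(SAMPLE)
    print(stats, "saturated:", is_saturated(stats))
```

This only interprets the log text; it does not query the cluster or change any settings.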
Please let us know how to fix these issues and improve the performance of Zeebe Operate 1.1.2.
Would upgrading to the latest version of Zeebe Operate fix these performance issues?
Any suggestions along these lines would be really helpful.