Obvious performance bottleneck in the Operate component of Camunda 8 under a 10-minute, million-level concurrent load

The exporter is only responsible for writing the indices prefixed with "zeebe". Operate then reads that Zeebe data and writes it to the indices prefixed with "operate", and this part is relatively slow. At present, even after changing the number of threads on a single node, it takes about one hour to process 600,000 records, yet CPU utilization is not high: I only reach about 50% with two cores, and utilization stays low when I add more cores. I found that Operate can also run in cluster mode, but since single-node utilization does not increase, performance does not improve in cluster mode either. In addition, the cluster setup also produces several errors.
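
To put the numbers above in perspective, here is a rough back-of-the-envelope calculation of the import rate they imply. It is only a sketch: it assumes the "one hour" and "600,000" figures are exact, that "million-level" means 1,000,000 records, and that the rate stays constant. The function names are made up for illustration.

```python
# Rough throughput estimate from the figures in this post.
# Assumptions: exactly 600,000 records in exactly one hour, constant rate.

def throughput_per_second(records: int, seconds: float) -> float:
    """Average number of records processed per second."""
    return records / seconds


def seconds_to_process(records: int, rate: float) -> float:
    """Time needed to process `records` at a constant `rate` (records/s)."""
    return records / rate


rate = throughput_per_second(600_000, 3600)
eta = seconds_to_process(1_000_000, rate)

print(f"{rate:.1f} records/s")
print(f"{eta / 60:.0f} minutes for 1,000,000 records")
```

At roughly 167 records per second, a load generated in 10 minutes would take about 100 minutes to show up in Operate, which is the gap I am trying to close.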

Is there a question in there?