Each entity and model is an API call with a response time between 25ms and 150ms. The executor is just a Java class that computes the responses (execution time between 1ms and 5ms).
The process will need to handle 60,000 instances per minute and essentially just orchestrates API calls.
I’ve tested many configurations, but the job executor first acquires the initial jobs (entities) of all requests and only moves on to the rest of the process (model and executor) after processing all of them, which causes a very high response time. We are aiming for a response time of up to 700ms per instance, but so far we are seeing more than 20 seconds in a load test of 16,000 processes/minute.
I have already tried configuring job prioritization by priority and due date, and I increased max-pool-size, queue-capacity, jobs-per-acquisition and core-pool-size, but nothing changes. I don’t know whether this is a job starvation issue; do you have any suggestions for tuning or configuration?
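For reference, this is roughly where those knobs live in our application.yaml (the values below are placeholders to show the shape, not our real numbers):

```yaml
camunda:
  bpm:
    job-execution:
      enabled: true
      core-pool-size: 10            # worker threads kept alive
      max-pool-size: 20             # upper bound on worker threads
      queue-capacity: 50            # jobs buffered locally before acquisition backs off
      max-jobs-per-acquisition: 10  # jobs fetched per acquisition cycle
    generic-properties:
      properties:
        # engine flag behind "prioritization by priority"; the due-date
        # counterpart is jobExecutorAcquireByDueDate
        jobExecutorAcquireByPriority: true
```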
Note: I’m running Camunda with Kotlin + Spring WebFlux and using coroutines and reactive features (controller and WebClient). Since we don’t need Camunda’s data, we use an in-memory H2 database and extend auto-deploy to fetch the processes we want to deploy from an external source; this way we can parameterize the processes and change deployments independently of Camunda’s DB. (We follow the 24 Hour Fitness case with some changes.)
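To illustrate what I mean by extending auto-deploy, here is a simplified sketch of the idea; `ProcessDefinitionStore` and the other names are placeholders, not our actual implementation:

```kotlin
import org.camunda.bpm.engine.RepositoryService
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.stereotype.Component

// Hypothetical abstraction over the external base that holds the BPMN XML.
interface ProcessDefinitionStore {
    fun fetchAll(): Map<String, String> // resource name (ending in .bpmn) -> BPMN 2.0 XML
}

@Component
class ExternalProcessDeployer(
    private val repositoryService: RepositoryService,
    private val store: ProcessDefinitionStore
) {
    // Deploy the externally managed processes once the application is up,
    // instead of relying on classpath auto-deployment.
    @EventListener(ApplicationReadyEvent::class)
    fun deployExternalProcesses() {
        val builder = repositoryService.createDeployment()
            .name("external-processes")
            .enableDuplicateFiltering(true) // only redeploy resources that changed
        store.fetchAll().forEach { (resourceName, bpmnXml) ->
            builder.addString(resourceName, bpmnXml)
        }
        builder.deploy()
    }
}
```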
You mentioned a lot of steps you took to tune performance, but nothing about history: which history level are you using, and do you need history at all?
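If you don’t need the data at all, switching history off removes a large share of the inserts the engine performs per job; with the Spring Boot starter that would be something like:

```yaml
camunda:
  bpm:
    history-level: none   # or "activity"/"audit" if you need some history data
```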
I’m not sure whether you have a breakdown of each service task’s execution time. We faced a similar issue; the root cause was one API that took more than 30 seconds under a certain data condition. Every time that condition was met, the job executor threads were busy executing that job, which slowed down acquisition and caused jobs to pile up in the queue.
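Even a coarse measurement inside the delegates can surface that. Below is a minimal sketch (delegate name, endpoint and timeout are made-up examples, not your code) that logs the per-activity duration and caps the call so a single slow API cannot pin an executor thread for long:

```kotlin
import org.camunda.bpm.engine.delegate.DelegateExecution
import org.camunda.bpm.engine.delegate.JavaDelegate
import org.slf4j.LoggerFactory
import org.springframework.stereotype.Component
import org.springframework.web.reactive.function.client.WebClient
import java.time.Duration

// Hypothetical delegate for one of the entity/model API calls; assumes a
// WebClient bean configured with the service's base URL.
@Component("entityApiDelegate")
class EntityApiDelegate(private val webClient: WebClient) : JavaDelegate {

    private val log = LoggerFactory.getLogger(javaClass)

    override fun execute(execution: DelegateExecution) {
        val start = System.nanoTime()
        try {
            val response = webClient.get()
                .uri("/entities/{id}", execution.getVariable("entityId"))
                .retrieve()
                .bodyToMono(String::class.java)
                .timeout(Duration.ofSeconds(2)) // fail fast instead of blocking the thread indefinitely
                .block()                        // JavaDelegate is synchronous, so the call must block here
            execution.setVariable("entityResponse", response)
        } finally {
            val elapsedMs = (System.nanoTime() - start) / 1_000_000
            log.info("activity={} took {} ms", execution.currentActivityId, elapsedMs)
        }
    }
}
```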
Those are the most impactful job executor settings: if you want high throughput on async jobs, increase the queue size and core-pool-size to higher values, depending on the CPU/memory of the hosting servers.
We currently have those set to 10 and 20, respectively.