Each entity and model is an API call with a response time between 25ms and 150ms. The executor is just a Java class that computes the responses (execution time between 1ms and 5ms).
The process will need to handle 60,000 instances per minute and essentially just orchestrates API calls.
I’ve tested many configurations, but the job executor first acquires the initial jobs (entities) of all requests and only moves on to the rest of the process (model and executor) after processing all of them, which causes a very high response time. We are aiming for a response time of up to 700ms per instance, but so far we are seeing more than 20 seconds in a load test of 16,000 processes/minute.
I have already tried configuring job prioritization by priority and due date, and I increased max-pool-size, queue-capacity, jobs-per-acquisition and core-pool-size, but nothing changes. I don’t know whether this is a job starvation issue; do you have any suggestions for tuning or configuration?
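For reference, this is roughly where those knobs live in our application.yaml (the values below are placeholders to show the shape, not our real numbers):

```yaml
camunda:
  bpm:
    job-execution:
      enabled: true
      core-pool-size: 10            # worker threads kept alive
      max-pool-size: 20             # upper bound on worker threads
      queue-capacity: 50            # jobs buffered locally before acquisition backs off
      max-jobs-per-acquisition: 10  # jobs fetched per acquisition cycle
    generic-properties:
      properties:
        # engine flag behind "prioritization by priority"; the due-date
        # counterpart is jobExecutorAcquireByDueDate
        jobExecutorAcquireByPriority: true
```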
Note: I’m running Camunda with Kotlin + Spring WebFlux and using coroutines and reactive features (controller and WebClient). Since we don’t need Camunda’s data, we use an in-memory H2 database and extend auto-deploy to fetch the processes we want to deploy from an external source; this way we can parameterize the processes and change deployments independently of Camunda’s DB. (We follow the 24 Hour Fitness case with some changes.)
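To illustrate what I mean by extending auto-deploy, here is a simplified sketch of the idea; `ProcessDefinitionStore` and the other names are placeholders, not our actual implementation:

```kotlin
import org.camunda.bpm.engine.RepositoryService
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.stereotype.Component

// Hypothetical abstraction over the external base that holds the BPMN XML.
interface ProcessDefinitionStore {
    fun fetchAll(): Map<String, String> // resource name (ending in .bpmn) -> BPMN 2.0 XML
}

@Component
class ExternalProcessDeployer(
    private val repositoryService: RepositoryService,
    private val store: ProcessDefinitionStore
) {
    // Deploy the externally managed processes once the application is up,
    // instead of relying on classpath auto-deployment.
    @EventListener(ApplicationReadyEvent::class)
    fun deployExternalProcesses() {
        val builder = repositoryService.createDeployment()
            .name("external-processes")
            .enableDuplicateFiltering(true) // only redeploy resources that changed
        store.fetchAll().forEach { (resourceName, bpmnXml) ->
            builder.addString(resourceName, bpmnXml)
        }
        builder.deploy()
    }
}
```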
You mentioned a lot of steps you took to tune performance, but nothing about history: which history level are you using, and do you need history at all?
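If you don’t need the data at all, switching history off removes a large share of the inserts the engine performs per job; with the Spring Boot starter that would be something like:

```yaml
camunda:
  bpm:
    history-level: none   # or "activity"/"audit" if you need some history data
```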
I’m not sure whether you have a breakdown of each service task’s execution time. We faced a similar issue; the root cause was one API that took more than 30 seconds under a certain data condition. Every time that condition was met, the job executor threads were busy executing that job, which slowed down acquisition and caused jobs to pile up in the queue.
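Even a coarse measurement inside the delegates can surface that. Below is a minimal sketch (delegate name, endpoint and timeout are made-up examples, not your code) that logs the per-activity duration and caps the call so a single slow API cannot pin an executor thread for long:

```kotlin
import org.camunda.bpm.engine.delegate.DelegateExecution
import org.camunda.bpm.engine.delegate.JavaDelegate
import org.slf4j.LoggerFactory
import org.springframework.stereotype.Component
import org.springframework.web.reactive.function.client.WebClient
import java.time.Duration

// Hypothetical delegate for one of the entity/model API calls; assumes a
// WebClient bean configured with the service's base URL.
@Component("entityApiDelegate")
class EntityApiDelegate(private val webClient: WebClient) : JavaDelegate {

    private val log = LoggerFactory.getLogger(javaClass)

    override fun execute(execution: DelegateExecution) {
        val start = System.nanoTime()
        try {
            val response = webClient.get()
                .uri("/entities/{id}", execution.getVariable("entityId"))
                .retrieve()
                .bodyToMono(String::class.java)
                .timeout(Duration.ofSeconds(2)) // fail fast instead of blocking the thread indefinitely
                .block()                        // JavaDelegate is synchronous, so the call must block here
            execution.setVariable("entityResponse", response)
        } finally {
            val elapsedMs = (System.nanoTime() - start) / 1_000_000
            log.info("activity={} took {} ms", execution.currentActivityId, elapsedMs)
        }
    }
}
```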
Those are the most impactful job executor settings: if you want high throughput on async jobs, increase the queue size and core-pool-size to higher values, depending on the CPU/memory of the hosting servers.
We currently have those set to 10 and 20, respectively.