I have some questions about job acquisition in Camunda 7, based on issues we are seeing in production.
We are running two nodes with Camunda 7.21 as a Spring Boot application.
This is the relevant part of my configuration:
```yaml
camunda:
  bpm:
    job-executor-acquire-by-priority: true
    job-execution:
      queue-capacity: 60
      max-jobs-per-acquisition: 50
      core-pool-size: 40
      max-pool-size: 40
      wait-time-in-millis: 3000
      wait-increase-factor: 1.5
      max-wait: 10000
      backoff-time-in-millis: 2000
      max-backoff: 10000
    generic-properties:
      properties:
        jobExecutorAcquireExclusiveOverProcessHierarchies: true
        hintJobExecutor: false
```
So the max-jobs-per-acquisition is set to 50.
However, I have been wondering why the Camunda log sometimes shows that over 300 jobs are acquired in “one go”.
The Job Executor Configuration page on docs.camunda.org describes the property as follows: "maxJobsPerAcquisition (Integer): Sets the maximal number of jobs to be acquired at once."
I’ve been looking at the code of BackoffJobAcquisitionStrategy, which computes the number of jobs to acquire like this:
```java
int numJobsToAcquire = (int) (baseNumJobsToAcquire * Math.pow(backoffIncreaseFactor, backoffLevel));
```
That code explains why more jobs than the configured “max” are actually acquired.
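To see how quickly this grows with our settings, here is a small Java sketch that evaluates the same expression for a few backoff levels. It assumes the backoff increase factor equals our wait-increase-factor of 1.5 (my reading is that the job executor uses its wait increase factor for backoff as well; substitute the real value if that is wrong), and levels 0 to 5 are only illustrative, since I have not verified the exact maximum backoff level the strategy allows.

```java
// Evaluates the job-count expression quoted from BackoffJobAcquisitionStrategy.
// Assumption: the backoff increase factor equals our wait-increase-factor (1.5);
// backoff levels 0..5 are illustrative, not the strategy's verified maximum.
class BackoffJobCount {

    // Same expression as in BackoffJobAcquisitionStrategy.
    static int numJobsToAcquire(int baseNumJobsToAcquire, float backoffIncreaseFactor, int backoffLevel) {
        return (int) (baseNumJobsToAcquire * Math.pow(backoffIncreaseFactor, backoffLevel));
    }

    public static void main(String[] args) {
        int maxJobsPerAcquisition = 50;      // max-jobs-per-acquisition from our config
        float backoffIncreaseFactor = 1.5f;  // assumed: wait-increase-factor from our config
        for (int level = 0; level <= 5; level++) {
            System.out.printf("backoff level %d -> %d jobs%n",
                level, numJobsToAcquire(maxJobsPerAcquisition, backoffIncreaseFactor, level));
        }
    }
}
```

Under these assumptions the batch size goes 50, 75, 112, 168, 253, 379, so a level-5 acquisition would be consistent with the 300+ jobs we see in the log.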
So I think the documentation should at least be enhanced to state that this number can grow, just as the idle wait time can.
However, I find it really strange that the maximum number of acquired jobs depends on the “backoff level” and the “backoff increase factor”.
It means that when the job executor is “finally” able to acquire jobs after a series of optimistic locking exceptions, it might acquire a lot of jobs at once.
So I am tempted to raise a bug so that the configured max-jobs-per-acquisition becomes the actual maximum number of jobs acquired in a single job acquisition.
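The proposal could look roughly like the hypothetical variant below: the existing expression is kept, but its result is clamped to the configured max-jobs-per-acquisition. The class and method names are made up for illustration; this is not Camunda code, and the factor of 1.5 is the same assumption as above.

```java
// Hypothetical illustration of the proposed behaviour (NOT Camunda code):
// keep the backoff-based formula, but clamp the result to max-jobs-per-acquisition.
class ProposedJobCount {

    // The expression as it exists today in BackoffJobAcquisitionStrategy.
    static int current(int baseNumJobsToAcquire, float backoffIncreaseFactor, int backoffLevel) {
        return (int) (baseNumJobsToAcquire * Math.pow(backoffIncreaseFactor, backoffLevel));
    }

    // Proposed: the configured maximum becomes a hard upper bound.
    static int proposed(int baseNumJobsToAcquire, float backoffIncreaseFactor, int backoffLevel,
                        int maxJobsPerAcquisition) {
        return Math.min(current(baseNumJobsToAcquire, backoffIncreaseFactor, backoffLevel),
                        maxJobsPerAcquisition);
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 5; level++) {
            // 1.5 is the assumed backoff increase factor; 50 is our max-jobs-per-acquisition
            System.out.printf("level %d: current=%d, proposed=%d%n",
                level, current(50, 1.5f, level), proposed(50, 1.5f, level, 50));
        }
    }
}
```

Since the formula appears to use max-jobs-per-acquisition as its base value, clamping to that same value effectively disables the growth; passing a separate, larger limit as the last argument would keep some growth while still bounding it.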
What do you think?
With the current behaviour, our thread pool seems to be too small for such large acquisitions, so many of the acquired jobs are rejected.
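To illustrate why such batches get rejected, the following plain-JDK sketch models our pool sizing (40 threads, queue capacity 60) with a bounded ThreadPoolExecutor and its default abort policy, then submits 209 long-running tasks, matching the acquisition in the log. It is an assumption that the Spring executor used by the job executor behaves like this; no Camunda classes are involved.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Plain-JDK model of our job executor pool sizing; no Camunda classes involved.
// Assumption: the real executor behaves like a bounded ThreadPoolExecutor
// that throws RejectedExecutionException when all threads and the queue are full.
class PoolRejectionModel {

    // Submits batchSize tasks that block until released; returns how many were rejected.
    static int countRejected(int poolSize, int queueCapacity, int batchSize) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity)); // default policy: AbortPolicy
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < batchSize; i++) {
                try {
                    executor.execute(() -> {
                        try {
                            release.await(); // simulates a job that is still running
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and queue full
                }
            }
        } finally {
            release.countDown(); // let all blocked tasks finish
            executor.shutdown();
            try {
                executor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // core/max pool size 40, queue capacity 60, 209 acquired jobs
        System.out.println("rejected: " + countRejected(40, 60, 209));
    }
}
```

In this model at most 40 + 60 = 100 jobs can be in flight, so when none of them finish during submission, 109 of the 209 jobs are rejected.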
The main issue we are seeing now is that in some circumstances (when many jobs are being created), job acquisition takes quite a long time, between 5 and 10 seconds.
This can be seen in the log:
```text
2024-08-01 01:04:04,943 [DEBUG] 53531 [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor correlationId= : ENGINE-14012 Job acquisition thread woke up
2024-08-01 01:04:12,266 [DEBUG] 53531 [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor correlationId= : ENGINE-14022 Acquired 209 jobs for process engine ‘bla’: [[2ea174c1-4f91-11ef-b4c6-001a4a3b04fe], [2eac4ab0-4f91-11ef-b4c6-001a4a3b04fe, 2e97b0f2-4f91-11ef-b4c6-001a4a3b04fe], [2e9a701b-4f91-11ef-b4c6-001a4a3b04fe] …
```
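For reference, the time between the two log lines can be computed from their timestamps; this small sketch assumes the timestamp format shown in our log output.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Computes the time between the "woke up" (ENGINE-14012) and "Acquired ... jobs"
// (ENGINE-14022) log lines. The pattern is inferred from our log output.
class AcquisitionGap {

    static final DateTimeFormatter LOG_TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static Duration gap(String wokeUp, String acquired) {
        return Duration.between(
            LocalDateTime.parse(wokeUp, LOG_TIMESTAMP),
            LocalDateTime.parse(acquired, LOG_TIMESTAMP));
    }

    public static void main(String[] args) {
        Duration d = gap("2024-08-01 01:04:04,943", "2024-08-01 01:04:12,266");
        System.out.println("acquisition took " + d.toMillis() + " ms");
    }
}
```

For this acquisition the gap is 7323 ms, i.e. about 7.3 seconds between the thread waking up and the acquisition result being logged.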
But I think I will raise a separate discussion topic for that issue.
Regards
Alf Høgemark