Job Executor: Pool Sizes: Reason for default values?

Can anyone from @camunda / @thorben provide some insight into the reasoning behind the default pool size values?

Looking at Tomcat specifically: https://docs.camunda.org/manual/7.8/user-guide/runtime-container-integration/tomcat/

The default core pool size is 3 and the max pool size is 10: https://docs.camunda.org/manual/7.8/user-guide/runtime-container-integration/tomcat/#core-pool-size

What is the reasoning behind these numbers? Are there benchmarks for higher values scaled with infrastructure size? And are there any performance considerations when weighing an increase in pool size against an increase in the number of nodes?

Additionally, can some “simplified” examples be given for: https://docs.camunda.org/manual/7.8/user-guide/runtime-container-integration/tomcat/#maximum-pool-size vs https://docs.camunda.org/manual/7.8/user-guide/runtime-container-integration/tomcat/#queue-size
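
To make the question concrete, here is my current understanding (happy to be corrected if the job executor behaves differently): the worker pool seems to follow plain java.util.concurrent.ThreadPoolExecutor semantics, so with the documented defaults (core 3, max 10, queue 3) extra threads beyond the core 3 are only created once the queue is full, and a job is only rejected once the queue is full and all 10 threads are busy. A standalone sketch of that behaviour:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSemanticsSketch {

    public static void main(String[] args) throws InterruptedException {
        // Same shape as the documented Tomcat defaults: core 3, max 10, queue 3.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                3, 10, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(3));

        // Submit 15 long-running "jobs" and watch the pool grow:
        // jobs 1-3  -> core threads
        // jobs 4-6  -> wait in the queue
        // jobs 7-13 -> extra threads, because the queue is already full
        // jobs 14+  -> rejected: queue full and all 10 threads busy
        for (int i = 1; i <= 15; i++) {
            try {
                pool.execute(() -> sleep(2000));
                System.out.printf("job %2d accepted: threads=%d, queued=%d%n",
                        i, pool.getPoolSize(), pool.getQueue().size());
            } catch (RejectedExecutionException e) {
                System.out.printf("job %2d rejected%n", i);
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If that mental model is off, e.g. regarding what happens to rejected jobs, a correction would be much appreciated.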

Thanks!

There is not much science involved here, i.e. we picked values we thought were reasonable.

None that I am aware of. I’m pretty sure that also depends on the workload, e.g. how much time the threads spend on CPU vs. I/O.

As a general statement: We sometimes see people set the pool size to very high values (in the hundreds), which is often not a good idea, as after a certain point throughput no longer increases and can even decrease. If you reach that point, adding a new node can make sense. Also, make sure your database connection pool is at least as large as the thread pool.
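
As a sketch of that last point, assuming purely for illustration that you manage the datasource yourself with HikariCP (the same sizing idea applies to whatever connection pool you actually use; the headroom of 5 is just a judgment call):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class ConnectionPoolSizing {

    public static void main(String[] args) {
        // Whatever maxPoolSize you give the job executor (Tomcat default: 10) ...
        int jobExecutorMaxPoolSize = 10;

        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/camunda"); // placeholder URL
        config.setUsername("camunda");
        config.setPassword("secret");

        // ... give the database connection pool at least that many connections,
        // plus some headroom for the acquisition thread and regular API calls.
        config.setMaximumPoolSize(jobExecutorMaxPoolSize + 5);

        try (HikariDataSource dataSource = new HikariDataSource(config)) {
            // hand the dataSource over to the process engine configuration here
        }
    }
}
```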

Hi Stephen,

A while back I did some benchmarking, simulating a process with a few remote procedure calls, i.e. blocking I/O of 200 ms to 1000 ms, and I made the following observations:

Massive increases in throughput when increasing to 20 threads.
Still good gains up to 30 threads, however I started to see a diminishing rate of return.
At 90 threads, throughput no longer increased.

Hence I typically set the max thread pool to at least 10 and up to 20 threads, as this was the sweet spot. Whilst the benchmark suggested more, I should point out that throughput was very sensitive to the I/O blocking time, hence the rationale for not blindly configuring 90 threads, for example.
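
As a rough way to reason about why the gains flatten out (the numbers below are invented for illustration, they are not from the benchmark): while a job is blocked on a remote call its thread is idle, so throughput scales roughly linearly with the thread count until some other resource, e.g. CPU, the database or the connection pool, becomes the bottleneck.

```java
public class ThroughputBackOfEnvelope {

    public static void main(String[] args) {
        // Invented numbers, roughly in the shape of the benchmark above.
        double ioMillisPerJob = 500;   // time a job spends blocked on the remote call
        double cpuMillisPerJob = 25;   // time a job actually needs a CPU
        int cpuCores = 2;              // cores available to the node

        // One thread can complete at most this many jobs per second.
        double jobsPerThreadPerSecond = 1000.0 / (ioMillisPerJob + cpuMillisPerJob);
        // The CPUs cap total throughput no matter how many threads wait on I/O.
        double cpuCeiling = cpuCores * 1000.0 / cpuMillisPerJob;

        for (int threads : new int[] {10, 20, 30, 90}) {
            double estimate = Math.min(threads * jobsPerThreadPerSecond, cpuCeiling);
            System.out.printf("%2d threads -> ~%.0f jobs/s%n", threads, estimate);
        }
    }
}
```

In practice that ceiling is a messy mix of CPU, database load and contention rather than one clean number, which is exactly why I would measure rather than blindly configure 90 threads.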

The next area of sensitivity I noticed was job acquisition: we get better performance from fewer, bigger acquisitions than from more frequent, smaller ones…

regards

Rob

Do you assume this is caused by a certain amount of overhead in job acquisition? I.e. once the jobs are in the queue, it takes relatively little time to hand them off to a thread?

Hi Stephen,

Yes - in a prior post, I used the analogy of an old steam train. The boiler is the thread pool, the stoker the job acquisition thread. You need to balance the rate at which the stoker shovels coal with the capacity of the boiler… In the case of a cluster, it can be as if you have two stokers working from a common pile of coal…

With job acquisition there is intrinsic DB overhead and, in the case of a remote DB, a network round trip. Hence, intuitively, fetching a batch should achieve better throughput than fetching a row at a time. In the case of a cluster there is another dimension: job acquisition contention. It may be tempting to reduce the fetch size to reduce the chance of collisions, however this requires more frequent round trips. With more frequent round trips, the chance of overlapping acquisitions tended to increase, and hence the job executors tended to back off. Thus there is a sweet spot between acquisition frequency and batch size…
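
To put some rough numbers on that sweet spot (again invented for illustration; the batch size here corresponds to the maxJobsPerAcquisition setting, which defaults to 3):

```java
public class AcquisitionBackOfEnvelope {

    public static void main(String[] args) {
        // Invented numbers for illustration only.
        int threads = 20;            // executor max pool size on one node
        double jobDurationMs = 500;  // average time to execute one job

        // Jobs per second the pool can work through, and therefore must acquire.
        double jobsPerSecond = threads * 1000.0 / jobDurationMs;

        for (int batchSize : new int[] {3, 10, 30}) {
            double acquisitionsPerSecond = jobsPerSecond / batchSize;
            System.out.printf("batch size %2d -> ~%.1f acquisition queries/s per node%n",
                    batchSize, acquisitionsPerSecond);
        }
        // In a cluster, every additional acquisition query is another chance for two
        // nodes to select the same jobs, with one of them losing out and backing off.
        // Fewer, larger batches keep the pool fed with fewer round trips and fewer
        // collisions - up to the point where one node locks more jobs than its own
        // threads can work through before the job locks expire.
    }
}
```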

regards

Rob

That’s a good one, thanks!