Hi everyone, I wanted to share a problem I have with the Camunda engine that I think is related to the Job Executor.
I have a process that starts with a Message Event followed by a few Service Tasks. After some number of active process instances (sometimes 500, sometimes 2500), Camunda simply stops executing jobs for that process. Since I have "Async after" set on the Message Event, Cockpit just shows (1) on the Message Event and the instance stays stuck there (no exceptions in the log, no other information, no incidents).
Once this happens to one process instance, the jobs of any new process instances get stuck at the same step until I restart the engine. After the restart, Camunda picks up the stuck instances and they all eventually complete.
My guess was that the issue is in the Job Executor, so I changed its thread pool configuration from min 25 / max 50 to min 10 / max 20. Lowering the pool size seems to have delayed the problem: instead of getting stuck with at most 1000 process instances running, it got stuck at around 2500. I then switched to the Job Executor defaults (min 3, max 10), and now 5000 process instances run without getting stuck.
Does anyone know why the process gets stuck when the Job Executor pool is larger?
I tried Camunda 7.10.0 and 7.11.0 with Java 11 (OpenJDK). Camunda is running on Spring Boot 2.1.5 with Camunda Spring Boot Starter 3.3.1.
The CPU has 10 cores with Hyper-Threading.
Update: I found on the forum that a similar problem appeared when the HTTP connector had no timeout. We use a custom HTTP delegate that logs every request, and there is no active request in the log when this happens.
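For anyone who wants to rule out the timeout theory on their side, here is a minimal sketch of an HTTP call with explicit timeouts, using the java.net.http client that ships with Java 11. This is not our actual delegate: the class name TimeoutHttpSketch, the URL, and the 5 s / 30 s values are made up for illustration, and in a real setup the same client and request would be used inside the delegate's execute method.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper, not part of Camunda: builds an HTTP client and request
// that both carry explicit timeouts, so a hanging remote call cannot block a
// job executor thread forever.
public class TimeoutHttpSketch {

    // Assumed values, chosen only for illustration.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(5);
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(30);

    // Client-level limit on how long establishing the connection may take.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Request-level limit on how long to wait for the response.
    static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(REQUEST_TIMEOUT)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        HttpRequest request = newRequest("http://localhost:8080/hypothetical");
        // Only prints the configured timeouts; no request is actually sent here.
        System.out.println("connect timeout: " + client.connectTimeout().orElseThrow());
        System.out.println("request timeout: " + request.timeout().orElseThrow());
    }
}
```

With both timeouts set, client.send(request, ...) fails with an exception (HttpConnectTimeoutException or HttpTimeoutException) instead of waiting indefinitely, which would at least show up in the log or as an incident.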