Job Executor Stops Working - Catastrophic Failure

mppfor_manu · October 2, 2017, 10:45pm

We are running Camunda 7.6.2-ee on WildFly 10.1. Recently, our Camunda instances have “locked up” and will not execute any further jobs.

We have cancelled all running processes, but Camunda will not add a new job until we restart WildFly. Other observations:

WildFly itself seems to be functioning properly
You cannot cancel processes through the cockpit GUI
The REST interface appears to be working as you can get lists and DELETE processes through it
You can start a process, but the moment it hits an asynchronous boundary, it stops executing
System and database resources are more than adequate
A thread dump shows threads related to the Apache HttpClient in a WAITING state, with org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:102) consistently present in their stack traces
After cancelling all processes through the REST API, the expected database tables are empty. If we truncate the ACT_RU_JOBDEF table, we still see no further job executions
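For anyone who wants to repeat the thread-dump check above programmatically, here is a minimal sketch using the JDK's `ThreadMXBean`. It lists threads that are WAITING (or TIMED_WAITING) with a stack frame from a given class, by default `org.apache.http.pool.PoolEntryFuture`. As far as I understand HttpClient 4.x, that is where a caller parks while waiting to lease a connection from the pool. `StuckThreadScan` and `findWaitingIn` are hypothetical names. The scan only sees the JVM it runs in, so it would have to run inside WildFly; from outside, `jstack <pid>` or `jcmd <pid> Thread.print` gives the same raw data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class StuckThreadScan {

    // Frame seen in the thread dump (HttpClient 4.x connection pool wait).
    static final String POOL_WAIT_CLASS = "org.apache.http.pool.PoolEntryFuture";

    /** Names of live threads that are (TIMED_)WAITING with a frame from a class starting with classPrefix. */
    static List<String> findWaitingIn(String classPrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> hits = new ArrayList<>();
        // Arguments: lockedMonitors, lockedSynchronizers (not needed here).
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that vanished during the dump
            }
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    hits.add(info.getThreadName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> stuck = findWaitingIn(POOL_WAIT_CLASS);
        System.out.println(stuck.size() + " thread(s) waiting on the HTTP connection pool: " + stuck);
    }
}
```

If the count keeps growing and never drops, that points at pooled connections that are never released.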

In effect, Camunda is all but dead and we cannot figure out what is causing this. At this point we do not know what has changed (e.g. new or updated processes). What I’m looking for is help understanding how this could occur: if all the processes are cancelled, why does Camunda not start new ones?

Thanks.

Michael

Webcyberrob · October 3, 2017, 2:15am

Hi Michael

The thread at [1] may be of interest. In summary, it seems that if the connectors get blocked (e.g. an HTTP request that never gets a response), the job executor can hang or stop acquiring jobs.
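To make this concrete, here is a deliberately simplified model. It is not Camunda's actual job executor. A fixed-size thread pool stands in for the job executor threads, and a latch that is never released stands in for an HTTP call with no timeout. Once every thread is parked, a new job just sits in the queue. This could also fit the observation that cancelling processes does not help, assuming (not verified here) that deleting process instances in the database does not interrupt a thread that is already blocked. `BlockedExecutorDemo` and `newJobRuns` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedExecutorDemo {

    /**
     * Submits blockedTasks tasks that wait on a "remote call" that never answers,
     * then one trivial job. Returns true if that job finished within waitMillis.
     */
    static boolean newJobRuns(int poolSize, int blockedTasks, long waitMillis) {
        ExecutorService jobThreads = Executors.newFixedThreadPool(poolSize);
        // Stands in for an HTTP request with no timeout: nobody ever answers it.
        CountDownLatch remoteNeverAnswers = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedTasks; i++) {
                jobThreads.submit(() -> {
                    remoteNeverAnswers.await(); // parks this worker thread indefinitely
                    return null;
                });
            }
            Future<?> newJob = jobThreads.submit(() -> { /* a trivial job */ });
            newJob.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the job is still queued behind the blocked threads
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            remoteNeverAnswers.countDown(); // release the blocked tasks for cleanup
            jobThreads.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("2 threads, 1 blocked: new job runs = " + newJobRuns(2, 1, 1000));
        System.out.println("2 threads, 2 blocked: new job runs = " + newJobRuns(2, 2, 1000));
    }
}
```

With one free thread the new job runs immediately. With every thread blocked, nothing else gets executed, even though nothing looks "busy" from the database's point of view.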

Hence, in terms of what might have changed: if you are calling remote services, verify connectivity and behaviour from your engine node to those remote services.
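One way to check that behaviour is to probe each remote endpoint with explicit connect and read timeouts. A hung service then shows up as a timeout instead of a thread that waits forever. The sketch below uses only `java.net.HttpURLConnection`. `RemoteProbe`, `probe` and the silent local server are all hypothetical. The silent server accepts connections but never answers, which imitates the "no response" case from [1]. The timeout values are placeholders to tune.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class RemoteProbe {

    /** Probes url once; returns "HTTP <status>", "TIMEOUT", or "ERROR <exception type>". */
    static String probe(String url, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // give up if the host cannot be reached
            conn.setReadTimeout(readTimeoutMs);       // give up if the server accepts but never answers
            return "HTTP " + conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            return "TIMEOUT";
        } catch (IOException | IllegalArgumentException e) {
            return "ERROR " + e.getClass().getSimpleName();
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Hypothetical stand-in for a hung remote service: accepts connections, never sends a byte. */
    static String startSilentServer() {
        try {
            ServerSocket server = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
            Thread acceptor = new Thread(() -> {
                List<Socket> held = new ArrayList<>(); // keep accepted sockets open
                while (true) {
                    try {
                        held.add(server.accept());
                    } catch (IOException e) {
                        return;
                    }
                }
            }, "silent-server");
            acceptor.setDaemon(true);
            acceptor.start();
            return "http://127.0.0.1:" + server.getLocalPort() + "/";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : startSilentServer();
        System.out.println(url + " -> " + probe(url, 2000, 2000));
    }
}
```

The same principle presumably applies to the connector's own HTTP client. Without connect, read and connection-request timeouts, a single unresponsive service can hold pooled connections indefinitely.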

regards

Rob

[1] Job Executor hangs or stops acquiring jobs? (solved: HTTP-Connector stuck in endless job due to long http request/no response)
