Process Stuck in Start

Hi @dscheinin,

I was able to reproduce your problem: the async continuation is not picked up after a server restart.

What I did:

  1. Modeled a process with Asynchronous After set on the start event.
  2. Deployed it via the REST API to the prepackaged shared process engine running on Tomcat.
  3. Started a process instance, which worked fine.
  4. Restarted the Tomcat server.
  5. Started another process instance. This one got stuck at the job.

Then I diagnosed the problem as described by Thorben here: https://blog.camunda.com/post/2019/10/job-executor-what-is-going-on-in-my-process-engine/.

The job is created in the database with a deployment id of d7eeb0ec-7a5e-11ea-ac1d-3ce1a1c19785.

After adding debug output for the job acquisition, I found this snippet in the log:

09-Apr-2020 14:56:46.207 FEIN [Thread-5] org.camunda.commons.logging.BaseLogger.logDebug ENGINE-13005 Starting command -------------------- AcquireJobsCmd ----------------------
09-Apr-2020 14:56:46.207 FEIN [Thread-5] org.camunda.commons.logging.BaseLogger.logDebug ENGINE-13009 opening new command context
09-Apr-2020 14:56:46.209 FEIN [Thread-5] org.apache.ibatis.logging.jdbc.BaseJdbcLogger.debug ==>  Preparing: select RES.ID_, RES.REV_, RES.DUEDATE_, RES.PROCESS_INSTANCE_ID_, RES.EXCLUSIVE_ from ACT_RU_JOB RES where (RES.RETRIES_ > 0) and ( RES.DUEDATE_ is null or RES.DUEDATE_ <= ? ) and (RES.LOCK_OWNER_ is null or RES.LOCK_EXP_TIME_ < ?) and RES.SUSPENSION_STATE_ = 1 and (RES.DEPLOYMENT_ID_ is null or ( RES.DEPLOYMENT_ID_ IN ( ? , ? ) ) ) and ( ( RES.EXCLUSIVE_ = 1 and not exists( select J2.ID_ from ACT_RU_JOB J2 where J2.PROCESS_INSTANCE_ID_ = RES.PROCESS_INSTANCE_ID_ -- from the same proc. inst. and (J2.EXCLUSIVE_ = 1) -- also exclusive and (J2.LOCK_OWNER_ is not null and J2.LOCK_EXP_TIME_ >= ?) -- in progress ) ) or RES.EXCLUSIVE_ = 0 ) LIMIT ? OFFSET ? 
09-Apr-2020 14:56:46.210 FEIN [Thread-5] org.apache.ibatis.logging.jdbc.BaseJdbcLogger.debug ==> Parameters: 2020-04-09 14:56:46.207(Timestamp), 2020-04-09 14:56:46.207(Timestamp), b2ae552f-6ad1-11ea-8375-3ce1a1c19785(String), b2e62e19-6ad1-11ea-8375-3ce1a1c19785(String), 2020-04-09 14:56:46.207(Timestamp), 3(Integer), 0(Integer)
09-Apr-2020 14:56:46.211 FEIN [Thread-5] org.apache.ibatis.logging.jdbc.BaseJdbcLogger.debug <==      Total: 0
09-Apr-2020 14:56:46.211 FEIN [Thread-5] org.camunda.commons.logging.BaseLogger.logDebug ENGINE-13011 closing existing command context
09-Apr-2020 14:56:46.212 FEIN [Thread-5] org.camunda.commons.logging.BaseLogger.logDebug ENGINE-13006 Finishing command -------------------- AcquireJobsCmd ----------------------

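The crucial part of the query is and (RES.DEPLOYMENT_ID_ is null or ( RES.DEPLOYMENT_ID_ IN ( ? , ? ) ) ) with the parameters b2ae552f-6ad1-11ea-8375-3ce1a1c19785 and b2e62e19-6ad1-11ea-8375-3ce1a1c19785. Neither of them matches the job's deployment id d7eeb0ec-7a5e-11ea-ac1d-3ce1a1c19785 from above, so the acquisition query returns no jobs (Total: 0) and the job is never picked up.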
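
To make the effect of that filter concrete, here is a minimal sketch in plain Java. It is not engine code: the class and method names are made up for illustration, and it only models the deployment condition from the logged query, using the ids from this thread.

```java
import java.util.Set;

// Hypothetical illustration only, not part of the Camunda API.
class DeploymentAwareAcquisition {

    // Models the condition from the logged query:
    //   RES.DEPLOYMENT_ID_ is null or RES.DEPLOYMENT_ID_ IN ( ? , ? )
    static boolean isAcquirable(String jobDeploymentId, Set<String> registeredDeploymentIds) {
        return jobDeploymentId == null || registeredDeploymentIds.contains(jobDeploymentId);
    }

    public static void main(String[] args) {
        // The two ids passed as query parameters in the log above.
        Set<String> registered = Set.of(
                "b2ae552f-6ad1-11ea-8375-3ce1a1c19785",
                "b2e62e19-6ad1-11ea-8375-3ce1a1c19785");

        // Deployment id stored on the stuck job.
        String stuckJobDeploymentId = "d7eeb0ec-7a5e-11ea-ac1d-3ce1a1c19785";

        // The job's deployment id is not in the set, so the job is filtered out.
        System.out.println(isAcquirable(stuckJobDeploymentId, registered)); // prints "false"
    }
}
```

A job with no deployment id, or one whose deployment id is in the set, would pass this check. The stuck job does not.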

To overcome this issue, you could either use a different deployment model and deploy new process models by redeploying a process application as a WAR file: https://docs.camunda.org/get-started/java-process-app/

Or change the setting in the bpm-platform.xml for the job executor:

<property name="jobExecutorDeploymentAware">false</property>

But be aware that this has implications for heterogeneous cluster setups and process applications.

Hope this helps, Ingo
