Problem with job executor

Hi,
after running for a while without problems, the process engine throws me this exception

“ENGINE-03005 Execution of ‘DELETE MessageEntity[960029c7-e522-11e5-959e-005056800def]’ failed. Entity was updated by another transaction concurrently.”

while it tried to execute a job. I did not explicitly received it in cockpit but through act_ru_job table. Can someone explain it to me? And how can i avoid it? In my opinion, it has something to do with the job executor.

Another issue with job executor is how he picks up job to process. I observed table Act_ru_job for a while and i could see that some jobs still have the column RETRIES_= 1, it means it can be executed. Their column LOCK_EXP_TIMES is continuously updated but nothing happened. Can someone also explain this issue for me?

Thanks you all in advance.

Best Regard,

Nam

The question when a job is packed up can depend on different factors. This section in the documentation may help you https://docs.camunda.org/manual/7.4/user-guide/process-engine/the-job-executor/#acquirable-jobs

Regarding the “Entity was updated by another transaction concurrently”: this is expected behavior if different transactions work on the same entities concurrently (ie. attept to update / delete the same row concurrently).

Hi Daniel,

thanks for your reply. I will have a look at the job-executor again.

Cheers,

Nam

Hi,

I’m also seeing what I think is a similar issue. Several of my process instances seem be stuck, and repeatedly start, but never finish. I see entries in the act_ru_job table, and the LOCK_EXP_TIME_ keeps getting updated, but the process never seems to make it to the end. My process has an async continuation before the start event.

Here’s an example message from the act_ru_job table:

ENGINE-03005 Execution of ‘DELETE MessageEntity[d8b9c391-0b79-11e6-bc00-06c8ae72cfc3]’ failed. Entity was updated by another transaction concurrently.

It just constantly cycles, and the REV_ keeps going up. It’s at 77 right now… It seems to increase every minute or so.

The RETRIES_ value always seems to stay at 3.

This problem seems to occur after running many processes through. Things are fine for a while before it seems to get stuck… I’m on 7.4.

Thanks,
Galen

Hi Galen,

like Daniel already said and from my experiences with camunda, it is an expected bahavior in a multiple thread environment. It is going to be fine when you retry these failed job with this kind of exception. I think you should have a look at your delegated codes in your process definition because it may also cause these problems.

Cheers,

Nam

Hi Galen,

Does the jobs consists of long running services? Or is it across the board when it happens?

Cheers,
Christian

The jobs do have services that take a little over 2 minutes to complete. However, this is well under the 5 minutes lock expiration, so I don’t think the lock expiration is kicking in. From my logs, it appears that the process starts with one job executor, and runs all the way to the end. However, I have an ExecutionListener on my application, that notifies me in the logs when the endEvent is reached. For example:

...
  else if (elementName.equals("EndEventImpl")) {  					
...
		if (execution.getEventName().equals("end")) {
... 
LOG MESSAGE HERE THAT END EVENT WAS REACHED
..
}

This message never gets spit out in my logs, and instead I see another job executor (on another worker machine) start the process again from the last continuation point. The only continuation point in my process is at the start event, where I have an async before on the start event. So basically, the process gets started over and over again by different job executors.

The "ENGINE-03005 " exception seems to happen at the end of the process, right before it’s started again by a new worker.

Thanks,
Galen

Could you please share your bpmn model?

Here is my model. The green boxes are custom tasks.

process.bpmn (12.6 KB)

Did you try to set ‘asynchronous before’ on the joining parallel gateway?

No. I initially had the async before, but I removed it. It seemed unnecessary since the tasks under the gateway are exclusive, and it would also add another hit (unnecessary) to the database. I would have to re-introduce that and go back and test to see if the same issue occurs. In general, do you see an issue with me not having the async set?

Thanks,
Galen

Hi @hawky4s, I also facing the same problem… Like you said I’ve set async before on joining gateway of parallel tasks and everything is working fine. Thank you!