Problem with job executor

namnguyen · March 14, 2016, 9:45am

Hi,
After running for a while without problems, the process engine throws this exception

“ENGINE-03005 Execution of ‘DELETE MessageEntity[960029c7-e522-11e5-959e-005056800def]’ failed. Entity was updated by another transaction concurrently.”

while trying to execute a job. I did not see it in Cockpit; I found it in the act_ru_job table. Can someone explain it to me, and how can I avoid it? In my opinion, it has something to do with the job executor.

Another issue with the job executor is how it picks up jobs to process. I watched the act_ru_job table for a while and could see that some jobs still have RETRIES_ = 1, which means they can still be executed. Their LOCK_EXP_TIME_ column is continuously updated, but nothing happens. Can someone explain this issue as well?

Thank you all in advance.

Best regards,

Nam

DanielMeyer · March 14, 2016, 11:56am

When a job is picked up can depend on several factors. This section of the documentation may help you: https://docs.camunda.org/manual/7.4/user-guide/process-engine/the-job-executor/#acquirable-jobs
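As a rough illustration of the conditions that documentation section describes, the sketch below models the acquirability check in plain Java. `JobRow` and `AcquirableJobs.isAcquirable` are hypothetical names, not Camunda API; the fields only mirror the RETRIES_, DUEDATE_, LOCK_EXP_TIME_ and SUSPENSION_STATE_ columns of act_ru_job, and the conditions are paraphrased from the documentation, so treat it as a reading aid rather than the engine's actual query.

```java
import java.time.Instant;

// Hypothetical model of one act_ru_job row; not a Camunda class.
final class JobRow {
    final int retries;            // RETRIES_
    final Instant dueDate;        // DUEDATE_, null means "due immediately"
    final Instant lockExpiration; // LOCK_EXP_TIME_, null means "not locked"
    final boolean suspended;      // derived from SUSPENSION_STATE_

    JobRow(int retries, Instant dueDate, Instant lockExpiration, boolean suspended) {
        this.retries = retries;
        this.dueDate = dueDate;
        this.lockExpiration = lockExpiration;
        this.suspended = suspended;
    }
}

final class AcquirableJobs {
    // Assumed reading of the documented rules: a job can be acquired when it
    // has retries left, is due, is not currently locked, and is not suspended.
    static boolean isAcquirable(JobRow job, Instant now) {
        boolean hasRetries = job.retries > 0;
        boolean isDue = job.dueDate == null || !job.dueDate.isAfter(now);
        boolean isUnlocked = job.lockExpiration == null || job.lockExpiration.isBefore(now);
        return hasRetries && isDue && isUnlocked && !job.suspended;
    }
}
```

Under this reading, a row with RETRIES_ = 1 whose LOCK_EXP_TIME_ is still in the future is not acquirable by another job executor until that lock expires.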

Regarding the “Entity was updated by another transaction concurrently”: this is expected behavior if different transactions work on the same entities concurrently (i.e., attempt to update or delete the same row concurrently).
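To make the mechanism concrete, here is a minimal sketch of revision-based optimistic locking in plain Java. It is not Camunda's implementation: `VersionedStore` and its methods are hypothetical names, and the map simply stands in for a table whose rows carry a REV_ column.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table with a REV_ column.
final class VersionedStore {
    // Maps an entity id to its current revision.
    private final Map<String, Integer> revisions = new HashMap<>();

    void insert(String id) {
        revisions.put(id, 1);
    }

    // A transaction remembers the revision it saw when it read the row.
    int read(String id) {
        Integer rev = revisions.get(id);
        if (rev == null) {
            throw new IllegalStateException("no such entity: " + id);
        }
        return rev;
    }

    // Mimics "UPDATE ... SET REV_ = REV_ + 1 WHERE ID_ = ? AND REV_ = ?".
    // Returns false when no row matched, i.e. someone else changed it first.
    boolean update(String id, int expectedRevision) {
        Integer current = revisions.get(id);
        if (current == null || current != expectedRevision) {
            return false;
        }
        revisions.put(id, current + 1);
        return true;
    }

    // Mimics "DELETE ... WHERE ID_ = ? AND REV_ = ?".
    boolean delete(String id, int expectedRevision) {
        Integer current = revisions.get(id);
        if (current == null || current != expectedRevision) {
            return false;
        }
        revisions.remove(id);
        return true;
    }
}
```

If two transactions both read revision 1 and one of them updates the row first, the other's delete with the stale revision matches nothing and returns false. An engine that checks the affected row count can then report a concurrent modification and roll that transaction back.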

namnguyen · March 18, 2016, 8:23am

Hi Daniel,

thanks for your reply. I will have a look at the job-executor again.

Cheers,

Nam

Galen_Hollins · April 26, 2016, 7:30am

Hi,

I’m also seeing what I think is a similar issue. Several of my process instances seem to be stuck: they repeatedly start, but never finish. I see entries in the act_ru_job table, and the LOCK_EXP_TIME_ keeps getting updated, but the process never seems to make it to the end. My process has an async continuation before the start event.

Here’s an example message from the act_ru_job table:

ENGINE-03005 Execution of ‘DELETE MessageEntity[d8b9c391-0b79-11e6-bc00-06c8ae72cfc3]’ failed. Entity was updated by another transaction concurrently.

It just constantly cycles, and the REV_ keeps going up. It’s at 77 right now… It seems to increase every minute or so.

The RETRIES_ value always seems to stay at 3.

This problem seems to occur after running many processes through. Things are fine for a while before it seems to get stuck… I’m on 7.4.

Thanks,
Galen

namnguyen · April 27, 2016, 9:47am

Hi Galen,

As Daniel already said, and from my experience with Camunda, this is expected behavior in a multi-threaded environment. Jobs that failed with this kind of exception should be fine when they are retried. I think you should also have a look at the delegate code in your process definition, because it may cause these problems too.

Cheers,

Nam

hawky4s · April 27, 2016, 11:21am

Hi Galen,

Do the jobs consist of long-running services? Or does it happen across the board?

Cheers,
Christian

Galen_Hollins · April 27, 2016, 4:21pm

The jobs do have services that take a little over 2 minutes to complete. However, this is well under the 5-minute lock expiration, so I don’t think the lock expiration is kicking in. From my logs, it appears that the process starts with one job executor and runs all the way to the end. However, I have an ExecutionListener in my application that writes a log message when the endEvent is reached. For example:

...
else if (elementName.equals("EndEventImpl")) {
    ...
    if (execution.getEventName().equals("end")) {
        ...
        // LOG MESSAGE HERE THAT END EVENT WAS REACHED
        ...
    }
}

This message never gets spit out in my logs, and instead I see another job executor (on another worker machine) start the process again from the last continuation point. The only continuation point in my process is at the start event, where I have an async before on the start event. So basically, the process gets started over and over again by different job executors.

The "ENGINE-03005" exception seems to happen at the end of the process, right before it’s started again by a new worker.

Thanks,
Galen

hawky4s · April 27, 2016, 4:41pm

Could you please share your bpmn model?

Galen_Hollins · April 28, 2016, 5:10am

Here is my model. The green boxes are custom tasks.

process.bpmn (12.6 KB)

hawky4s · April 28, 2016, 12:42pm

Did you try to set ‘asynchronous before’ on the joining parallel gateway?

Galen_Hollins · April 28, 2016, 8:38pm

No. I initially had the async before, but I removed it. It seemed unnecessary since the tasks under the gateway are exclusive, and it would also add another (unnecessary) hit to the database. I would have to re-introduce it and test again to see if the same issue occurs. In general, do you see an issue with me not having the async set?

Thanks,
Galen

heril.muratovic06 · June 19, 2018, 8:15am

Hi @hawky4s, I’m also facing the same problem… As you suggested, I’ve set async before on the joining gateway of the parallel tasks and everything is working fine. Thank you!