OptimisticLockingException when trying to complete an External Task

Sometimes, when two external tasks are executed in parallel (governed by a ParallelGateway, so these are not two tasks of the same task definition in different process instances, but two completely different tasks in the same process instance), one of them throws an exception on ExternalTaskService#complete:

org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'UPDATE VariableInstanceEntity[4606]' failed. Entity was updated by another transaction concurrently.
	at org.camunda.bpm.engine.impl.db.EnginePersistenceLogger.concurrentUpdateDbEntityException(EnginePersistenceLogger.java:135)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.handleOptimisticLockingException(DbEntityManager.java:499)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.checkFlushResults(DbEntityManager.java:451)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperations(DbEntityManager.java:367)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperationManager(DbEntityManager.java:325)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flush(DbEntityManager.java:297)
	at org.camunda.bpm.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:208)
	at org.camunda.bpm.engine.impl.interceptor.CommandContext.close(CommandContext.java:137)
	at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:116)
	at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:70)
	at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:33)
	at org.camunda.bpm.engine.impl.ExternalTaskServiceImpl.complete(ExternalTaskServiceImpl.java:56)
	at org.camunda.bpm.engine.impl.ExternalTaskServiceImpl.complete(ExternalTaskServiceImpl.java:52)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.jboss.weld.bean.proxy.AbstractBeanInstance.invoke(AbstractBeanInstance.java:38)
	at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:106)
	at org.camunda.bpm.engine.ExternalTaskService$632439471$Proxy$_$$_WeldClientProxy.complete(Unknown Source)
	at com.acme.ExternalTaskWorker
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:835)
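The trace shows the UPDATE of a variable instance row failing because another transaction (most likely the completion of the parallel task) changed that row after this one had read it. Camunda's tables carry a revision column (REV_), and an update that no longer finds the revision it read is rejected. The following is a simplified in-memory model of that check, not the engine's code; VersionedTable, VersionedRow and ConcurrentUpdateException are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of revision-based optimistic locking (not Camunda code).
// Every row carries a revision; an update succeeds only if the row still has the
// revision the writer read earlier, like "UPDATE ... WHERE ID_ = ? AND REV_ = ?".
final class VersionedRow {
    final String value;
    final int revision;

    VersionedRow(String value, int revision) {
        this.value = value;
        this.revision = revision;
    }
}

// Stand-in for the engine's OptimisticLockingException.
final class ConcurrentUpdateException extends RuntimeException {
    ConcurrentUpdateException(String id) {
        super("UPDATE " + id + " failed. Entity was updated by another transaction concurrently.");
    }
}

final class VersionedTable {
    private final Map<String, VersionedRow> rows = new HashMap<>();

    synchronized void insert(String id, String value) {
        rows.put(id, new VersionedRow(value, 1));
    }

    synchronized VersionedRow read(String id) {
        return rows.get(id);
    }

    // Applies the update only if nobody changed the row since it was read;
    // a revision mismatch corresponds to "0 rows affected" in the SQL version.
    synchronized void update(String id, String newValue, int expectedRevision) {
        VersionedRow current = rows.get(id);
        if (current == null || current.revision != expectedRevision) {
            throw new ConcurrentUpdateException(id);
        }
        rows.put(id, new VersionedRow(newValue, expectedRevision + 1));
    }
}
```

If two completions both read the variable at revision 1, the first update bumps it to revision 2 and the second one is rejected. In the trace above the rejection surfaces while the complete command flushes (CommandContext.close), so that command fails as a whole.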

It doesn’t happen every time, only occasionally. When it does, the task in question has to be fetched and worked on again, so the business logic implemented by the task worker effectively executes twice. Is there anything that can be done to mitigate this behaviour and avoid repeated task execution? Is it correct, or even recommended, to catch such exceptions and retry completing the task?

Hi @yzt,

have you tried setting the maximum number of tasks a client can work on to 1?

ExternalTaskClient client = ExternalTaskClient.create()
    .baseUrl(…)
    .asyncResponseTimeout(…)
    .maxTasks(1)
    .build();

I also had some problems using multiple ExternalTask clients. This configuration fixed it for me.

Also make sure that the lockTime is long enough to complete the task.
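One way to act on the lockTime advice is to compare the time spent on a task with the lock duration requested when it was fetched: once the lock has run out, the engine can hand the task to another worker, and the work runs twice. The sketch below is a hypothetical helper (LockBudget is not part of Camunda); times are passed in explicitly so it stays deterministic:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not part of Camunda): tracks how much of the requested lock
// duration is left. Times are passed in explicitly to keep it deterministic.
final class LockBudget {
    private final Instant lockedAt;
    private final Duration lockDuration;

    LockBudget(Instant lockedAt, Duration lockDuration) {
        this.lockedAt = lockedAt;
        this.lockDuration = lockDuration;
    }

    // Lock time left at `now`, never negative.
    Duration remaining(Instant now) {
        Duration left = lockDuration.minus(Duration.between(lockedAt, now));
        return left.isNegative() ? Duration.ZERO : left;
    }

    // Once the lock has run out, the engine may hand the task to another worker.
    boolean hasExpired(Instant now) {
        return remaining(now).isZero();
    }

    // True if `expectedWork` plus a safety margin still fits into the remaining lock time.
    boolean canFinish(Instant now, Duration expectedWork, Duration margin) {
        return remaining(now).compareTo(expectedWork.plus(margin)) >= 0;
    }
}
```

With a 5-minute lock, 4 minutes remain after one minute of work, and a 3-minute step with a 30-second margin still fits; two minutes in, it no longer does.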

Hope that helps.

Regards
Michael

I don’t use the REST API; in my case it’s an embedded engine with explicit ExternalTaskService#fetchAndLock calls. But yes, I already fetch just one task per worker.
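With an embedded engine, fetching one task per worker corresponds to externalTaskService.fetchAndLock(1, workerId).topic(topicName, lockDuration).execute(), followed by externalTaskService.complete(task.getId(), workerId). So that the loop can run without an engine, the sketch below hides those two calls behind a hypothetical TaskGateway interface; SingleTaskWorker, the topic name and the lock value are likewise illustrative assumptions:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the two ExternalTaskService calls used by the worker, so the
// loop runs without an engine. Against Camunda these would roughly map to
// fetchAndLock(maxTasks, workerId).topic(topic, lockMillis).execute() and complete(taskId, workerId).
interface TaskGateway {
    List<String> fetchAndLock(int maxTasks, String workerId, String topic, long lockMillis);

    void complete(String taskId, String workerId);
}

// Worker that fetches at most one task per poll, runs the business logic, then completes it.
final class SingleTaskWorker {
    private final TaskGateway gateway;
    private final String workerId;
    private final String topic;
    private final long lockMillis;
    private final Consumer<String> businessLogic;

    SingleTaskWorker(TaskGateway gateway, String workerId, String topic,
                     long lockMillis, Consumer<String> businessLogic) {
        this.gateway = gateway;
        this.workerId = workerId;
        this.topic = topic;
        this.lockMillis = lockMillis;
        this.businessLogic = businessLogic;
    }

    // Returns false when no task was available.
    boolean pollOnce() {
        List<String> locked = gateway.fetchAndLock(1, workerId, topic, lockMillis);
        if (locked.isEmpty()) {
            return false;
        }
        String taskId = locked.get(0);
        businessLogic.accept(taskId);       // the work that ends up running twice in this report
        gateway.complete(taskId, workerId); // the call that can fail with OptimisticLockingException
        return true;
    }
}
```

Limiting each fetch to one task does not prevent the conflict described here: the two conflicting tasks are picked up by different workers, so both completions can still run at the same time.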

ExternalTaskService#complete unlocks the task anyway, and lockTime is set to a sufficient value as well, well above the processing time.

Just to follow up: I tried implementing an internal compensation mechanism that retries completing the task after a delay when complete fails, and it seems to work.
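The retry-after-delay approach can be sketched as a small generic helper. The sketch makes two assumptions. First, the failed complete command was rolled back as a whole; in the trace, the exception is raised while the command context flushes, so nothing from that attempt should be committed. Second, the task is still locked by the same worker while the helper retries, which the generous lockTime should cover. Only the complete call is retried, so the business logic does not run again. ConflictRetry and its parameters are hypothetical:

```java
import java.time.Duration;

// Hypothetical helper: retries an action that fails with a specific conflict exception,
// pausing between attempts. Against Camunda, conflictType would be
// org.camunda.bpm.engine.OptimisticLockingException.class and the action a lambda that
// calls externalTaskService.complete(...) only, never the business logic.
final class ConflictRetry {
    private ConflictRetry() {
    }

    // Returns the number of attempts it took; rethrows once attempts are exhausted
    // or when the failure is not the expected conflict type.
    static int run(Runnable action, Class<? extends RuntimeException> conflictType,
                   int maxAttempts, Duration delay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (RuntimeException e) {
                if (!conflictType.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException interrupted) {
                    // Preserve the interrupt flag and give up with the original conflict.
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

A call could look like ConflictRetry.run(() -> externalTaskService.complete(taskId, workerId), OptimisticLockingException.class, 3, Duration.ofMillis(500)); the attempt count and delay are placeholders, not tuned values.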

However, it can’t be accepted as a production-ready solution without further investigation. Waiting for an answer from you guys.