Multi-instance External Tasks are getting stuck

gohar.gasparyan · March 13, 2018, 10:20am

Hi, I am struggling with multi-instance External Tasks. I created the simplest possible BPMN model to show the problem.

The task is an External Task, its Loop Cardinality is a fixed number (e.g. 100), and it has the following settings:

Multi Instance Async Before - true
Multi Instance Async After - true
Multi Instance Exclusive - true
Async Before - true
Async After - true
Exclusive - true

Then I have an External Task poller which, for the sake of simplicity, polls tasks synchronously in a for loop, logging and completing each one.

@Scheduled(fixedDelayString = "${externaltask.worker.poll.rate}")
public void poll() {
    List<LockedExternalTask> tasks = externalTaskService.fetchAndLock(10, externalTaskConfiguration.getWorkerId())
            .topic(topic.getName(), externalTaskConfiguration.getDefaultLockDuration())
            .execute();

    tasks.forEach(task -> {
        try {
            log.info("-----------------------------------Executing task: {}", task.getId());
        } catch (Exception e) {
            log.error("failed to process external task - {}", task.getId(), e);
        }
        externalTaskService.complete(task.getId(), task.getWorkerId(), task.getVariables());
        System.out.println("------------------------------------------------completed: " + task.getId());
    });
}

I am getting the following exception:

org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'UPDATE VariableInstanceEntity[99e059c5-26a0-11e8-b347-aa5d7001bc63]' failed. Entity was updated by another transaction concurrently.
at org.camunda.bpm.engine.impl.db.EnginePersistenceLogger.concurrentUpdateDbEntityException(EnginePersistenceLogger.java:130)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.handleOptimisticLockingException(DbEntityManager.java:406)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.checkFlushResults(DbEntityManager.java:365)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperations(DbEntityManager.java:345)

In Camunda Cockpit I can see that nrOfActiveInstances is still not 0 after quite a long time, even though all tasks were polled and completed.

nrOfInstances - 100
nrOfCompletedInstances - 64
nrOfActiveInstances - 36

And the process stays stuck in this state forever…

I have also tried the handleFailure method in ExternalTaskService, but the situation didn’t change much.

Any help, hint, suggestion will be much appreciated. Thanks in advance.

Philipp_Ossler · March 19, 2018, 9:18am

Hi @gohar.gasparyan,

I tried to reproduce the issue in a unit test, but it works as expected.

Do you know which variable is modified concurrently?
How long do you lock an external task?
Do you see errors when calling complete(...)?

Best regards,
Philipp

mase · March 22, 2018, 8:37am

I’d assume that

externalTaskService.complete(task.getId(), task.getWorkerId(), task.getVariables());

can lead to the optimistic locking exception.

If external task A reports back e.g.
"variables": {"aVariable": {"value": "aStringValue"}}
and external task B reports back
"variables": {"aVariable": {"value": "aDifferentStringValue"}}

can this lead to optimistic locking exceptions since Camunda tries to merge the results for the “owning”-process?

So is it possible/advisable at all to return variables with the same name in the complete method call?

Philipp_Ossler · March 26, 2018, 9:38am

Hi @mase,

An OptimisticLockingException (OLE) can always happen when a process instance is updated by multiple transactions. So if you call complete(...) concurrently, an OLE may occur.

If the variable is local (in the scope of the execution) then it’s ok. Otherwise, you would overwrite the variable with the new call, which may not be what you want.

Does this help you?

Best regards,
Philipp

hans · April 29, 2018, 8:03am

We had the same issue (multi-instance subprocess stuck) and found that the cause was that we were overwriting the loop control variables. Instead of calling complete with task.getVariables() as argument, try providing only those variables that you want to change.
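
As a rough illustration of this suggestion, here is a minimal sketch of a guard that strips the multi-instance loop control variables from a map before it is handed to complete. The class name LoopVariableFilter and its method are hypothetical helpers, not part of the Camunda API. The variable names (nrOfInstances, nrOfActiveInstances, nrOfCompletedInstances, loopCounter) are the ones the engine maintains for multi-instance activities.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Camunda): removes the variables that the
// engine itself maintains for multi-instance activities.
final class LoopVariableFilter {

    private static final Set<String> LOOP_CONTROL_VARIABLES = new HashSet<>(Arrays.asList(
            "nrOfInstances", "nrOfActiveInstances", "nrOfCompletedInstances", "loopCounter"));

    private LoopVariableFilter() {
    }

    // Returns a copy of the given variables without the loop control entries.
    // The input map is left unchanged.
    static Map<String, Object> withoutLoopControlVariables(Map<String, Object> variables) {
        Map<String, Object> result = new LinkedHashMap<>();
        variables.forEach((name, value) -> {
            if (!LOOP_CONTROL_VARIABLES.contains(name)) {
                result.put(name, value);
            }
        });
        return result;
    }
}
```

In the poller from the first post, the call would then presumably become externalTaskService.complete(task.getId(), task.getWorkerId(), LoopVariableFilter.withoutLoopControlVariables(task.getVariables())). Passing a small map that contains only the values the worker actually produced is likely the cleaner option; the filter is only a safety net for code that forwards the fetched variables wholesale.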

-Hans