Handling errors from External Task Resource Complete API

I’m trying to implement a recovery procedure for when I receive an unsuccessful response from a Camunda 7 External Task complete request. The document link is below. Most of the time the API call works, but occasionally I have been getting 502/504 failures. I can understand the cause of the failures at the source, but my challenge is how to recover.
When I receive an error, I would normally retry the same API call. However, with Camunda, a retry seems to trigger duplicate processing, as if I had started two threads from one. I’m looking for the correct pattern of API call(s) to ensure the ‘complete’ is accepted by Camunda and the task is only completed once. Recommendations are welcome.

The current code does not attempt any retry and is causing odd behavior… input is appreciated. I’ve noticed (in the past) that I can call complete multiple times without error, which doesn’t make sense…

regards,
Rob

Camunda Automation Platform 7.24.0-SNAPSHOT REST API

Check if this link has what you’re after in terms of pattern.

Thanks Claudio. The document makes sense in normal scenarios, but it doesn’t cover what happens when a ‘complete’ fails. The example from the article (below) shows complete being called with no error handling. I’m seeing errors returned from the call (502 & 503 so far). This is my situation… what do I do now? Retry the complete (which at times seems to cause duplicated steps), or somehow query the task to see if it’s complete? Do you know what might indicate that a task is complete?

    if(success) {
      externalTaskService.complete(task.getId(), variables);
    }
    else {
      // if the work was not successful, mark it as failed
      externalTaskService.handleFailure(
        task.getId(),
        "externalWorkerId",
        "Address could not be validated: Address database not reachable",
        1, 10L * 60L * 1000L);
    }
  }
  catch(Exception e) {
  

No problem.
Can you elaborate on “duplications of steps”, please?
You mean the External Task worker is called twice by the engine or the engine executes the Task again?

For your reference, this is what External Task complete does. Not much.

  public void complete(Map<String, Object> variables, Map<String, Object> localVariables) {
    ensureActive();

    ExecutionEntity associatedExecution = getExecution();

    ensureVariablesSet(associatedExecution, variables, localVariables);

    if(evaluateThrowBpmnError(associatedExecution, false)) {
      return;
    }

    deleteFromExecutionAndRuntimeTable(true);

    produceHistoricExternalTaskSuccessfulEvent();

    associatedExecution.signal(null, null);
  }
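Since deleteFromExecutionAndRuntimeTable removes the task once complete goes through, the status of a repeated complete call can hint at what happened to the first one. A 502/504 comes from a gateway in front of the engine, so the engine may or may not have committed the original request. Here is a rough sketch of how a worker could interpret the response of POST /external-task/{id}/complete. The meanings I attach to 204, 400 and 404 are my reading of the REST API docs and should be checked against your version; NextStep and CompleteResponsePolicy are made-up names.

```java
// Sketch only: maps the HTTP status of a complete call to a suggested next step.
// The status meanings are assumptions taken from my reading of the REST docs.
enum NextStep { DONE, RETRY_COMPLETE, TASK_GONE, REJECTED, INVESTIGATE }

final class CompleteResponsePolicy {

    static NextStep classify(int httpStatus) {
        if (httpStatus == 204) {
            return NextStep.DONE;            // engine accepted the complete
        }
        if (httpStatus == 502 || httpStatus == 503 || httpStatus == 504) {
            // Gateway/availability errors: outcome unknown, the engine may or
            // may not have processed the request, so a retry is reasonable.
            return NextStep.RETRY_COMPLETE;
        }
        if (httpStatus == 404) {
            // Task no longer exists: an earlier complete may have succeeded,
            // or the task was cancelled. Do not repeat the work.
            return NextStep.TASK_GONE;
        }
        if (httpStatus == 400) {
            // Assumed meaning: the lock is no longer held by this worker id.
            return NextStep.REJECTED;
        }
        return NextStep.INVESTIGATE;         // anything else: log and look into it
    }

    public static void main(String[] args) {
        for (int status : new int[] {204, 502, 404, 400, 500}) {
            System.out.println(status + " -> " + classify(status));
        }
    }
}
```

The point of the sketch is that a retry after a 502/504 is not automatically a second completion: if the first request did reach the engine, the retry should find the task already gone.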

Here is a parallel with Camunda 8 (I believe the behavior is somewhat similar):

  • When the engine reaches a Service Task, it places a job (say, type = “Foo”) in a queue to be processed by the respective worker and waits for the task to complete or fail.

  • The worker for type “Foo” frequently polls the queue looking for jobs of that type available for processing. It then picks the job that was posted.

  • Once the job worker picks up the job, it has a specific amount of time to call complete on that job. If the time expires, the engine assumes the worker died or something and frees the job to be picked up again on the next poll.

The result is that if the call to complete fails, the job is not considered completed, and when the worker’s time expires the job will be picked up for processing again even though all the work has been done.

This makes sense in the context of the product’s architecture - they call it “at-least-once”

The implication is that workers have to be implemented with the expectation that they can be called more than once for the same job, even if an earlier run succeeded. Camunda recommends implementing workers as idempotent modules.

If the failure on the call to complete is transient and your worker is idempotent, you probably don’t need to do anything (apart from logging the error for tracking purposes). The next time the worker picks up the same job, it will change nothing (due to its idempotence), it will call complete again, and it should just work from there.
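To make “idempotent” a bit more concrete, here is a minimal sketch of a guard that runs the real work at most once per key and skips it on a repeated delivery, so the worker can go straight to calling complete again. IdempotentHandler, runOnce and the key value are hypothetical names for illustration. I am assuming a stable key is available across deliveries (for example the external task id or a business key), and the in-memory set stands in for a durable store.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class IdempotentHandler {

    // Keys of work already carried out. In-memory only for illustration;
    // a real worker would need a durable store that survives restarts.
    private final Set<String> doneKeys = ConcurrentHashMap.newKeySet();

    /**
     * Runs the work only if this key has not been handled before.
     * Returns true if the work ran, false if it was skipped as a duplicate.
     * Either way the caller should go on to call complete afterwards.
     */
    boolean runOnce(String idempotencyKey, Runnable work) {
        if (!doneKeys.add(idempotencyKey)) {
            return false; // duplicate delivery: side effects already happened
        }
        try {
            work.run();
            return true;
        } catch (RuntimeException e) {
            doneKeys.remove(idempotencyKey); // work failed, allow a later attempt
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        String key = "task-123"; // hypothetical key, stable across deliveries
        handler.runOnce(key, () -> System.out.println("doing the RPA work"));
        // Second delivery of the same job: the work is skipped.
        boolean ranAgain = handler.runOnce(key, () -> System.out.println("doing the RPA work"));
        System.out.println("ran again: " + ranAgain);
    }
}
```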

Thanks Claudio:
The External Task is a Robotic Process Automation (RPA). Currently, when the RPA completes its work, it invokes complete on the task, and that call occasionally fails. I tried to implement a retry in the RPA for failed complete calls. I didn’t know Camunda well enough at the time, and the Camunda developers started to complain that the token was being processed twice on failures. This didn’t make sense to me, but I was pressured to remove the retry logic from the RPA. So the current implementation attempts the complete “once” and logs any failures. The Camunda flow has a ‘Timeout’ exception, but the claim “now” is that the process sometimes continues and sometimes doesn’t… very confusing. I would like to bullet-proof the RPA to avoid being scapegoated as the failure point, which it isn’t…

Yeah, the duplicated token part is confusing and indeed does not make sense. My expectation is that Camunda will retry executing the worker job if it is not completed within a certain period, but this should not affect the token. I wonder if they are counting Camunda’s retry as another token being generated because the worker is executed again.

If the issue with complete is transient, I can totally see the behavior they are experiencing:

1 - Camunda’s token reaches the External Task. Token stays there.
2 - Camunda invokes the External Task and waits for completion or failure
3 - The worker does the job but the call to complete fails
4 - The time for the worker to report completion or failure expires. Camunda sends another job request to the worker. (token is still there waiting)
5 - Worker does the job (again) and calls complete, which works this time.
6 - Camunda considers the Task successful and moves the token forward to the next step.
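The six steps can be played through with a toy model. This is not engine code: ToyExternalTask and TimelineDemo are invented for illustration, the clock is just a number, and I am modelling the wait in step 4 as a lock that expires, which is how Camunda 7 external tasks behave as far as I know (the worker pulls work via fetch-and-lock with a lock duration). It shows the work running twice while the token still advances only once.

```java
// Toy model of one external task with an expiring lock. Illustration only.
final class ToyExternalTask {
    private long lockExpiresAt = 0;   // toy clock, in milliseconds
    private boolean completed = false;
    private int tokenAdvances = 0;

    /** A worker poll: succeeds only if the task is open and not currently locked. */
    boolean fetchAndLock(long now, long lockDuration) {
        if (completed || now < lockExpiresAt) {
            return false;
        }
        lockExpiresAt = now + lockDuration;
        return true;
    }

    /** Engine side of a complete that actually arrives: the token moves once. */
    void complete() {
        if (completed) {
            throw new IllegalStateException("task no longer exists");
        }
        completed = true;
        tokenAdvances++;
    }

    int tokenAdvances() {
        return tokenAdvances;
    }
}

final class TimelineDemo {

    /** Returns {number of times the work ran, number of times the token advanced}. */
    static int[] run() {
        ToyExternalTask task = new ToyExternalTask();
        int workExecutions = 0;

        // Steps 1-3: the worker gets the task and does the work, but the complete
        // call is lost on the way (e.g. a 502), so task.complete() never happens.
        if (task.fetchAndLock(0, 1_000)) {
            workExecutions++;
        }

        // While the lock is still valid, nobody can pick the task up again.
        if (task.fetchAndLock(500, 1_000)) {
            workExecutions++;
        }

        // Steps 4-6: the lock has expired, the task is handed out again,
        // the work runs a second time and this complete goes through.
        if (task.fetchAndLock(1_000, 1_000)) {
            workExecutions++;
            task.complete();
        }
        return new int[] {workExecutions, task.tokenAdvances()};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("work executions: " + result[0] + ", token advances: " + result[1]);
    }
}
```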

If the worker is not idempotent, Camunda calling it a second time is potentially dangerous. In that case, you could try increasing the timeout for the worker (giving it more time to report completion or failure), so that you can retry the failed “complete” call before Camunda hands the job to a worker again. This is something you would have to test.
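A sketch of what “retry the complete before the timeout runs out” could look like, assuming the worker knows roughly when its lock expires and passes a deadline a little before that. CompleteRetry and retryUntil are hypothetical names, the IntSupplier stands in for whatever code sends the complete request and returns the HTTP status, and the choice of 502/503/504 as the only retryable statuses is my assumption based on the errors reported in this thread.

```java
import java.util.function.IntSupplier;

final class CompleteRetry {

    private static final long MAX_BACKOFF_MILLIS = 5_000;

    /**
     * Repeats the complete call while it returns a gateway-style error and there
     * is still time before the deadline. Returns the last status received.
     */
    static int retryUntil(IntSupplier completeCall, long deadlineMillis, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        while (true) {
            int status = completeCall.getAsInt();
            boolean retryable = status == 502 || status == 503 || status == 504;
            if (!retryable) {
                return status; // success or a definite answer: stop retrying
            }
            if (System.currentTimeMillis() + backoff >= deadlineMillis) {
                return status; // out of time: give up before the lock expires
            }
            try {
                Thread.sleep(backoff);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return status;
            }
            backoff = Math.min(backoff * 2, MAX_BACKOFF_MILLIS); // exponential backoff, capped
        }
    }

    public static void main(String[] args) {
        int[] canned = {502, 504, 204}; // simulated responses
        int[] next = {0};
        long deadline = System.currentTimeMillis() + 10_000; // pretend the lock has 10s left
        int finalStatus = retryUntil(() -> canned[next[0]++], deadline, 100);
        System.out.println("final status " + finalStatus + " after " + next[0] + " attempts");
    }
}
```

If the loop gives up with a 5xx, the task simply stays locked until the lock expires and is then handed out again, which is where idempotence of the worker matters.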

Apparently you have access to the underlying http client and can tweak some timing parameters:

Hi @flemster,

Could you please share the catch block’s content?

Do you have multiple instances of the same worker running? If yes, ensure that each has a unique worker-id
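One simple way to get a unique worker id per instance is to append a random UUID to a readable prefix at startup. This is only a sketch: WorkerIds and the prefix are made-up names, and how the id is then passed to your worker or client depends on your setup.

```java
import java.util.UUID;

final class WorkerIds {

    /** Builds an id such as "rpa-worker-<random uuid>" that differs per instance. */
    static String newWorkerId(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Generate once at startup and reuse it for every call this instance makes.
        System.out.println(newWorkerId("rpa-worker"));
    }
}
```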
