Completing a failed job by using the Zeebe Client API

ntheodoropoulos · October 5, 2023, 1:26pm

Hi,

We would like to complete a failed job in Camunda 8 from an external source, using the Zeebe client API.

When we try to complete a job in the “failed” state with zeebeClient.newCompleteCommand(key).variables(variables).send().join();
an error is thrown indicating that the job is in the “FAILED” state.
When we try to resolve the incident first and then complete the job:

zeebeClient.newResolveIncidentCommand(incidentKey).send().join();
zeebeClient.newCompleteCommand(key).variables(variables).send().join();

the worker simultaneously tries to execute the job, and an error is thrown in the job worker code.

When we try to set the number of retries to zero, an exception is thrown indicating that the number of retries cannot be set to zero:
zeebeClient.newUpdateRetriesCommand(key).retries(0).send().join();

Is there any other way we can accomplish that, besides updating the variables and checking the status in the worker code?

rohwerj · October 6, 2023, 10:55am

I don’t completely understand your scenario.
An incident will only be raised if there are no retries left for a failed job. So in this case just calling the resolve incident command should not increase the retries and no job worker should be able to get the job.

Just for reference, you could modify your process instance via the corresponding command, but this would mean you have to terminate the token at your failed activity and start a new one at the next activity. This could lead to errors if the model is changed and you forget to also change this logic.

ntheodoropoulos · October 6, 2023, 11:30am

Hi @rohwerj ,

Thanks for the feedback.
However, I notice the following behavior.
Even if no retries are left and an incident has been raised, when submitting zeebeClient.newResolveIncidentCommand(incidentKey).send().join(); a job worker gets the job.
It seems that newResolveIncidentCommand internally makes the job eligible for execution again.

aravindhrs · October 9, 2023, 2:02pm

@ntheodoropoulos It’s the common behavior across the Camunda platform: once an incident is resolved, the job is available for execution again.
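
Since a resolved incident puts the job back in front of the worker, the worker-side check mentioned in the original question is one way to let an external system decide the outcome. The sketch below is plain Java with no Zeebe dependency, so that it runs on its own. The variable name `completedExternally`, the `Action` enum, and the `handle` method are all hypothetical names chosen for illustration. It assumes the external system sets the flag variable on the process instance before resolving the incident, and that the flag is visible to the worker when it activates the job again.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class GuardedWorkerSketch {

    // Hypothetical variable name; the external system would set it
    // before resolving the incident.
    static final String OVERRIDE_FLAG = "completedExternally";

    // What the worker reports back. In a real worker these would map to
    // the client's complete and fail commands.
    enum Action { COMPLETE, FAIL }

    static final class Result {
        final Action action;
        final Map<String, Object> variables;

        Result(Action action, Map<String, Object> variables) {
            this.action = action;
            this.variables = variables;
        }
    }

    // Decides what to do with an activated job, given its variables.
    static Result handle(Map<String, Object> jobVariables,
                         Function<Map<String, Object>, Map<String, Object>> businessLogic) {
        // If the flag is set, skip the business logic and complete right away.
        if (Boolean.TRUE.equals(jobVariables.get(OVERRIDE_FLAG))) {
            return new Result(Action.COMPLETE, Collections.<String, Object>emptyMap());
        }
        try {
            // Normal path: run the business logic and complete with its output.
            return new Result(Action.COMPLETE, businessLogic.apply(jobVariables));
        } catch (RuntimeException e) {
            // Failure path: report the error so the job can be failed.
            Map<String, Object> error = new HashMap<>();
            error.put("errorMessage", String.valueOf(e.getMessage()));
            return new Result(Action.FAIL, error);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put(OVERRIDE_FLAG, true);
        // The business logic would throw, but the flag short-circuits it.
        Result result = handle(vars, v -> {
            throw new IllegalStateException("business logic should be skipped");
        });
        System.out.println(result.action); // prints COMPLETE
    }
}
```

In an actual job worker, the same check would sit at the top of the handler, with the `COMPLETE` and `FAIL` branches sending the corresponding commands through the job client. This keeps the decision in one place, but it does not avoid the re-activation itself, so it is a workaround rather than a way to complete a failed job directly.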