Any way to disable auto retry when a job times out?

TangJiong · November 22, 2021, 8:10am

Timeouts

If the job is not completed or failed within the configured job activation timeout, Zeebe reassigns the job to another job worker. This does not affect the number of remaining retries.

A timeout may lead to two different workers working on the same job, possibly at the same time. If this occurs, only one worker successfully completes the job. The other complete job command is rejected with a NOT FOUND error.

The fact that jobs may be worked on more than once means that Zeebe is an “at least once” system with respect to job delivery and that worker code must be idempotent. In other words, workers must deal with jobs in a way that allows the code to be executed more than once for the same job, all while preserving the expected application state.

As the docs show, when a job is not completed or failed within the configured job activation timeout, it is automatically retried, which means our business logic may execute multiple times. In our case, it’s hard to guarantee the idempotence of every job.

So is there any way to disable auto retry when a job times out?

jonathan.lukas · November 22, 2021, 8:57am

Hello @TangJiong ,

first of all, you should measure the maximum time your task can take and set the timeout value of your request accordingly.
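
One way to turn measured run times into a timeout value is to take the slowest observed run and add some headroom. This is only a minimal sketch: `suggest_timeout_ms`, the 1.5 safety factor and the 1 second floor are illustrative assumptions, not part of any Zeebe client.

```python
import math


def suggest_timeout_ms(durations_s, safety_factor=1.5, floor_ms=1_000):
    """Suggest a job activation timeout in milliseconds.

    durations_s: measured handler run times in seconds (your own samples).
    safety_factor: headroom multiplier on the slowest run (an assumption).
    floor_ms: lower bound so very fast handlers still get a sane timeout.
    """
    if not durations_s:
        raise ValueError("need at least one measured duration")
    slowest_ms = max(durations_s) * 1000
    return max(floor_ms, math.ceil(slowest_ms * safety_factor))
```

The result would then be passed as the timeout of the worker or activation request in whichever client you use.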

If this value varies too much or is too high, you could implement a worker-side state that tracks the executed jobs. My assumption here is that your workers are scalable. The new worker picking up a job would then be able to check if the task is still in progress.
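
A rough sketch of such worker-side state tracking, keyed by the job key. All names here (`JobStateStore`, `try_start`, `mark_done`, `handle_job`) are hypothetical, and the in-memory dictionary only stands in for a store that all worker instances can reach, such as a database.

```python
import threading


class JobStateStore:
    """In-memory stand-in for a store shared by all worker instances.

    A real deployment would need a database or similar, because separate
    worker processes do not share this dictionary.
    """

    IN_PROGRESS = "in_progress"
    DONE = "done"

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def try_start(self, job_key):
        """Return True only for the first caller that claims this job key."""
        with self._lock:
            if job_key in self._states:
                return False
            self._states[job_key] = self.IN_PROGRESS
            return True

    def mark_done(self, job_key):
        with self._lock:
            self._states[job_key] = self.DONE

    def state(self, job_key):
        with self._lock:
            return self._states.get(job_key)


def handle_job(store, job_key, business_logic):
    """Run business_logic at most once per job key; skip re-deliveries."""
    if not store.try_start(job_key):
        return "skipped"
    business_logic()
    store.mark_done(job_key)
    return "executed"
```

Note that a worker crashing inside `business_logic` would leave the entry stuck in the in-progress state, so a real store would also need some expiry or manual cleanup for such entries.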

The auto retry cannot be disabled. The reason is that a worker could crash during execution, leaving a task locked forever. The timeout prevents this.

I hope this helps

Jonathan

TangJiong · November 22, 2021, 9:39am

@jonathanlukas Thanks, I understand this.

Instead of retrying automatically, the broker could mark the timed-out job as failed and raise an incident, leaving the decision to the developer. Would this be a better way?

jonathan.lukas · November 22, 2021, 9:42am

Hello @TangJiong ,

you could achieve this by letting a task fail without retries explicitly in your worker code before the task times out. It is possible, but please also test this behaviour under load. It could lead to more administrative overhead.
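
A minimal sketch of that idea, assuming the work can be split into steps so the worker can check its remaining time between them. `RecordingClient` is a hypothetical stand-in that only records commands; with a real client you would send the fail command with retries set to 0, which is what makes Zeebe raise an incident. The 1 second margin is an assumed value.

```python
import time


class RecordingClient:
    """Hypothetical stand-in for a Zeebe client; it only records commands."""

    def __init__(self):
        self.commands = []

    def complete(self, job_key):
        self.commands.append(("complete", job_key))

    def fail(self, job_key, retries, error_message):
        self.commands.append(("fail", job_key, retries, error_message))


def run_with_deadline(client, job_key, steps, timeout_s, margin_s=1.0,
                      clock=time.monotonic):
    """Run steps in order; fail the job with zero retries if time runs short.

    timeout_s mirrors the job activation timeout. margin_s is headroom
    (an assumption) so the fail command is sent before the broker's
    timeout makes the job available to other workers.
    """
    deadline = clock() + timeout_s - margin_s
    for step in steps:
        if clock() >= deadline:
            client.fail(job_key, retries=0,
                        error_message="worker gave up before activation timeout")
            return False
        step()
    client.complete(job_key)
    return True
```

The check is cooperative: a single step that runs longer than the remaining time will still overrun the timeout, so steps need to be short relative to the margin.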

Jonathan

TangJiong · November 22, 2021, 12:17pm

@jonathanlukas Very grateful for your patience.

letting a task fail without retries explicitly in your worker code before the task times out

That seems to help. But it means I need to write this code in every job worker. I hope the broker can fail the job directly on timeout, or offer this in a configurable way.

Maybe I can open an issue to see whether others have run into the same case?

jonathan.lukas · November 22, 2021, 12:44pm

Hello @TangJiong ,

yes, right now this has to be implemented explicitly on your side.

You are welcome to open an issue on this if you want.

Jonathan

jwulf · December 5, 2021, 9:45pm

For these kinds of things, I would wrap the client with my own facade, and put common logic in the facade.
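
As a sketch of what such a facade could look like: the policy "fail with zero retries when the handler runs out of time or throws" is written once and applied to every handler. `WorkerFacade` is a hypothetical name, and `client` is assumed to be any object offering `complete(job_key)` and `fail(job_key, retries, error_message)`; these are placeholders, not the method names of a specific Zeebe client.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class WorkerFacade:
    """Wraps a client so every handler gets the same timeout policy."""

    def __init__(self, client, timeout_s, margin_s=1.0, max_threads=4):
        self._client = client
        # Time budget per job: activation timeout minus an assumed margin.
        self._budget_s = timeout_s - margin_s
        self._pool = ThreadPoolExecutor(max_workers=max_threads)

    def handle(self, job_key, handler):
        """Run handler; complete on success, otherwise fail with 0 retries."""
        future = self._pool.submit(handler)
        try:
            future.result(timeout=self._budget_s)
        except FutureTimeout:
            self._client.fail(job_key, retries=0,
                              error_message="handler exceeded time budget")
            return False
        except Exception as exc:
            self._client.fail(job_key, retries=0, error_message=str(exc))
            return False
        self._client.complete(job_key)
        return True

    def close(self):
        self._pool.shutdown(wait=False)
```

One caveat with this design: Python cannot stop a running thread, so a handler that exceeds the budget keeps running in the background after the job has been failed. The facade only controls which command is sent to the broker, so the handler's side effects still need to be safe.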

katarzyna · April 6, 2022, 10:39am

Hello,
Could you please tell me where I can adjust the timeout?

Regards,
Katarzyna

jonathan.lukas · April 6, 2022, 2:26pm

Hello @katarzyna ,

this can be done by setting the field timeout as described here.

Each client implementation should offer a field like this.

I hope this helps

Jonathan

fatguy96 · September 11, 2023, 4:07am

In my case, I start the task, but the timeout depends on payTime. I let the task fail, but when I then complete the task, it throws a stateError.