I’m currently dealing with a significant challenge in our Camunda setup. We have around 128,000 failed jobs that need to be retried. I am looking for advice on the most efficient and reliable way to handle this situation.
Hello my friend!
Looks like you have a big problem on your hands!
Unfortunately, I don't think there is much “magic” we can do in this case…
I think the best approach is to write a script that searches for the failed jobs and retries them in batches, a little at a time. Choose a batch size and a pause between batches that your application can handle, then let the script run until everything has been processed.
I suggest running it at a time when your application has few executions in progress, because the retried jobs are executed by the same job executor that handles your regular workload.
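A minimal sketch of that script, assuming a Camunda 7 engine whose REST API is reachable at the hypothetical `BASE_URL` below. It first pages through `GET /job?noRetriesLeft=true` to collect the failed job ids, then submits them in chunks to `POST /job/retries` ("Set Job Retries Async"), pausing between chunks. The helper names, batch size and pause are illustrative assumptions to tune, not values from this thread:

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Optional

# Assumption: default engine-rest URL of a local Camunda 7 distribution.
BASE_URL = "http://localhost:8080/engine-rest"


def http_json(method: str, url: str, body: Optional[dict] = None):
    """Send one request to the REST API and decode the JSON reply (None if empty)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else None


def failed_job_ids(process_definition_id: str, page_size: int = 1000,
                   send: Callable = http_json) -> List[str]:
    """Collect the ids of all jobs with no retries left, one page at a time.

    All ids are collected before anything is changed, so the paging is not
    disturbed by jobs whose retry count is about to be modified.
    """
    ids: List[str] = []
    first = 0
    while True:
        query = urllib.parse.urlencode({
            "noRetriesLeft": "true",
            "processDefinitionId": process_definition_id,
            "sortBy": "jobId",
            "sortOrder": "asc",
            "firstResult": first,
            "maxResults": page_size,
        })
        page = send("GET", f"{BASE_URL}/job?{query}")
        ids.extend(job["id"] for job in page)
        if len(page) < page_size:  # last (possibly empty) page
            return ids
        first += page_size


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def retry_in_batches(job_ids: List[str], batch_size: int = 500,
                     pause_seconds: float = 60.0, retries: int = 1,
                     send: Callable = http_json,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Give each chunk of jobs `retries` new attempts, pausing between chunks.

    Each POST /job/retries call creates one asynchronous batch in the engine;
    the pause only spaces out submissions, it does not wait for completion.
    """
    submitted = 0
    for index, batch in enumerate(chunked(job_ids, batch_size)):
        if index > 0:
            sleep(pause_seconds)
        send("POST", f"{BASE_URL}/job/retries", {"jobIds": batch, "retries": retries})
        submitted += len(batch)
    return submitted
```

Usage would be something like `retry_in_batches(failed_job_ids("RefundRetryProcess:7:3d65d061-9d93-11ee-b00f-d6037865158d"))`. The `send` and `sleep` parameters exist only so the logic can be exercised without a running engine. Because the pause does not wait for the retried jobs to finish, watch the job executor load and increase `pause_seconds` or reduce `batch_size` if it falls behind.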
William Robert Alves
Would it be feasible to solve this issue with an SQL script? For example, we could run a query that resets the retry count of failed jobs from 0 to 1 and makes them due immediately, 1,000 jobs per run. Here’s a possible script:
WITH cte AS (
    SELECT job.id_
    FROM act_ru_job job
    WHERE job.retries_ = 0
      AND job.lock_owner_ IS NULL
      AND job.process_def_id_ = 'RefundRetryProcess:7:3d65d061-9d93-11ee-b00f-d6037865158d'
    LIMIT 1000
)
UPDATE act_ru_job job
SET retries_ = 1,
    duedate_ = NOW()
FROM cte
WHERE job.id_ = cte.id_;
Hi @pertsh.galstyan
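I would avoid modifying the database directly. For example, setting retries through the engine also resolves the related failed-job incidents, which a raw UPDATE on act_ru_job does not. Instead, I suggest taking a look at the Camunda Automation Platform 7.20.7 REST API for setting job retries.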
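As a rough sketch, assuming the `POST /job/retries` endpoint accepts a `jobQuery` object (the same filters as the job query) instead of an explicit id list, a single request can cover every failed job of one process definition. The engine, not the client, then works out which jobs match and creates one batch for them. `BASE_URL` and the helper names are hypothetical:

```python
import json
import urllib.request

# Assumption: default engine-rest URL; adjust host, port and context path.
BASE_URL = "http://localhost:8080/engine-rest"


def retry_request_by_query(process_definition_id: str,
                           retries: int = 1) -> urllib.request.Request:
    """Build a POST /job/retries request that selects jobs by query, not by id."""
    body = {
        "jobQuery": {
            "noRetriesLeft": True,
            "processDefinitionId": process_definition_id,
        },
        "retries": retries,
    }
    return urllib.request.Request(
        f"{BASE_URL}/job/retries",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request; the JSON reply describes the batch the engine created."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

The trade-off is that one request gives up the client-side throttling of a chunked script, so the retried jobs compete with regular work as quickly as the batch runs. Running it during a quiet period still applies.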