Is there any way to retry all failed incidents in one go?

Hi,

I have around 20K orders that failed, and an incident was logged for each of them. When I retry one manually, it completes successfully.
However, with a volume of 20K orders it would be very difficult to open each and every process and click incident -> retry.

Is there any way (any script, sql query) with which I can hit all retries in one go?

I am using Camunda DB 7.5.0.

Please guide me.

Regards,
Amol

The most obvious way I can think of would be to poll the REST API for “failed” processes and then issue a restart via that same REST API. I’m not sure exactly what you are doing in, I assume, the Camunda admin GUI (cockpit), but generally most of what you can do there has an equivalent method in the REST API.

What I would do is write a shell script (I work primarily in Linux) that extracts a list of incidents (I don’t work with incidents myself, by the way). In fact, you might start with a loop that queries for the incident count based on certain criteria. When the number is greater than 0, you know you have at least one open incident and can go to another function that gets the list of incidents.
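
As a rough illustration of the "count first" step, here is a minimal Python sketch rather than a shell script. It assumes the engine's REST API is reachable at `http://localhost:8080/engine-rest` (adjust to your installation) and that your version exposes `GET /incident/count` with an `incidentType` filter; please check the REST reference for your version before relying on it. The function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default REST base URL of a local installation; change as needed.
BASE_URL = "http://localhost:8080/engine-rest"


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def failed_job_incident_count(base_url=BASE_URL, get_json=http_get_json):
    """Return the number of open incidents of type 'failedJob'.

    get_json is injectable so the function can be exercised without a server.
    """
    query = urllib.parse.urlencode({"incidentType": "failedJob"})
    return get_json(f"{base_url}/incident/count?{query}")["count"]
```

Calling `failed_job_incident_count()` against a running engine should return a plain integer; anything above 0 means there is something to retry.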

Once you have the list, you would iterate through it, restarting each one. Without knowing how you “restart” the incident, I can’t be more specific. Each iteration would issue a REST request to restart a particular incident. This assumes you can actually do something like this with the REST API. It may be that you have to use the Java API, or it may be that the only way you can do this is through the Camunda Cockpit GUI.
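
For what it's worth, here is a sketch of that loop in Python. It rests on a few assumptions: that the Cockpit "retry" button amounts to giving the failed job a retry again, that `GET /job` accepts `noRetriesLeft`, `sortBy`, `sortOrder`, `firstResult` and `maxResults`, and that `PUT /job/{id}/retries` takes a body like `{"retries": 1}`. The base URL, page size and function names are placeholders of mine, so treat this as a starting point and try it on a handful of jobs first.

```python
import json
import urllib.request

# Assumptions: local REST base URL and an arbitrary page size.
BASE_URL = "http://localhost:8080/engine-rest"
PAGE_SIZE = 500


def call_rest(method, url, payload=None):
    """Send a JSON request and return the decoded body, or None if it is empty."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def list_failed_job_ids(base_url, call, page_size):
    """Page through all jobs that have no retries left and collect their IDs."""
    ids, first = [], 0
    while True:
        page = call(
            "GET",
            f"{base_url}/job?noRetriesLeft=true&sortBy=jobId&sortOrder=asc"
            f"&firstResult={first}&maxResults={page_size}",
        )
        if not page:
            return ids
        ids.extend(job["id"] for job in page)
        first += page_size


def retry_failed_jobs(base_url=BASE_URL, retries=1, call=call_rest,
                      page_size=PAGE_SIZE):
    """Set the retry count on every failed job; return how many were updated."""
    job_ids = list_failed_job_ids(base_url, call, page_size)
    for job_id in job_ids:
        call("PUT", f"{base_url}/job/{job_id}/retries", {"retries": retries})
    return len(job_ids)
```

The IDs are collected before any job is touched so that the loop is bounded: a job that fails again right away is not picked up a second time in the same run. For 20K jobs you may also want to add a short pause between requests so the job executor is not flooded.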

If there are no external programmatic ways to do this, then you are going to have to implement more sophisticated exception handling. I don’t know anything about your processes, but if you’re using Java classes, you might be able to do it there. Otherwise, your BPMN processes are going to have to detect failures, wait, and then “restart” themselves. I’ve never done anything like this, but ultimately the way to deal with this is proper exception handling so that you don’t need to use an external process like the one I described above.

A “last ditch” suggestion, and I doubt Camunda would directly support this, would be to find out what piece of stored information in the database is being changed when you restart the incident. Then, and do this with extreme caution, directly modify that value in the database. I don’t know if that’s even possible, but most of what Camunda does is maintained in the database.

Good luck.

Hi Amol,

This is where the enterprise edition adds value. There is the batch operation feature [1], which could help here.

If you don’t have access to the enterprise edition, you could write your own administrative process and deploy it to the engine. The tasks in this process could be: query for failed tasks -> confirm restart -> restart. The service tasks in this process could then use the Java or REST API…

I would not advocate changing the DB data directly. Nor would I advocate external shell scripts. If you have a process engine, why not also use it for administrative processes…

regards

Rob

[1] https://docs.camunda.org/manual/7.6/webapps/cockpit/batch/batch-operation/#definition-of-operation

Hi Rob,

I have the enterprise edition, but version 7.4, whereas the link you shared is for 7.6.

Please suggest something for 7.4.

Regards,
Amol

Hi,

Sure, you can switch the version of the online documentation. Here is a link [1] to the bulk retry implemented in 7.4 EE.

regards

Rob

[1] https://docs.camunda.org/manual/7.4/webapps/cockpit/bpmn/failed-jobs/
