Completing External Tasks In Bulk

Hello, I have a use case where I would like to complete several hundred external tasks in bulk.

We will have multiple processes running in parallel (on the scale of thousands), and the current plan is to have these processes reach an External Task that places them into some queue (topic). A user will poll the task queue for ready-to-go items, select a subset from the list of available items, and complete that batch all at once.

I can’t find an API that allows completing tasks in bulk, and completing 1000+ external tasks via individual API calls is not viable. Is there a better BPMN construct for this? The External Task seemed appropriate because we can provide a topic in the definition and easily query for tasks on that topic. Also, the functionality around locking and failing external tasks would be very convenient.

This feels natural but doesn’t allow for completing the tasks in bulk, in the same transaction.

Tasks are placed into SomeQueueName, and the functionality of External Tasks is almost exactly what we need.
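To make the intended flow concrete, here is a minimal sketch of the payload for Camunda 7’s `POST /external-task/fetchAndLock` endpoint, which would pull ready-to-go items from the `SomeQueueName` topic and lock them for the polling user. The worker id, batch size, and lock duration below are illustrative assumptions, not values from the original post.

```python
import json

# Sketch: build the body for POST /external-task/fetchAndLock.
# "SomeQueueName" is the topic from the post; workerId, maxTasks,
# and lockDuration are illustrative values.
def fetch_and_lock_payload(worker_id: str, topic: str,
                           max_tasks: int, lock_ms: int) -> str:
    return json.dumps({
        "workerId": worker_id,
        "maxTasks": max_tasks,
        "topics": [
            {"topicName": topic, "lockDuration": lock_ms},
        ],
    })

payload = fetch_and_lock_payload("bulk-completer-1", "SomeQueueName", 500, 60000)
```

This covers the fetch-and-lock half of the workflow well; the open question in this thread is the completion half.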

The User and Manual Task implementations seem to suffer from the same issue in that they cannot be completed in bulk, and the only constructs I can find for advancing multiple process instances in bulk are signals (not appropriate for this use case) and message receive events.

If using a message implementation, how can we query for items in the queue? There seems to be a process engine (Java) API for querying message subscriptions, which could possibly work, but I can’t find a REST API for this sort of query. Ideally we would be able to query the queue without abusing other fields on the task, like the definition key, name, or description.

Here we have a couple of hacks:

  • setting the message name to something specific to both the message type and the name of the fake topic
  • correlating a message to multiple processes is possible, but it seems to be a place where Camunda departs from the intended usage of the BPMN element, where a message should correlate to exactly one process instance

We could also set the name of the fake topic in the name or taskDefinitionId fields, but these also feel like hacks. The name is a display field, and it’s not ideal to put process configuration properties there. Putting the topic inside the taskDefinitionId feels like a similar hack. I can’t think of a clean out-of-the-box mechanism that would let us query the queue without abusing these fields.
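For reference, the first hack above could look like the following sketch of a body for Camunda 7’s `POST /message` endpoint, where the fake topic name is encoded into the message name and `"all": true` asks the engine to correlate the message to every waiting instance (the correlateAll behavior). The `bulk-complete::` naming convention is a hypothetical illustration of the hack, not an established pattern.

```python
import json

# Sketch of the "fake topic in the message name" hack: the queue name is
# packed into messageName, and "all": true correlates the message to all
# matching waiting process instances. The prefix is illustrative.
def correlate_all_payload(queue_name: str) -> str:
    return json.dumps({
        "messageName": f"bulk-complete::{queue_name}",
        "all": True,
    })

payload = correlate_all_payload("SomeQueueName")
```

This shows why it is a hack: the queue identity only exists as a naming convention inside a field meant for something else.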

It would be a huge detriment to our work if we were forced to manage this sort of queue in an external system to basically replicate the functionality of the external task queue plus bulk completion.

Am I missing something obvious here?

I think your best bet may be to take a look at @jangalinski’s Custom Batch plugin.

I’m pretty sure it lets you use Camunda’s internal batching system for custom requests, so it could be used to complete X external tasks in one call and let the engine deal with completing them.


I made minor contributions to that plugin; @pschalk did most of the work.


Does the plugin work with the external task pattern and with Camunda PaaS, or is it for Java developers only?

I think there is an inconsistency in the API: the external task fetchAndLock endpoint has a parameter maxTasks that lets me retrieve X tasks of a topic.
But the complete endpoint can only be executed for each single task individually.

The plugin allows you to register and implement batch job handlers that are executed via the job executor.
What your handlers do is up to you; of course, they could also work with external tasks.


Sorry, but I don’t really understand how I would integrate this plugin with my use case. The example usage explains running migration jobs as a batch, each in a separate transaction, which is the opposite of what I want.

From the usage guide:

Advantages:

  • decoupling of execution, i.e., every batch execution job uses its own transaction

Disadvantages:

  • a batch can fail partially while a subset was already executed, e.g., some process instances were migrated where others failed

This scenario is less than ideal for the business; all or none of the tasks should complete, so bundling them all in the same transaction would be an advantage for us.

In my case I have several hundred external tasks sitting in a queue. I would like to be able to complete a select group of specific tasks in the same REST call, and ideally in the same transaction.

I think there is an inconsistency in the API: the external task fetchAndLock endpoint has a parameter maxTasks that lets me retrieve X tasks of a topic.
But the complete endpoint can only be executed for each single task individually.

This is closer to the feature I’m looking for. I can fetch a bunch of tasks from the queue and lock them in bulk, but they can only be completed one at a time. Sending hundreds or even thousands of API calls to complete tasks in a loop will not perform well, and it’s important for my use case that all the tasks in the batch complete consistently via some kind of “all or none” mechanism.

It would be detrimental to the business logic if some tasks completed successfully while others were left behind due to errors. If any of the tasks error, all of them should fail to complete. Are there any known patterns for this use case?
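The partial-failure hazard described above can be sketched concretely. Below, a fake client stands in for `POST /external-task/{id}/complete`; the client class, method names, and task ids are all illustrative. The point is that each REST completion commits in its own transaction, so once one call in the loop fails, the earlier completions cannot be rolled back from the caller’s side.

```python
# Sketch of the partial-failure hazard when completing external tasks one
# REST call at a time. "client" stands in for POST /external-task/{id}/complete;
# over HTTP there is no way to roll back calls that already succeeded.
def complete_batch(client, task_ids):
    completed = []
    for task_id in task_ids:
        try:
            client.complete(task_id)
            completed.append(task_id)
        except RuntimeError:
            # Earlier completions have already committed in their own
            # transactions -- the batch is now only partially done.
            return {"completed": completed, "failed_at": task_id}
    return {"completed": completed, "failed_at": None}

class FlakyClient:
    """Fake client that fails on one specific task id (for illustration)."""
    def __init__(self, bad_id):
        self.bad_id = bad_id

    def complete(self, task_id):
        if task_id == self.bad_id:
            raise RuntimeError(f"complete failed for {task_id}")

result = complete_batch(FlakyClient("t3"), ["t1", "t2", "t3", "t4"])
```

After the failure on `t3`, tasks `t1` and `t2` are already completed in the engine while `t3` and `t4` are not, which is exactly the inconsistent state the business logic cannot tolerate.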

As explained in the original post, I think a “receive message event” could technically work here, but external tasks are both more semantically correct and provide a clean mechanism to fetch tasks from a certain queue. It doesn’t appear to be possible to query for message subscriptions using the out-of-the-box REST API.
