Custom batch job using camunda-batch

jangalinski · October 13, 2016, 12:58pm

We are generating DMN tables based on rules stored in our database.

What we need is:

1. Over a period of 15 minutes, collect change events.
2. Every 15 minutes, execute a job that processes all events stored during the last 15 minutes.

We think the batch mechanism would suit our needs: each “change event” produces a batch job, and every 15 minutes we create a seed job that triggers processing of all events that have not been processed yet.

Questions:

1. Is batch able to support this scenario?
2. How would we best trigger a recurring batch run?
3. Where can we find non-migration-related documentation on batches?

Thanks in advance
Jan

Ingo_Richtsmeier · October 13, 2016, 1:39pm

Build a process with a timer start event, starting every 15 minutes.

See you later, Ingo

aakhmerov · October 13, 2016, 1:50pm

Hi @Jan,

The documentation on Batch is here:
https://docs.camunda.org/manual/7.5/user-guide/process-engine/batch/

Creating a BatchEntity essentially generates a seed job for you, which upon execution generates the jobs that handle the actual logic. So, if I understand you correctly, you want to create a new BatchEntity every 15 minutes. To trigger it, I would assume a timer event should be fine.

Please note that in 7.6 the implementations of batch operations are being extended and extra abstract classes are being added. You can look at the commits under https://app.camunda.com/jira/browse/CAM-6823 to better understand how custom batch operations are implemented.

Does that help you?
Askar

patrick.wunderlich · October 14, 2016, 9:10am

Hi,

I’m working with Jan on the same project.

Do I understand batch processing correctly? Is its purpose to split up a huge amount of workload into small jobs?
So I create a Batch, which has a seed job, and this seed job creates a lot of batch jobs, e.g. one to migrate a single process instance.
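
The splitting idea described above can be illustrated without any engine code. The following is a minimal sketch in plain Java, not Camunda’s implementation: the class `ChunkingSketch`, the method `partition`, and the chunk size are hypothetical names chosen for illustration. It only shows how a single "seed" step could cut a large list of IDs into small work units that are then handled one at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the seed-job idea; not part of the Camunda API.
class ChunkingSketch {

    // Splits the items into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> instanceIds = List.of("a", "b", "c", "d", "e");
        // Each chunk stands in for one batch job created by the seed step.
        for (List<String> chunk : partition(instanceIds, 2)) {
            System.out.println("would process chunk " + chunk);
        }
    }
}
```

Keeping the chunks small means that a failure affects only one unit of work, which is the property that makes retries per job practical.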

But I’m not sure if batch processing is really helpful for us. Just to describe our problem in detail:

1. We get a change event for a special object (it has nothing to do with Camunda).
2. We want to use the job queue to collect those events/jobs (the event/job itself is also not related to Camunda), but we don’t want to execute them right away. It would just be nice to store them in a queue that is already there.
3. Every 15 minutes we want to collect just those specific jobs, deduplicate them, because there could be duplicates, and execute them. (We want to change something in our own system, which is also not Camunda-related.)
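
The collect-then-deduplicate requirement in the list above can be sketched in plain Java. This is only an illustration under assumptions: `ChangeEventBuffer`, `record`, and `drainDistinct` are hypothetical names, events are represented by a plain object ID string, and the buffer lives in memory, whereas a real solution would need a persistent store (for example a database table) so that events survive a restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical in-memory stand-in for whatever store holds the change events.
class ChangeEventBuffer {

    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    // Called whenever a change event arrives; nothing is processed yet.
    void record(String objectId) {
        pending.add(objectId);
    }

    // Removes everything collected so far and returns each object ID once,
    // in order of first arrival.
    List<String> drainDistinct() {
        Set<String> distinct = new LinkedHashSet<>();
        String id;
        while ((id = pending.poll()) != null) {
            distinct.add(id);
        }
        return new ArrayList<>(distinct);
    }
}
```

Events that arrive while a drain is running are either picked up by that drain or stay in the queue for the next one, so none are lost.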

So we are going the other way round: we create the jobs ourselves, before the batch / seed job would do so. Still, we want to collect those jobs and execute them together in one go, e.g. with a single batch.

Is batch really the best way for this scenario? Or should we maybe just write our own job executor that collects just those jobs, deduplicates them, and executes them?

Thanks and best regards,
Patrick

thorben · October 14, 2016, 9:28am

To me, it sounds like you don’t necessarily need the job table for collecting those events (and it might not be the optimal data store for that, depending on your use case). I think batch might be useful for processing the events, if you want to do that in individual transactions (in combination with a timer job as mentioned by Ingo).

That is currently not possible with the batch infrastructure, although of course you could hack it one way or another. For example, you could collect events by creating jobs that are suspended, then every 15 minutes activate all of those that are to be processed. No need for batch then.

aakhmerov · October 14, 2016, 9:29am

Hi @patrick.wunderlich,

I think your use case consists of 3 separate tasks/chunks of work which are sort of independent.

1. Collecting and storing the events, i.e. preparing the data that has to be processed. This is outside the job executor / Batch infrastructure; you can achieve it with delegates or external tasks.
2. Cron-based execution that processes the previously collected events/data. This can be done with timer events or any other cron-scheduled bean. At this step you decide how many entities from your collected data you want to process and whether each chunk should be processed inside the engine. So you grab all your entities/events from step 1 and pass them either to a custom Batch implementation or to your own custom executor. If you go with the engine’s batch infrastructure, this is where you create a BatchEntity. If you decide not to use Batch, this is the step where you simply create a bunch of jobs in some way.
3. Processing of an individual chunk. If you build on the Batch infrastructure, this is where the individual jobs created by the seed job do their work.
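
For the "any other scheduled bean" variant of the periodic trigger, a minimal plain-Java sketch could look like the following. It is an assumption-laden illustration, not engine code: `PeriodicTrigger`, `runOnce`, and `completedRuns` are hypothetical names, and the `Runnable` passed in stands for whatever fetches the collected events and hands them to a Batch or a custom executor.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical scheduler wrapper; stands in for a timer event or scheduled bean.
class PeriodicTrigger {

    private final Runnable work;
    private final AtomicInteger completedRuns = new AtomicInteger();

    PeriodicTrigger(Runnable work) {
        this.work = work;
    }

    // One processing cycle. Exceptions are caught here because an uncaught
    // exception would cancel all further runs of a scheduleAtFixedRate task.
    void runOnce() {
        try {
            work.run();
            completedRuns.incrementAndGet();
        } catch (RuntimeException e) {
            System.err.println("cycle failed, next tick will try again: " + e);
        }
    }

    int completedRuns() {
        return completedRuns.get();
    }

    // Schedules runOnce() repeatedly; the first run happens after one full period.
    ScheduledFuture<?> start(ScheduledExecutorService scheduler, Duration period) {
        long millis = period.toMillis();
        return scheduler.scheduleAtFixedRate(this::runOnce, millis, millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        PeriodicTrigger trigger = new PeriodicTrigger(
                () -> System.out.println("processing collected events"));
        trigger.start(scheduler, Duration.ofMinutes(15));
        // The scheduler thread keeps the JVM alive; call scheduler.shutdown() to stop.
    }
}
```

A plain scheduler like this gives none of the monitoring, history, or retry handling that the engine provides, so it mainly suits the case where the processing itself is deliberately kept outside the engine.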

The decision whether or not to use the Batch infrastructure should be based on what you expect in terms of logging, monitoring, retry management, etc. The Batch infrastructure will allow you to see and manage your execution from Cockpit, and gives you the basics for defining chunk sizes, history, etc. You can of course just implement all this on your own.

Does that help?
Askar.

jangalinski · October 14, 2016, 9:49am

Thanks Askar and Thorben for clarifying this … we will need to discuss.

patrick.wunderlich · October 14, 2016, 1:56pm

Thanks Thorben and Askar, this helps a lot!