BPMN Process Slows To a Crawl

We have a process that maps an invoice in one form to another. The invoice may contain thousands (or tens of thousands) of line items, and each line item has many related objects. The line items are processed in a sequential multi-instance sub-process, which creates new objects using Groovy script tasks and stores those objects as the task results. DMN business rule tasks hold the mapping rules, which Groovy scripts apply to the newly created objects.

The process works great until we get over a few thousand line items, at which point it begins to slow down considerably. At the beginning, the process maps at a rate of about 100 line items per second. After around 4.5k line items, it is down to about one line item per second.

In testing, it seems that the more complicated the object graph passed in, the sooner the process slows down. For instance, I have passed in an invoice with 12k line items but without many nested objects per line item; it does not begin to slow down until near 10k, and it never drops to the rate of one line item per second that we see in our real-world scenario.

I am trying to figure out what the factors in this could be. There are no wait states in the process (everything is a Groovy script task or a business rule task implemented as a DMN). The process variable passed to the process (the large invoice we map from) is created as a transient typed variable to avoid any possibility of persistence.
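For reference, this is roughly how we start the process. A minimal sketch, assuming Camunda 7's typed-value API; the `invoiceMapping` process key and the `invoice` variable name are placeholders for our actual model:

```java
import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.variable.Variables;
import org.camunda.bpm.engine.variable.value.ObjectValue;

public class InvoiceMappingStarter {

    private final RuntimeService runtimeService;

    public InvoiceMappingStarter(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    public void startMapping(Object invoice) {
        // setTransient(true) keeps the value out of the database:
        // it exists only for the duration of the current command and
        // is never flushed to the variable tables.
        ObjectValue transientInvoice = Variables.objectValue(invoice)
                .setTransient(true)
                .create();

        runtimeService.startProcessInstanceByKey("invoiceMapping",
                Variables.createVariables().putValueTyped("invoice", transientInvoice));
    }
}
```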

My team and I have been searching the docs and forums, but haven't found any information directly related to our issue. If anyone has suggestions, please share.

Thank you.

This post is related to Avoid Persisting Process Variables.

There may be many reasons for this. You have to find out where the time is spent. Find a profiling tool or insert some log statements.

We have profiled it and ruled out anything other than Camunda. At this point we are investigating other options for our process with less processing in Camunda, and that seems to be the solution.

Yes, that was also my thought. The strength of Camunda is in orchestrating and controlling, not in "batch" processing.

But nevertheless: If you find out what caused the delay I’d be very interested to know!

One more question: Where in the process model do you have “Async Before” or “Async After” set?

What changes have you tried making to the process engine settings?

We currently don't have any async before or after set.

We have history mode and job execution turned off.
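For context, these settings correspond to the following engine configuration. A sketch assuming a standalone Camunda 7 engine built programmatically; the same flags exist in bpm-platform.xml and the Spring Boot properties:

```java
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngineConfiguration;
import org.camunda.bpm.engine.impl.cfg.StandaloneInMemProcessEngineConfiguration;

public class EngineSetup {

    public static ProcessEngine build() {
        StandaloneInMemProcessEngineConfiguration config =
                new StandaloneInMemProcessEngineConfiguration();
        config.setHistory(ProcessEngineConfiguration.HISTORY_NONE); // no history writes
        config.setJobExecutorActivate(false);                       // no background job acquisition
        return config.buildProcessEngine();
    }
}
```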

This might be the reason. You have one huge ongoing DB transaction.

True - this does mean that you're limiting yourself to incoming threads and the engine isn't actually doing anything asynchronously. Also, if you plan on adding async before or after, you should make sure to turn the job executor back on.

Why did you turn it off btw?

This process has no wait states, and we are intentionally avoiding any database activity by using transient variables. The whole purpose of this process is to map one transient object to another transient object, which downstream processes then persist to a different data store.

Our practice is to turn off the job executor for all engines unless the module may host processes that include wait states.

Interesting. The intent is to avoid any database activity by using transient variables and avoiding wait states. Profiling the database shows no read or write activity during the process. Ideally, this would not involve the Camunda database at all. We simply want to use Camunda for mapping; there is no state to save, and if the process fails, that is fine, as other processes handle the error condition. I have a separate thread on avoiding any database activity.

I believe I tried setting async before and after, but it had no effect. I'll test that again.

Enabling job execution and adding async did not have any impact on the progressive slowdown.

The solution we arrived at has been to reengineer the process so that less is done in the BPMN/DMN, and to break the invoice up so that line items can be sent to the process individually instead of as one large object. With that, performance has greatly improved.
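The reworked driver now looks roughly like this. A sketch assuming Camunda 7's RuntimeService; the `lineItemMapping` process key and variable name are placeholders. Each line item becomes its own short-lived process instance:

```java
import java.util.List;

import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.variable.Variables;

public class LineItemDriver {

    private final RuntimeService runtimeService;

    public LineItemDriver(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    public void mapInvoice(List<Object> lineItems) {
        for (Object item : lineItems) {
            // One small transient payload per instance keeps each engine
            // command cheap, instead of a single instance dragging a
            // multi-thousand-item object graph through every activity.
            runtimeService.startProcessInstanceByKey("lineItemMapping",
                    Variables.createVariables().putValueTyped("lineItem",
                            Variables.objectValue(item).setTransient(true).create()));
        }
    }
}
```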

I have also found that, with a much smaller dataset, turning all sequential multi-instance subprocesses into parallel multi-instance subprocesses has a positive impact. However, enabling job execution results in an immediate flood of ProcessEngineExceptions because the engine is unable to resolve identifiers. We have left it off and are happy with the results so far.