ACT_RU_TASK.PARENT_TASK_ID_ missing index?

dikey94 · May 19, 2017, 1:01pm

I’m using Camunda 7.6 with Oracle DB and I’d like to know why there is no index on ACT_RU_TASK.PARENT_TASK_ID_?

aakhmerov · May 19, 2017, 2:40pm

Hi @dikey94,

is that a rhetorical question? Why do you think that it should exist?

Cheers,
Askar

dikey94 · May 23, 2017, 7:10am

I was expecting that kind of answer

I have 2 scenarios.

Without the index
a) Running and completing 10k process / empty DB / no process running in the background / 300-1000ms delay between each request - avg response time 95ms
then
b) Only running 500k process / no delay between requests - avg response time 1259ms
then
c) Running and completing 10k process / 500k process running in the background / 300-1000ms delay between each request - avg response time 8000ms
With index created
Exactly the same tests, times are better:
a) avg response time 84ms
b) avg response time 1248ms
c) avg response time 104ms

Also CPU Consumption is smaller.

aakhmerov · May 23, 2017, 7:45am

Hi @dikey94,

that makes sense. If you would like, you can always create a ticket in our JIRA and even provide a pull request with any improvements. We are more then happy and always encourage that.

Cheers,
Askar

dikey94 · May 23, 2017, 8:36am

Not sure If I did it well but there is a ticket in JIRA.

Thanks!

mppfor_manu · May 23, 2017, 5:31pm

This is a common issue in Camunda. They can’t optimize for every circumstance and the broad support of multiple databases increases the complexity of maintain a common set of schema. We have experienced the same thing that you have and we did some indexing on our own, but were hesitant as that customization would need to be carried forward manually.

History of any sort in Camunda is very, very expensive.

I would be interested in hearing more about your application. You seem to be running extremely large volumes, which is what we’re doing. In fact, you may have far more processes running than we ever do. If you can share your experience, I would appreciate it.

dikey94 · May 24, 2017, 8:34am

@mppfor_manu We are on early stage in our project. Testing solutions, looking for technologies (not only BPMN engines), running performance tests etc.

Generally speaking, this will be a huge application. About 2 million business cases at the start, about 4 million after one year. Each case may have a number o BPMN process (I think about 0-15). There will be a huge number of starting process at the same time (scenario: each morning select n cases and for each case run a process). Telling “process” I mean quite simple Camunda process (a few steps) - mostly user tasks, service tasks and gateways.

I think batch running is not a big deal but generating a list of tasks that meet the given requirements (connected with the case, customer or/and filtered by variables) really is. It’s even more complicated when the list has to be generated frequently (several thousand employees working in different departments, each employee generate the list every 5-180s).

I can’t really share my experience yet because there isn’t any. I can tell you that working with 1 million running process looks nice (we have a simulation with 1 thousand users).

Please, tell me more about your experience.

Cheers.

mppfor_manu · May 24, 2017, 2:11pm

The biggest “weakness”, if I may call it that, and ironically one of the key benefits is its use of a shared database. This database must operate in a READ COMMITTED isolation level. This means you can’t use a cluster for scaling. Essentially, you can only have one database instance supporting a Camunda instance.

This instance may be made up of multiple Camunda servers sharing the database. This is makes scaling the Camunda application itself very easy to do, but you are limited by the back end database. One way to mitigate this is shut off history or turn it down to a level where process variable history is not saved. History is VERY expensive in Camunda.

If you can do without process variable history, then Camunda can handle huge volumes of processes as all the back end database has to do is manage the running process instances and their variables. That said, you will still reach a limit of what a single database can handle. To be sure you can scale up the database hardware or use specialized solutions from various vendors.

Our experience is with a 12 CPU VMware server with 64 GB of RAM running MySQL. This has proven to be completely inadequate. We really need something 10 to 20 times larger. Everyone’s requirements will differ, but it sounds like we both have large volumes, ours are just more transient than yours.

dikey94 · May 25, 2017, 7:48am

Thanks. We are also experimenting with history levels and other configs. And still, we don’t know if the engine will be embedded inside our application or it will be outside. I will report our work here.

Cheers!

mppfor_manu · May 25, 2017, 12:37pm

If you can maintain the level of active processes you are suggesting and have a history level above activity, I would love to know how you’re doing it. As I said before, it is no doubt possible to do, particularly if you use Oracle on one of their dedicated hardware solutions. The cost would be staggering.

Our experience is that with a rate of between 80 and 120 process starts per minute, the database server described above runs at about 80% I/O load and can’t handle much more than that. This VM guest is attached to a SAN.

I am currently coding a solution to completely offload history I/O from the Camunda database server. That’s the only way we’re going to be able to handle our volume. My solution will allow for history records to be written by activity listeners, in a standard “listener” in a task, as well as being called explicitly within a Java or Groovy script.