Performance and indexes in Camunda process repository

(reproduced from http://stackoverflow.com/questions/37367749/performance-and-indexes-in-camunda-process-repository)

I’ve been evaluating camunda to use embedded in my Tomcat application for a couple of days but I’m raising some performance worries regarding assignees and variables, to further filter my repository. There are some simple use case scenarios that will be of practical use to me:

  1. Filter all active tasks assigned to an specific user.

  2. Filter all active process instances associated to a customer.

  3. Filter all active tasks from process instances associated to a customer.

So my bigger concerns are:

a - Can I rely on use case 1 having database indexes on assignees that will nicely perform with a highly populated task table in the future?

b - To accomplish use case 2, I’ll add the customer id as a process level variable to my process instances to further filter them. Are variables subject to be indexed as well - supposing assignees are? (say I need to query 20 processes in a 1 million process table which have a given variable set)

c - Finally, from a performance point of view, should I replicate the customer id variable in each task and then filter them without checking the corresponding process instances?

ps1: I’m using postgres database as repository and have no complete understanding of the underlying structure and indexes beneath the BPM engine. If some understanding is necessary to create indexes not available by default, I would appreciate some clues on which tables/columns I should work on.

ps2: I’m not dealing with a critical concurrent software, but potentially will have a big database load in the near future.

Hi,

If your’e interested in database structure etc, theres a few things you can look at. The database structure is described in [1]

You can also see the database creation scripts by looking in the pre-packaged distributions. For Tomcat, look at a folder sql/create.

To answer your specific questions;
There is a table called ACT_RU_TASK. This has the runtime tasks. Looking at the SQL script for Postgresql, there is an index ACT_IDX_TASK_ASSIGNEE on ACT_RU_TASK(ASSIGNEE_). Hence it looks like there is an index on task assignee.

Variables are a little more complex as the variable type may determine where it is actually stored.

There is an index on businessKey, hence if you use the businessKey to store the customer ID, you may get some traction from businessKey.

Apart from adding indexes in appropriate places if necesary, another technique to consider is add your own table which is an index into the engine. For example, using task or process lifecycle event listeners etc, you could keep a table up to date with lookup values. Hence you could keep a table of customer id and process or task ids, or a certain variable and task id etc. Once the tables are in place you could add customer sql or plugins to use the capability.

regards

Rob

[1] https://docs.camunda.org/manual/7.4/user-guide/process-engine/database/

1 Like