(reproduced from http://stackoverflow.com/questions/37367749/performance-and-indexes-in-camunda-process-repository)
I’ve been evaluating Camunda for a couple of days, with the aim of embedding it in my Tomcat application, but I have some performance concerns about filtering the engine’s data by assignee and by variable. A few simple use cases are of practical importance to me:
1. Filter all active tasks assigned to a specific user.
2. Filter all active process instances associated with a customer.
3. Filter all active tasks from process instances associated with a customer.
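To make the use cases concrete, this is roughly how I expect to express them with the engine's Java query API. It is only a sketch: the class name `CustomerQueries` and the variable name `customerId` are placeholders of mine, and it assumes the customer id is stored as a process-level variable.

```java
import java.util.List;

import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.TaskService;
import org.camunda.bpm.engine.runtime.ProcessInstance;
import org.camunda.bpm.engine.task.Task;

// Hypothetical helper wrapping the three queries described above.
public class CustomerQueries {

    // Assumed name of the process variable that holds the customer id.
    private static final String CUSTOMER_VAR = "customerId";

    private final TaskService taskService;
    private final RuntimeService runtimeService;

    public CustomerQueries(TaskService taskService, RuntimeService runtimeService) {
        this.taskService = taskService;
        this.runtimeService = runtimeService;
    }

    // Use case 1: active tasks assigned to a specific user.
    public List<Task> activeTasksOf(String userId) {
        return taskService.createTaskQuery()
                .taskAssignee(userId)
                .active()
                .list();
    }

    // Use case 2: active process instances carrying the customer id variable.
    public List<ProcessInstance> activeInstancesOf(String customerId) {
        return runtimeService.createProcessInstanceQuery()
                .variableValueEquals(CUSTOMER_VAR, customerId)
                .active()
                .list();
    }

    // Use case 3: active tasks whose process instance carries the customer id variable.
    public List<Task> activeTasksForCustomer(String customerId) {
        return taskService.createTaskQuery()
                .processVariableValueEquals(CUSTOMER_VAR, customerId)
                .active()
                .list();
    }
}
```

The third method filters tasks through the process-level variable, so it is the variant that does not need the customer id copied onto each task.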
My main concerns are:
a - For use case 1, can I rely on the assignee column being indexed in the database, so that the query still performs well once the task table is heavily populated?
b - For use case 2, I plan to store the customer id as a process-level variable on my process instances and filter on it. Are variables indexed as well, assuming assignees are? (Say I need to find the 20 process instances that have a given variable value in a table of 1 million process instances.)
c - Finally, from a performance point of view, should I replicate the customer id variable on each task and filter the tasks directly, without going through the corresponding process instances?
ps1: I’m using PostgreSQL as the engine’s database and don’t have a complete understanding of the table structure and indexes underneath the BPM engine. If I need to create indexes that are not available by default, I would appreciate some pointers on which tables and columns to work on.
ps2: My application is not heavily concurrent, but the database will potentially hold a large volume of data in the near future.
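In the meantime, something like the following should at least let me see which indexes already exist. It is a minimal sketch using plain JDBC against PostgreSQL's `pg_indexes` catalog view; the class name `IndexInspector` is made up, and the table names in the usage note (`ACT_RU_TASK`, `ACT_RU_VARIABLE`, `ACT_RU_EXECUTION`) are my assumption about where tasks, variables and executions are stored.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that lists the index definitions of the given tables.
public class IndexInspector {

    // Builds a query on pg_indexes with one "?" placeholder per table name.
    static String buildIndexQuery(int tableCount) {
        if (tableCount < 1) {
            throw new IllegalArgumentException("at least one table is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(tableCount, "?"));
        return "SELECT tablename, indexname, indexdef FROM pg_indexes"
                + " WHERE tablename IN (" + placeholders + ")"
                + " ORDER BY tablename, indexname";
    }

    // Returns one "table: index definition" line per index found.
    static List<String> listIndexes(Connection conn, List<String> tables) throws SQLException {
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(buildIndexQuery(tables.size()))) {
            for (int i = 0; i < tables.size(); i++) {
                // PostgreSQL folds unquoted identifiers to lower case,
                // so the catalog stores e.g. "act_ru_task".
                ps.setString(i + 1, tables.get(i).toLowerCase());
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString("tablename") + ": " + rs.getString("indexdef"));
                }
            }
        }
        return result;
    }
}
```

Calling `IndexInspector.listIndexes(conn, List.of("ACT_RU_TASK", "ACT_RU_VARIABLE", "ACT_RU_EXECUTION"))` on an open connection would then show whether columns such as the assignee are covered by an index.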