I have seen that most of Camunda database queries uses the sentence “select distinct…” to perform searches. For example this one, when I search historical process instances by tenantId (I’m using PostgreSQL 11.12):
select distinct RES.*
from (
SELECT SELF.*, DEF.NAME_, DEF.VERSION_, DEF.DEPLOYMENT_ID_
FROM ACT_HI_PROCINST SELF LEFT JOIN ACT_RE_PROCDEF DEF ON SELF.PROC_DEF_ID_ = DEF.ID_
WHERE ( SELF.TENANT_ID_ in ( ‘aTenantId’ ) )
) RES
order by RES.ID_ asc
LIMIT 20 OFFSET 0
My question is, why Camunda always uses “distinct” for almost all its db queries ?. In the previous example, it makes the query incredibly slower (200ms without distinct vs around 8 seconds with distinct). I have been testing and so far I always got the same results, same quantity and same order with and without “distinct”.
In which scenarios Camunda db queries can return duplicates and make it necessary to use the distinct statement ?
What would happen if we just remove the “distinct” from all queries ?
I tested it, and the performance is much better without “distinct”. So far we haven’t find issues, but still I would like to know in which scenarios Camunda db queries can return duplicates and make it necessary to use the distinct statement ?