Perf. Issue with identity link db inserts and deletes

tomorrow · December 13, 2016, 8:28pm

Hello, I’m evaluating camunda as new process engine at our company. We are running performance test. We have history with approximately 180k finished processes and about 30k runing processes. As we will have user task with candidate users up to hundred in count, we’re testing this scenario too.
Unfortunately, we found weak spot here, as camunda creates insert/delete SQL command to act_ru_identity link table for each candidate user in the created/deleted user task.
Is there any way to make the inserts custom, so the inserts/delete were executed in bulk? Or is there some other way to avoid our problem? We cannot use candidate groups because of our user model and our needs on assignment policy.

aakhmerov · December 15, 2016, 8:15am

Hi @tomorrow,

in general you can override any method implementation provided by camunda. You can do this either by forking codebase and replacing a class or by providing extension. I did not get exactly which method is causing trouble for you exactly. Could you tell me method signature?

Cheers,
Askar

tomorrow · December 16, 2016, 12:10pm

hello, here is the method regarding delete task, there is following method in IdentityLinkProvider.

public void deleteIdentityLinksByTaskId(String taskId) {
  List<IdentityLinkEntity> identityLinks = findIdentityLinksByTaskId(taskId);
  for (IdentityLinkEntity identityLink: identityLinks) {
    deleteIdentityLink(identityLink);
}

If i am right, following delete sql command is used multiple times:

  <delete id="deleteIdentityLink" parameterType="string">
    delete from ${prefix}ACT_RU_IDENTITYLINK where ID_ = #{id}
  </delete>

However, if there is not some reason I am not aware of as I don’t know camunda into the deep yet, something like this could be used, without needing the for cycle

  <delete id="deleteIdentityLink" parameterType="string">
    delete from ${prefix}ACT_RU_IDENTITYLINK where TASK_ID_ = #{taskId}
  </delete>

There is even index on that column.

Or this:

  <delete id="bulkDeleteIdentityLink" parameterType="java.util.Collection">
    delete from ${prefix}ACT_RU_IDENTITYLINK where 
    <foreach item="identityLink" collection="list" index="index" separator=" or ">
        ID_ = #{identityLink.id, jdbcType=VARCHAR}
    </foreach>
  </delete>

For create task, more precisely create indentity links when creating task there is following method, there is again method with for cycle:

  @Override
  public void addCandidateUsers(Collection<String> candidateUsers) {
    for (String candidateUser : candidateUsers) {
      addCandidateUser(candidateUser);
    }
  }

which results in following sql command.

  <insert id="insertIdentityLink" parameterType="org.camunda.bpm.engine.impl.persistence.entity.IdentityLinkEntity">
    insert into ${prefix}ACT_RU_IDENTITYLINK (ID_, TYPE_, USER_ID_, GROUP_ID_, TASK_ID_, PROC_DEF_ID_, TENANT_ID_, REV_ )
    values (#{id, jdbcType=VARCHAR},
            #{type, jdbcType=VARCHAR},
            #{userId, jdbcType=VARCHAR},
            #{groupId, jdbcType=VARCHAR},
            #{taskId, jdbcType=VARCHAR},
            #{processDefId, jdbcType=VARCHAR},
            #{tenantId, jdbcType=VARCHAR},
            1
           )
            
  </insert>

which could be replaced with something like this,

  <insert id="bulkInsertIdentityLink" parameterType="org.activiti.engine.impl.persistence.entity.IdentityLinkEntity">
    insert into ${prefix}ACT_RU_IDENTITYLINK (ID_, REV_, TYPE_, USER_ID_, GROUP_ID_, TASK_ID_, PROC_INST_ID_, PROC_DEF_ID_)
    values 
    <foreach collection="list" item="identityLink" index="index" separator=","> 
      (#{identityLink.id, jdbcType=VARCHAR},
        1,
        #{identityLink.type, jdbcType=VARCHAR},
        #{identityLink.userId, jdbcType=VARCHAR},
        #{identityLink.groupId, jdbcType=VARCHAR},
        #{identityLink.taskId, jdbcType=VARCHAR},
        #{identityLink.processInstanceId, jdbcType=VARCHAR},
        #{identityLink.processDefId, jdbcType=VARCHAR})
    </foreach>
  </insert>
  
  <insert id="bulkInsertIdentityLink_oracle" parameterType="org.activiti.engine.impl.persistence.entity.IdentityLinkEntity">
    INSERT ALL
      <foreach collection="list" item="identityLink" index="index">
        into ${prefix}ACT_RU_IDENTITYLINK (ID_, REV_, TYPE_, USER_ID_, GROUP_ID_, TASK_ID_, PROC_INST_ID_, PROC_DEF_ID_) VALUES
           (#{identityLink.id, jdbcType=VARCHAR},
            1,
            #{identityLink.type, jdbcType=VARCHAR},
            #{identityLink.userId, jdbcType=VARCHAR},
            #{identityLink.groupId, jdbcType=VARCHAR},
            #{identityLink.taskId, jdbcType=VARCHAR},
            #{identityLink.processInstanceId, jdbcType=VARCHAR},
            #{identityLink.processDefId, jdbcType=VARCHAR})
      </foreach>
    SELECT * FROM dual
  </insert>

This could be replace with one of these, depending on the DB provider.

  <insert id="bulkInsertIdentityLink" parameterType="org.activiti.engine.impl.persistence.entity.IdentityLinkEntity">
    insert into ${prefix}ACT_RU_IDENTITYLINK (ID_, REV_, TYPE_, USER_ID_, GROUP_ID_, TASK_ID_, PROC_INST_ID_, PROC_DEF_ID_)
    values 
    <foreach collection="list" item="identityLink" index="index" separator=","> 
      (#{identityLink.id, jdbcType=VARCHAR},
        1,
        #{identityLink.type, jdbcType=VARCHAR},
        #{identityLink.userId, jdbcType=VARCHAR},
        #{identityLink.groupId, jdbcType=VARCHAR},
        #{identityLink.taskId, jdbcType=VARCHAR},
        #{identityLink.processInstanceId, jdbcType=VARCHAR},
        #{identityLink.processDefId, jdbcType=VARCHAR})
    </foreach>
  </insert>
  
  <insert id="bulkInsertIdentityLink_oracle" parameterType="org.activiti.engine.impl.persistence.entity.IdentityLinkEntity">
    INSERT ALL
      <foreach collection="list" item="identityLink" index="index">
        into ${prefix}ACT_RU_IDENTITYLINK (ID_, REV_, TYPE_, USER_ID_, GROUP_ID_, TASK_ID_, PROC_INST_ID_, PROC_DEF_ID_) VALUES
           (#{identityLink.id, jdbcType=VARCHAR},
            1,
            #{identityLink.type, jdbcType=VARCHAR},
            #{identityLink.userId, jdbcType=VARCHAR},
            #{identityLink.groupId, jdbcType=VARCHAR},
            #{identityLink.taskId, jdbcType=VARCHAR},
            #{identityLink.processInstanceId, jdbcType=VARCHAR},
            #{identityLink.processDefId, jdbcType=VARCHAR})
      </foreach>
    SELECT * FROM dual
  </insert>

Just not to take credit for the sql command, I c&p it from activiti source codes. As we have not tested Activiti yet, and maybe we won’t, we can’t the the exact impact. We’d also like to know, if something similar could be done in camunda, e.g. under commercial support, or if it is in the roadmap of camunda to make some optimizations like these.

If we would have to make the changes ourselves, that also wouldn’t be a problem. However, as I understand backward compatibility, couldn’t this make some problems when upgrading to newer versions of camunda, as those method are inside “impl” packages?

aakhmerov · December 19, 2016, 8:59am

Hi @tomorrow,

in general, all implementations under impl packages can changed without notice, only public APIs are guaranteed to remain compatible.
About commercial support, you have to consult someone from sales team, but in general there is a possibility to create a feature request based on needs of enterprise users. You can even add feature request to our JIRA as a community user, although I cannot promise anything about prioritization of the backlog. And you can always prepare a pull request for existing ticket, which will be appreciated a lot by camunda and community, you can read more about contributing here https://github.com/camunda/camunda-bpm-platform/blob/master/CONTRIBUTING.md.

About your code, I think that code there can be definitely optimized and you are right, method deleteIdentityLinksByTaskId should be doing what it implies, i.e. delete by task id and not just by ID. It would be great if you could prepare a pull request with that improvement.

Cheers,
Askar

system · January 30, 2024, 10:44am