Number of invocations per batch causing deadlocks in SQL Server 2016

Hi all,

I’ve been using Camunda 7.14.0 enterprise (SpringBoot) with SQL Server 2016. I was noticing that the batch deletion of process instances was a bit slow (even though it was working), so I decided to increase the property “invocationsPerBatchJob”, in order to allow each execution job of a batch, for instance, to delete multiple process instances.

After changing the property and trying to delete close to 140k instances, I noticed that I started to get constant deadlocks in the database, which basically killed the whole application, as I was unable to access it. To bring it back up, I had to:

  1. Stop the application
  2. Kill all the sessions of the DB user
  3. Delete the batch in the DB (ACT_RU_BATCH)
  4. Delete the execution jobs (ACT_RU_JOB)
  5. Start the application again

The deadlocks seem to be all somewhat associated with the JOB table.

Has anyone ever noticed that? Could it be something potentially associated with SQL Server? (as it is known to have other issues with case-insensitive collations)

On the bottom part, you can find an extract deadlock XML, which I extract from SQL Server using the extended events feature.

Thanks in advance.

<deadlock>
 <victim-list>
  <victimProcess id="process1e294e33088" />
 </victim-list>
 <process-list>
  <process id="process1e294e33088" taskpriority="0" logused="911904" waitresource="KEY: 7:72057594347913216 (310322f197f0)" waittime="32901" ownerId="245387219" transactionname="implicit_transaction" lasttranstarted="2023-02-09T17:47:48.387" XDES="0x1dd2b94a350" lockMode="S" schedulerid="8" kpid="11364" status="suspended" spid="56" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2023-02-09T17:47:48.513" lastbatchcompleted="2023-02-09T17:47:48.493" lastattention="1900-01-01T00:00:00.493" clientapp="Microsoft JDBC Driver for SQL Server" hostname="i-itsm-camap-01" hostpid="0" loginname="dbo_ONEITSMCAM_PHOENIX" isolationlevel="read committed (2)" xactid="245387219" currentdb="7" currentdbname="ONEITSMCAM_PHOENIX" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
   <executionStack>
    <frame procname="adhoc" line="1" stmtstart="41782" stmtend="76150" sqlhandle="0x020000007200020e1ea393a0dbf9616ed313db6393b55f3c0000000000000000000000000000000000000000">
unknown    </frame>
    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown    </frame>
   </executionStack>
   <inputbuf>
(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 nvarchar(4000),@P3 nvarchar(4000),@P4 nvarchar(4000),@P5 nvarchar(4000),@P6 nvarchar(4000),@P7 nvarchar(4000),@P8 nvarchar(4000),@P9 nvarchar(4000),@P10 nvarchar(4000),@P11 nvarchar(4000),@P12 nvarchar(4000),@P13 nvarchar(4000),@P14 nvarchar(4000),@P15 nvarchar(4000),@P16 nvarchar(4000),@P17 nvarchar(4000),@P18 nvarchar(4000),@P19 nvarchar(4000),@P20 nvarchar(4000),@P21 nvarchar(4000),@P22 nvarchar(4000),@P23 nvarchar(4000),@P24 nvarchar(4000),@P25 nvarchar(4000),@P26 nvarchar(4000),@P27 nvarchar(4000),@P28 nvarchar(4000),@P29 nvarchar(4000),@P30 nvarchar(4000),@P31 nvarchar(4000),@P32 nvarchar(4000),@P33 nvarchar(4000),@P34 nvarchar(4000),@P35 nvarchar(4000),@P36 nvarchar(4000),@P37 nvarchar(4000),@P38 nvarchar(4000),@P39 nvarchar(4000),@P40 nvarchar(4000),@P41 nvarchar(4000),@P42 nvarchar(4000),@P43 nvarchar(4000),@P44 nvarchar(4000),@P45 nvarchar(4000),@P46 nvarchar(4000),@P47 nvarchar(4000),@P48 nvarchar(4000),@P49 nvarchar(4000),@P50 nvarchar(4000),@P51 nvarcha   </inputbuf>
  </process>
  <process id="process1dcb8d43848" taskpriority="0" logused="941372" waitresource="KEY: 7:72057594347913216 (310322f197f0)" waittime="32901" ownerId="245387639" transactionname="implicit_transaction" lasttranstarted="2023-02-09T17:47:49.047" XDES="0x1e1378895c0" lockMode="S" schedulerid="1" kpid="3848" status="suspended" spid="68" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2023-02-09T17:47:49.193" lastbatchcompleted="2023-02-09T17:47:49.190" lastattention="1900-01-01T00:00:00.190" clientapp="Microsoft JDBC Driver for SQL Server" hostname="i-itsm-camap-01" hostpid="0" loginname="dbo_ONEITSMCAM_PHOENIX" isolationlevel="read committed (2)" xactid="245387639" currentdb="7" currentdbname="ONEITSMCAM_PHOENIX" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
   <executionStack>
    <frame procname="adhoc" line="1" stmtstart="41782" stmtend="76150" sqlhandle="0x020000007200020e1ea393a0dbf9616ed313db6393b55f3c0000000000000000000000000000000000000000">
unknown    </frame>
    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown    </frame>
   </executionStack>
   <inputbuf>
(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 nvarchar(4000),@P3 nvarchar(4000),@P4 nvarchar(4000),@P5 nvarchar(4000),@P6 nvarchar(4000),@P7 nvarchar(4000),@P8 nvarchar(4000),@P9 nvarchar(4000),@P10 nvarchar(4000),@P11 nvarchar(4000),@P12 nvarchar(4000),@P13 nvarchar(4000),@P14 nvarchar(4000),@P15 nvarchar(4000),@P16 nvarchar(4000),@P17 nvarchar(4000),@P18 nvarchar(4000),@P19 nvarchar(4000),@P20 nvarchar(4000),@P21 nvarchar(4000),@P22 nvarchar(4000),@P23 nvarchar(4000),@P24 nvarchar(4000),@P25 nvarchar(4000),@P26 nvarchar(4000),@P27 nvarchar(4000),@P28 nvarchar(4000),@P29 nvarchar(4000),@P30 nvarchar(4000),@P31 nvarchar(4000),@P32 nvarchar(4000),@P33 nvarchar(4000),@P34 nvarchar(4000),@P35 nvarchar(4000),@P36 nvarchar(4000),@P37 nvarchar(4000),@P38 nvarchar(4000),@P39 nvarchar(4000),@P40 nvarchar(4000),@P41 nvarchar(4000),@P42 nvarchar(4000),@P43 nvarchar(4000),@P44 nvarchar(4000),@P45 nvarchar(4000),@P46 nvarchar(4000),@P47 nvarchar(4000),@P48 nvarchar(4000),@P49 nvarchar(4000),@P50 nvarchar(4000),@P51 nvarcha   </inputbuf>
  </process>
  <process id="process1dd08329468" taskpriority="0" logused="41067944" waitresource="KEY: 7:72057594347913216 (d51ba752bb29)" waittime="416" ownerId="245310120" transactionname="implicit_transaction" lasttranstarted="2023-02-09T17:46:54.377" XDES="0x1e2b9f19290" lockMode="S" schedulerid="6" kpid="13588" status="suspended" spid="94" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2023-02-09T17:48:06.023" lastbatchcompleted="2023-02-09T17:48:06.020" lastattention="1900-01-01T00:00:00.020" clientapp="Microsoft JDBC Driver for SQL Server" hostname="i-itsm-camap-02" hostpid="0" loginname="dbo_ONEITSMCAM_PHOENIX" isolationlevel="read committed (2)" xactid="245310120" currentdb="7" currentdbname="ONEITSMCAM_PHOENIX" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
   <executionStack>
    <frame procname="adhoc" line="1" stmtstart="41782" stmtend="76122" sqlhandle="0x02000000dec6621c6612fb5dedc8b0d2ec46227789590d880000000000000000000000000000000000000000">
unknown    </frame>
    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown    </frame>
   </executionStack>
   <inputbuf>
(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 nvarchar(4000),@P3 nvarchar(4000),@P4 nvarchar(4000),@P5 nvarchar(4000),@P6 nvarchar(4000),@P7 nvarchar(4000),@P8 nvarchar(4000),@P9 nvarchar(4000),@P10 nvarchar(4000),@P11 nvarchar(4000),@P12 nvarchar(4000),@P13 nvarchar(4000),@P14 nvarchar(4000),@P15 nvarchar(4000),@P16 nvarchar(4000),@P17 nvarchar(4000),@P18 nvarchar(4000),@P19 nvarchar(4000),@P20 nvarchar(4000),@P21 nvarchar(4000),@P22 nvarchar(4000),@P23 nvarchar(4000),@P24 nvarchar(4000),@P25 nvarchar(4000),@P26 nvarchar(4000),@P27 nvarchar(4000),@P28 nvarchar(4000),@P29 nvarchar(4000),@P30 nvarchar(4000),@P31 nvarchar(4000),@P32 nvarchar(4000),@P33 nvarchar(4000),@P34 nvarchar(4000),@P35 nvarchar(4000),@P36 nvarchar(4000),@P37 nvarchar(4000),@P38 nvarchar(4000),@P39 nvarchar(4000),@P40 nvarchar(4000),@P41 nvarchar(4000),@P42 nvarchar(4000),@P43 nvarchar(4000),@P44 nvarchar(4000),@P45 nvarchar(4000),@P46 nvarchar(4000),@P47 nvarchar(4000),@P48 nvarchar(4000),@P49 nvarchar(4000),@P50 nvarchar(4000),@P51 nvarcha   </inputbuf>
  </process>
 </process-list>
 <resource-list>
  <keylock hobtid="72057594347913216" dbid="7" objectname="ONEITSMCAM_PHOENIX.dbo.ACT_RU_JOB" indexname="ACT_IDX_JOB_EXCEPTION_STACK" id="lock1e1706c4480" mode="X" associatedObjectId="72057594347913216">
   <owner-list>
    <owner id="process1dcb8d43848" mode="S" requestType="wait" />
   </owner-list>
   <waiter-list>
    <waiter id="process1e294e33088" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
  <keylock hobtid="72057594347913216" dbid="7" objectname="ONEITSMCAM_PHOENIX.dbo.ACT_RU_JOB" indexname="ACT_IDX_JOB_EXCEPTION_STACK" id="lock1e1706c4480" mode="X" associatedObjectId="72057594347913216">
   <owner-list>
    <owner id="process1dd08329468" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="process1dcb8d43848" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
  <keylock hobtid="72057594347913216" dbid="7" objectname="ONEITSMCAM_PHOENIX.dbo.ACT_RU_JOB" indexname="ACT_IDX_JOB_EXCEPTION_STACK" id="lock1dd63b3f000" mode="X" associatedObjectId="72057594347913216">
   <owner-list>
    <owner id="process1e294e33088" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="process1dd08329468" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
 </resource-list>
</deadlock>

Hi @lucascalsilva,

nice to meet you again!

As an enterprise customer, you can also open a HELP-Request, and we can spend an hour of your remote consulting budget to discuss the issue online.

(This is just advertising for the support that every enterprise customer gets from Camunda).

What do you think?

P.S: I vaguely remember that you can increase the parallel runs of batches. But in a quick search in the docs, I found that it is exclusive for history cleanup: historyCleanupDegreeOfParallelism.

And from the database log, I assume that two stacktraces from different jobs deadlock the transaction. Perhaps you can reset the invocationsPerBatchJob to 1 and increase the batchJobsPerSeed to 200?

See you, Ingo