Deadlocked Processes, maybe

Hi All. I have a newbie/basic question, but my research has not turned up an obvious answer. I have a Camunda instance running inside Tomcat on Ubuntu, backed by a MariaDB database. I submitted approximately 150 instances of a particular workflow. After some time, execution appears complete, but 11 jobs are still shown as running in Cockpit. It is not clear to me what state these process instances are in, and I am looking for tips on how to debug further. I scanned the Tomcat and application logs and did not find any obvious failures.

The workflow contains a multi-instance subprocess that runs child flows in parallel; all child flows must complete before the subprocess completes. All tasks are configured with asyncAfter = true and exclusive = false. Most of the tasks are service tasks that use the HTTP connector to call external microservices.
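To make the setup concrete, the shape described above would look roughly like this in the BPMN XML. This is only a sketch: the element IDs, the collection variable, and the element variable are placeholders (not from the actual model), and namespace declarations are omitted.

```xml
<!-- Parallel multi-instance subprocess whose tasks are asyncAfter
     and non-exclusive; IDs and variable names are hypothetical. -->
<subProcess id="ChildSubProcess">
  <multiInstanceLoopCharacteristics isSequential="false"
      camunda:collection="childItems" camunda:elementVariable="item" />
  <serviceTask id="CallMicroservice" name="Call microservice"
      camunda:asyncAfter="true" camunda:exclusive="false">
    <!-- camunda:connector configuration for the HTTP call omitted -->
  </serviceTask>
</subProcess>
```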

I can provide more detail, but wanted to get some input on the debugging process so that I can provide the right data.

I see 11 jobs in the ACT_RU_JOB table, all of which have 3 retries remaining. In every case the HANDLER_TYPE_ is async-continuation, and the HANDLER_CFG_ is either activity-end or transition-notify-listener$SequenceFlow_…

I am also seeing some optimistic locking exceptions.

I am also seeing occurrences of the following exception. The exception references the POST used to start the process, but I am fairly sure that all processes were launched. I suspect possible timeout issues between the task and the external microservice being called.

I simplified the execution by queuing one process instance at a time, but still see the problem (processes stalling). All stalled jobs have HANDLER_CFG_ = activity-end or transition-notify-listener-take$SequenceFlow_…

Any suggestions are much appreciated.

Thanks. Peter


My advice to help you diagnose:
Turn the exclusive flags back on for your parallel asynchronous executions, as this will remove a potential source of optimistic locking exceptions.
Consider your remote service calls: are you using the connector library? Sockets opened by the connector library may be held in a blocked state for a long time, which can cause undesirable side effects in the job executor…
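If you want to keep the async continuations but serialize job execution per process instance, the change is just the exclusive attribute. A sketch (the task ID is a placeholder, namespaces omitted):

```xml
<!-- exclusive="true" is also the engine default, so simply removing
     exclusive="false" has the same effect. -->
<serviceTask id="CallMicroservice" name="Call microservice"
    camunda:asyncAfter="true" camunda:exclusive="true" />
```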

Consider this thread (and others) on timing out more aggressively. This thread regarding JSoup may also be of interest…



Yeah, this does seem similar to the http-connector timeout issue.

What happens if you restart the server? Do the jobs complete?

Typically the jobs will complete, but not always.

Thanks Rob. What is throwing me is that the jobs that appear stalled are activity-end and transition-notify jobs, as opposed to the tasks that call out to my microservices.

I am using the HTTP connector to make REST calls to the microservices.
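For reference, the connector wiring on those service tasks looks roughly like this. The URL, method, header, and payload values below are placeholders, not the real configuration, and namespace declarations are omitted:

```xml
<serviceTask id="CallMicroservice"
    camunda:asyncAfter="true" camunda:exclusive="false">
  <extensionElements>
    <camunda:connector>
      <camunda:connectorId>http-connector</camunda:connectorId>
      <camunda:inputOutput>
        <camunda:inputParameter name="url">http://example.local/api/work</camunda:inputParameter>
        <camunda:inputParameter name="method">POST</camunda:inputParameter>
        <camunda:inputParameter name="headers">
          <camunda:map>
            <camunda:entry key="Content-Type">application/json</camunda:entry>
          </camunda:map>
        </camunda:inputParameter>
        <camunda:inputParameter name="payload">${requestBody}</camunda:inputParameter>
      </camunda:inputOutput>
    </camunda:connector>
  </extensionElements>
</serviceTask>
```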

Can you try using Jsoup as in the links above and see if it still occurs?

Thanks Stephen, I will try that. Another confusing thing is that Cockpit indicates the process is stuck completing a multi-instance task; however, the logs indicate that a task downstream of the multi-instance has already executed and completed. I feel like I am missing something obvious.