We experienced a case where two external service tasks started for the same activity in the same process instance immediately after a preceding user task completed.
According to the logs, the two external tasks started within the same millisecond, each in a separate external task process (our external task workers are written in Node.js).
There is no evidence of a related issue in the Camunda server logs, and the system had been relatively idle in the moments beforehand.
We are running a dockerised version of Camunda 7.10 on AWS ECS Fargate, with two Camunda tasks and two workflow tasks. We use PostgreSQL 10.22 as the database. There is no evidence of any earlier attempt to execute the same activity instance, and there are two records for this activity instance in act_hi_actinst.
Judging by how precisely the start times of the two external tasks coincide, it looks as though a race condition somewhere inside the Camunda process engine caused the same activity instance to be dispatched to two waiting external task pollers. However, we can't find a good explanation for why this would have happened.
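One way to narrow down where the duplication happened, sketched here under assumptions: if each worker logs the identifiers of the task it locked, comparing the two records separates a single external task handed to two workers (the race described above) from two distinct external tasks created by the engine. The `LockedTaskLog` type and `diagnose` function are hypothetical; only the field names follow the task properties returned by the external task fetch-and-lock API.

```typescript
// Hypothetical record each Node.js worker could log when it locks a task.
// Field names follow the task properties returned by fetch-and-lock
// (id, activityId, activityInstanceId, executionId, processInstanceId);
// the LockedTaskLog type itself is an assumption, not part of any API.
interface LockedTaskLog {
  workerId: string;
  id: string; // external task id
  activityId: string;
  activityInstanceId: string;
  executionId: string;
  processInstanceId: string;
}

type Diagnosis =
  | "unrelated tasks"
  | "same external task locked by two workers"
  | "two external tasks for one activity instance"
  | "two activity instances of the same activity";

// Compare the records logged by two workers for the suspected duplicate.
function diagnose(a: LockedTaskLog, b: LockedTaskLog): Diagnosis {
  // Different process instance or activity: not the duplicate we are after.
  if (a.processInstanceId !== b.processInstanceId || a.activityId !== b.activityId) {
    return "unrelated tasks";
  }
  // Same external task id: one task was handed to both workers.
  if (a.id === b.id) {
    return "same external task locked by two workers";
  }
  // Different task ids, but both belong to one activity instance.
  if (a.activityInstanceId === b.activityInstanceId) {
    return "two external tasks for one activity instance";
  }
  // Different task ids and different activity instances: the activity ran twice.
  return "two activity instances of the same activity";
}
```

If the two task ids differ, the duplication would have happened when the engine created the tasks rather than when the workers fetched them.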
In case it is relevant, each external task worker fetches 3 tasks from Camunda per poll. There
Here is an extract from the act_hi_actinst table showing the two activity instances that started at
Each one executed in a separate external task process. The end result was that two messages were sent to a CRM system when we expected only one.
```
select start_time_, proc_def_key_, act_name_, act_type_
from act_hi_actinst
order by start_time_ desc
limit 20;

 start_time_             | proc_def_key_ | act_name_                 | act_type_
-------------------------+---------------+---------------------------+------------------
 2023-03-10 08:00:34.681 | XXXX          |                           | noneEndEvent
 2023-03-10 08:00:34.485 | YYYY          | xxxxxxxx xxxxxxxxx xxxx   | serviceTask
 2023-03-10 08:00:34.327 | YYYY          | xxxxxxxx xxxxxxxxx xxxxx  | serviceTask
 2023-03-10 08:00:34.321 | XXXX          |                           | noneEndEvent
 2023-03-10 08:00:34.307 | YYYY          |                           | exclusiveGateway
 2023-03-10 08:00:33.884 | YYYY          | xxxxxxxx xxxxxxxxx xxxxx  | serviceTask
 2023-03-10 08:00:33.21  | ZZZZ          |                           | noneEndEvent
 2023-03-10 08:00:33.21  | ZZZZ          |                           | startEvent
 2023-03-10 08:00:33.21  | YYYY          | xxxxxxxx xxxxxxxxx xxxx   | serviceTask
 2023-03-10 08:00:32.709 | YYYY          |                           | startEvent
 2023-03-10 08:00:32.709 | YYYY          | Bxxxxxxxx xxxxxxxxx xxxx  | callActivity
 2023-03-10 08:00:32.293 | XXXX          | Emit Site Onboarded Event | serviceTask
 2023-03-10 08:00:32.293 | XXXX          | Emit Site Onboarded Event | serviceTask
 2023-03-10 07:56:59.439 | XXXX          | Requires onboarding?      | exclusiveGateway
 2023-03-10 07:56:59.439 | XXXX          | Onboard Customer Site     | userTask
 2023-03-10 07:56:58.755 | XXXX          |                           | exclusiveGateway
 2023-03-10 07:56:58.755 | XXXX          | xxxxxxxx xxxxxxxxx xxxx   | startEvent
 2023-03-10 07:56:58.755 | XXXX          | xxxxxxxx xxxxxxxxx xxxx   | serviceTask
 2023-03-10 07:54:52.327 | XXXX          |                           | noneEndEvent
 2023-03-10 07:54:52.327 | XXXX          | xxxxxxxx xxxxxxxxx xxxx   | exclusiveGateway
```
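To spot the same pattern in a larger extract, here is a minimal sketch that groups rows of the query output by all four selected columns and reports every group with more than one row. The `ActInstRow` type and `findDuplicateStarts` function are hypothetical names. On real data, also grouping by the proc_inst_id_ and act_id_ columns of act_hi_actinst would avoid matching rows from different process instances that happen to start in the same millisecond.

```typescript
// One row of the act_hi_actinst query output (hypothetical type).
interface ActInstRow {
  startTime: string; // start_time_
  procDefKey: string; // proc_def_key_
  actName: string; // act_name_
  actType: string; // act_type_
}

// Group rows that share all four columns and return the groups seen more than once.
function findDuplicateStarts(rows: ActInstRow[]): ActInstRow[][] {
  const groups = new Map<string, ActInstRow[]>();
  for (const row of rows) {
    const key = [row.startTime, row.procDefKey, row.actName, row.actType].join("|");
    const group = groups.get(key);
    if (group) {
      group.push(row);
    } else {
      groups.set(key, [row]);
    }
  }
  return Array.from(groups.values()).filter((group) => group.length > 1);
}
```

Applied to the extract above, only the two "Emit Site Onboarded Event" rows at 08:00:32.293 form a group.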
Has anyone ever seen behaviour like this before? What was its cause?