Handle Race conditions

Hello Everyone,

We are working on building a Camunda model for our application, and we are facing a race condition scenario.

We have a model that consists of 3 sub-processes. Two of them each consist of 2 tasks (a service task followed by a receive task), and one consists of a message catch event.

There is one delegate that implements the service task. It publishes a Kafka message, and then the token waits at the receive task for an acknowledgment. The Kafka message is meant for another system, which does some processing. Once the processing is done, that system publishes an ack message; our engine consumes it and triggers the message event of the receive task to move the token to the next block.
Both of these sub-processes do the same thing: publish and wait for an acknowledgment.

In between, there is one sub-process that consists of a message catch event, where the token waits for confirmation from another system and then moves on to the next sub-process.

We are facing the race condition in our 3rd sub-process, which is similar to our 1st sub-process.
In the service task, once we publish the message, System A consumes it and does its processing. After that it publishes the ack message, and our engine consumes it and calls the receive task. But the token does not reach a completed state.
As far as we can see, the token is still waiting at the receive task. If we introduce some delay before calling the receive task, the message is processed. We are assuming that we are calling the message event of the receive task before the token gets there. However, this doesn’t happen in the 1st sub-process. It only started happening after we introduced the sub-process that has a message catch event (name of sub-process - ‘Wait for a response from X system’). We are not able to figure out why this is happening.
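To make the suspected timing concrete, here is a minimal plain-Java sketch. It is a toy model, not the Camunda API: `RaceSketch`, `subscribe`, `commit` and `correlate` are hypothetical names, and the only assumption it encodes is that a message subscription created inside a transaction is not visible to other callers until that transaction commits.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only (NOT the Camunda API): a subscription becomes visible
// to message correlation only after the creating transaction commits.
class RaceSketch {
    // Subscriptions visible to other transactions (committed state).
    private final Set<String> committed = new HashSet<>();
    // Subscriptions created in the current, still-open transaction.
    private final Set<String> pending = new HashSet<>();

    // Token enters the receive task: subscription is created but not yet committed.
    void subscribe(String messageName) {
        pending.add(messageName);
    }

    // Transaction ends: pending subscriptions become visible.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Returns true only if a committed subscription was waiting for the message.
    boolean correlate(String messageName) {
        return committed.remove(messageName);
    }

    // The ack arrives while the transaction that ran the service task is still open.
    static boolean ackBeforeCommit() {
        RaceSketch engine = new RaceSketch();
        // (service task has already published to Kafka at this point)
        engine.subscribe("ack");                      // receive task, same transaction
        boolean delivered = engine.correlate("ack");  // very fast ack from System A
        engine.commit();                              // too late for that ack
        return delivered;
    }

    // The ack arrives after the wait state has been committed (e.g. with a delay).
    static boolean ackAfterCommit() {
        RaceSketch engine = new RaceSketch();
        engine.subscribe("ack");
        engine.commit();
        return engine.correlate("ack");
    }

    public static void main(String[] args) {
        System.out.println("ack before commit delivered: " + ackBeforeCommit());
        System.out.println("ack after commit delivered: " + ackAfterCommit());
    }
}
```

In this toy model the early ack is lost and the delayed ack is delivered, which would match the observation that adding a delay makes the token move on.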

And if we just swap the positions of the sub-processes, we face this issue in the 2nd sub-process, which then comes after the message catch event (name of sub-process - ‘Wait for response from X system’). This supports the statement that the issue only appears after the sub-process ‘Wait for response from X system’.

We tried one more option to mitigate this problem, but it still didn’t work.
In this approach, we removed the service task and used only receive tasks in all sub-processes. The work of the service task was done by a start listener on the receive task, so the token would always already be in the receive task. At the start it sends the Kafka message and then waits for the message event call in the same block, but it still doesn’t work.

This does not happen every time; it happens intermittently.

Please provide your suggestion.

Hi @Tushar_Kapadnis

This is quite a common problem when working with Kafka: the engine is transactional and so does a lot more per request than Kafka. Luckily there’s a pretty simple solution that you can implement in two steps.

Step One

Make sure your model is waiting for a response before you send your message to Kafka. This can be done with a parallel gateway.

Step Two

Make sure the engine commits its state before sending a message to Kafka. This is done by selecting the Send Message to Kafka task and ticking the async before tickbox.

What this will achieve is that Camunda will commit its state, including waiting for the message, before a message is sent to Kafka, so there’s no longer a fear that the response will be too fast.
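The effect of the two steps can be sketched in plain Java. This is a toy model under stated assumptions, not Camunda code: `AsyncBeforeSketch` and its methods are hypothetical, subscriptions only become visible on `commit()`, and the external system is assumed to answer instantly (the worst case).

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only (NOT the Camunda API): compares "send, then wait" in one
// transaction with "commit the wait state, then send" in a second transaction.
class AsyncBeforeSketch {
    private final Set<String> committed = new HashSet<>();
    private final Set<String> pending = new HashSet<>();

    // Waiting branch is reached: subscription created, not yet committed.
    void subscribe(String messageName) {
        pending.add(messageName);
    }

    // Transaction boundary: pending subscriptions become visible.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Returns true only if a committed subscription was waiting.
    boolean correlate(String messageName) {
        return committed.remove(messageName);
    }

    // Send step; the other system is assumed to ack immediately.
    boolean sendAndGetInstantAck() {
        return correlate("ack");
    }

    static boolean run(boolean waitFirstThenAsyncSend) {
        AsyncBeforeSketch engine = new AsyncBeforeSketch();
        if (waitFirstThenAsyncSend) {
            // Transaction 1: the parallel branch starts waiting; "async before"
            // on the send task ends the transaction here, so state is committed.
            engine.subscribe("ack");
            engine.commit();
            // Transaction 2: the send task runs only after that commit.
            return engine.sendAndGetInstantAck();
        }
        // Original layout: send and wait inside the same transaction.
        boolean delivered = engine.sendAndGetInstantAck();
        engine.subscribe("ack");
        engine.commit();
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("send then wait, one transaction: " + run(false));
        System.out.println("wait committed, then send: " + run(true));
    }
}
```

Even with an instant ack, the second ordering delivers the message in this model, because the wait state already exists in committed form when the send happens.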


Thanks a lot, Niall, It worked. :slight_smile:
