Hello Everyone,
We are building a Camunda process model for our application and have run into a race condition.
The model consists of 3 sub-processes: 2 of them each contain 2 tasks (a service task followed by a receive task), and 1 contains a message catch event.
A delegate implements the service task. It publishes a Kafka message, and the token then waits at the receive task for an acknowledgment. The Kafka message is meant for another system, which does some processing. Once the processing is done, that system publishes an ack message; our engine consumes it and triggers the message event of the receive task to move the token to the next block.
Both of these sub-processes do the same thing: publish and wait for an acknowledgment.
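To make the publish-and-wait flow concrete, here is a minimal plain-Java sketch of it. It is only a model: it does not use the Camunda or Kafka APIs, and all names in it (`HandshakeSketch`, `publishFromServiceTask`, `onAck`, and so on) are hypothetical and not taken from our code.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Plain-Java model of the publish-and-acknowledge flow.
// No Camunda or Kafka APIs are used; every name here is hypothetical.
class HandshakeSketch {
    // Stand-in for the Kafka topic the delegate publishes to.
    final Queue<String> requestTopic = new ArrayDeque<>();
    // Process instances whose token is currently waiting at the receive task.
    final Set<String> waitingAtReceiveTask = new HashSet<>();
    // Process instances whose token has left the sub-process.
    final Set<String> completed = new HashSet<>();

    // Service task: publish the request for the other system.
    void publishFromServiceTask(String processInstanceId) {
        requestTopic.add(processInstanceId);
    }

    // The token reaches the receive task and starts waiting for the ack.
    void arriveAtReceiveTask(String processInstanceId) {
        waitingAtReceiveTask.add(processInstanceId);
    }

    // Other system: consume one request and return the id to acknowledge
    // (null if the topic is empty).
    String otherSystemProcessesRequest() {
        return requestTopic.poll();
    }

    // Our consumer: hand the ack to the waiting receive task.
    // Returns false when no token is waiting for this id.
    boolean onAck(String processInstanceId) {
        if (!waitingAtReceiveTask.remove(processInstanceId)) {
            return false;
        }
        completed.add(processInstanceId);
        return true;
    }

    public static void main(String[] args) {
        HandshakeSketch flow = new HandshakeSketch();
        flow.publishFromServiceTask("pi-1");
        flow.arriveAtReceiveTask("pi-1");
        String ack = flow.otherSystemProcessesRequest();
        System.out.println("ack correlated: " + flow.onAck(ack));
    }
}
```

In this model the ack only moves the token on if `arriveAtReceiveTask` has already run for that instance; otherwise `onAck` returns false and the ack is lost.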
In between them, there is one sub-process that consists of a message catch event, where the token waits for confirmation from another system before moving to the next sub-process.
We are facing the race condition in our 3rd sub-process, which is similar to our 1st sub-process.
In the service task, once we publish the message, System A consumes it and does its processing. It then publishes the ack message, which our engine consumes and uses to trigger the receive task. But the token does not reach a completed state.
As we can see, the token is still waiting at the receive task. If we introduce some delay before triggering the receive task, the message gets processed. Our assumption is that we are triggering the message event of the receive task before the token reaches it. However, this doesn’t happen in the 1st sub-process. We see it only since we introduced the sub-process that has the message catch event (name of sub-process - ‘Wait for a response from X system’). We are not able to figure out why this is happening.
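The delay workaround could be generalized into a bounded retry, sketched below in plain Java. This is only an illustration under our assumption that the ack arrives before the token is waiting. `CorrelationRetrySketch` and `correlateWithRetry` are hypothetical names, and the `attempt` callback stands in for whatever call triggers the receive task; it is assumed to return false when nothing is waiting yet.

```java
import java.util.function.BooleanSupplier;

// Bounded retry with a fixed delay between attempts.
// Hypothetical helper; it does not call any Camunda API itself.
class CorrelationRetrySketch {

    // Runs `attempt` up to maxAttempts times, sleeping delayMillis between
    // failed attempts. Returns true as soon as one attempt succeeds.
    static boolean correlateWithRetry(BooleanSupplier attempt, int maxAttempts, long delayMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated attempt: the token "arrives" only on the third try.
        boolean ok = correlateWithRetry(() -> ++calls[0] >= 3, 5, 10);
        System.out.println("correlated=" + ok + " after " + calls[0] + " attempts");
    }
}
```

A retry like this only hides the timing problem; it does not explain why the ack can arrive before the token is waiting.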
If we swap the positions of the sub-processes, we face this issue in the 2nd sub-process instead, which then comes after the message catch event (name of sub-process - ‘Wait for response from X system’). This supports the statement that the issue appears only after the sub-process ‘Wait for response from X system’.
We tried one more option to mitigate this problem, but it didn’t work either.
In this approach, we removed the service task and used only receive tasks in all sub-processes. The work of the service task was done by a start listener on the receive task, so the token is always already at the receive task. At the start, it sends the Kafka message and then waits for the message event call in the same block, but the issue still occurs.
This does not happen every time; it happens intermittently.
Please provide your suggestion.