Handling a receive and send task with retry (Camunda, RabbitMQ)

Celef · December 1, 2020, 9:48am

Hello,

After reading a lot of documentation and forum threads, I'm still stuck on a particular issue and wondered whether someone has run into this before and knows a possible solution.

We had an old working Camunda BPMN flow which looked like this:

Why this was modeled with a parallel send and receive task:
This flow sends out a message, and the reply later comes back to Camunda through RabbitMQ.

We first had a problem because RabbitMQ was so much faster than Camunda: the reply would arrive before Camunda had moved on to the next step and was ready to receive the message.
To solve this, we started the send task and the receive task at the same time. This way Camunda was already waiting for the message when it arrived, and there were no issues.

When the receive task took longer than 10 minutes, the user would be notified that the message had not been received and the process would be cancelled.

New change that was necessary
Instead of cancelling the process on a 10-minute timeout, we wanted to pause the process, evaluate it, and then choose between a retry and a cancel.

To accommodate this change, we updated the model to the different versions shown in the comments below. None of them has worked so far, so any help would be greatly appreciated!

PS: My question is spread over 4 different posts/comments since I can only use one picture per post.

Celef · December 1, 2020, 9:48am

Split with join and retry:

Problems
Pausing and sending out the task went fine, but things went wrong during the retry.
When retrying, the process would jump immediately to process result instead of waiting to receive the task.

We assumed this was caused by how the split and join work: the join (after send out task / receive task) received two signals from send out task, and since the split had two branches, two signals were enough for it to complete.

To solve this problem we tried two things, neither of which worked.
See the next comment for the first thing we tried, since I can only post one picture per post.

Celef · December 1, 2020, 9:49am

Split with no join:

Problems
This caused optimistic locking exceptions, and we are still unsure why. We couldn’t find any examples where one branch of a split (here the Send Task) ended without a join.
The Send Task branch simply stops after the task, with no end, and doesn’t come back into the flow. Is this allowed?

For the last thing we tried, see the next comment, since I can only post one picture per comment.

Celef · December 1, 2020, 9:49am

Split with event gateway:

Problems
This caused synchronisation issues again. The message would be received (from RabbitMQ) but couldn’t be correlated because Camunda wasn’t ready to receive it yet.
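
One way to tolerate this kind of race, independent of the model, is for the RabbitMQ consumer to retry the correlation for a short time instead of giving up on the first attempt. The sketch below is only an illustration and is not taken from the attached flows: `CorrelationRetry`, `correlateWithRetry`, and its parameters are hypothetical names, and the `BooleanSupplier` stands in for whatever call actually delivers the message to Camunda.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retries a correlation attempt a bounded number of times. */
class CorrelationRetry {

    /**
     * Calls tryCorrelate until it returns true or maxAttempts is reached,
     * sleeping delayMillis between attempts.
     * Returns true if one of the attempts succeeded.
     */
    static boolean correlateWithRetry(BooleanSupplier tryCorrelate,
                                      int maxAttempts,
                                      long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryCorrelate.getAsBoolean()) {
                return true; // the engine was ready and accepted the message
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // give the engine time to reach its wait state
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false; // still no waiting instance; the caller decides what to do next
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated engine that only becomes ready on the third attempt.
        boolean delivered = correlateWithRetry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("delivered=" + delivered + " after " + calls[0] + " attempts");
    }
}
```

In a real consumer the supplier would presumably wrap the engine's correlation call (for example `RuntimeService.createMessageCorrelation(...).correlate()`) and return `false` when a `MismatchingMessageCorrelationException` is thrown. That wiring is an assumption here and is not shown in the thread.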

Does anyone have experience with pausing Camunda processes to retry them later? Are we on the right track, or are we trying to do something weird?

Any ideas or remarks are welcome,

Thanks in advance,

Celef

Niall · December 2, 2020, 9:40am

Thanks for the detailed question - it’s interesting to see what you’ve tried so far.
I’m wondering if you could help me understand what you mean by pausing the process instance. What do you expect from a paused instance?

Also, it would be helpful to know if you’ve added any additional transaction boundaries using the asynchronous before checkbox on tasks or events.

Celef · December 3, 2020, 8:18am

Hey Niall,

Thanks for the reply!

Pausing
By pausing the process instance I mean putting it on hold until an admin checks what went wrong.
So, for instance, when the Send Task doesn’t get handled properly and we don’t get a reply, we do not want to retry immediately but instead let someone check and evaluate the task. After the evaluation, the task will be retried or cancelled.

Because the admin will need some time to check what happened, the process needs to wait at that stage and be ‘paused’ for a while.

Async
Old (working) flow without retry
Send Task: Async before and exclusive
Receive Task: Async after and exclusive

Split with no join:
We tried most variations of async before/after/exclusive because we hoped the right combination would solve the optimistic locking exceptions. None of them had any success, however.

I have also added the .bpmn files of all the examples. These are part of a bigger flow which I sadly cannot fully share for confidentiality reasons.

Split with event gateway.bpmn (14.3 KB) Split with join and retry.bpmn (14.8 KB)

2 more in the next comment!

Thanks in advance,

Celef

Celef · December 3, 2020, 8:19am

Split with no join and retry.bpmn (13.6 KB) Working without retry.bpmn (11.8 KB)

2 more of the flows that we tried. This is a small part of a bigger flow.

Gerben · December 8, 2020, 9:54am

SOLVED:

This was the solution to our problem. By encapsulating the sending and receiving in its own subflow, the retry became safe and no longer caused problems with hanging processes.

This is because the boundary timeout, when it gets triggered, kills everything still running inside the subflow.
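
To illustrate why this helps, here is a toy model in plain Java. It is not Camunda code, and `JoinModel` and `RetryDemo` are hypothetical names. It assumes, as suspected earlier in the thread, that the join fires as soon as the number of waiting tokens equals its number of incoming flows. Without cleanup, the token left behind by the first send attempt is still waiting at the join when the retry arrives. If the timeout cancels the whole scope, that stale token is discarded first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a parallel join that counts waiting tokens. */
class JoinModel {
    private final int incomingFlows;
    private final List<String> waiting = new ArrayList<>();

    JoinModel(int incomingFlows) {
        this.incomingFlows = incomingFlows;
    }

    /** A token arrives at the join; returns true if the join fires. */
    boolean arrive(String from) {
        waiting.add(from);
        if (waiting.size() >= incomingFlows) {
            waiting.clear();
            return true;
        }
        return false;
    }

    /** Models a timeout that cancels the enclosing scope: all waiting tokens are discarded. */
    void cancelScope() {
        waiting.clear();
    }
}

class RetryDemo {

    /** Returns true if the retry passes the join without waiting for a reply. */
    static boolean retrySkipsReceive(boolean timeoutCancelsScope) {
        JoinModel join = new JoinModel(2);   // two incoming flows: send and receive
        join.arrive("send");                 // first attempt: send is done, token waits at the join
        // ... the timeout fires here, no reply was received ...
        if (timeoutCancelsScope) {
            join.cancelScope();              // subflow variant: the stale token is removed
        }
        return join.arrive("send");          // retry: send completes a second time
    }

    public static void main(String[] args) {
        System.out.println("without scope cancel: " + retrySkipsReceive(false));
        System.out.println("with scope cancel:    " + retrySkipsReceive(true));
    }
}
```

In the first case the join fires on the retry even though no reply was ever received, which matches the jump straight to process result described above. In the second case the retry starts from a clean state and waits for the reply again.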