Throw/Catch signal events out of sync on parallel branches

MedicalFlows · December 2, 2022, 10:14am

I cannot figure out on my own why signals go uncaught on a parallel branch. I have created a minimal diagram to test. I hope somebody more experienced will immediately see where the problem is (I failed to solve it after 3 days of trying).

The diagram has two parallel branches A (fails) and B (succeeds):

B is using a delay timer before the “B ended” signal is thrown.

I would expect that tokens would be at tasks A3&A4 and B3&B4 (second column of tasks).

However the token is stuck at task A2. The B tokens are as expected at B3&B4.

Why is the “A ended” signal not caught by the A2 task?
The same “B ended” signal is caught successfully by the B2 task because it is thrown after a 5 seconds delay.

It looks like task A2 is not “instantiated and receiving events” when the “A ended” signal is thrown.

Is this a case of a sync issue between two parallel branches?

As if this weren’t enough, the same process instance moves the token to A4 if I redeploy an unmodified (or modified) diagram from Camunda Modeler and refresh it in the Cockpit.

The running process instance is not migrated to a newer version (it keeps its original “Definition Version”).

Is the “A ended” signal re-thrown on the running instance even if there is no migration?

Configuration:

Camunda Modeler desktop Version 5.5.1
Camunda Platform 7.18 local self-hosted

Diagram:
acknowledge-signal-test.bpmn (17.7 KB)

GotnOGuts · December 2, 2022, 4:51pm

You have what is known as a race condition.

This is frequently seen in Throw/Catch messages on the forum, but my guess would be that the same applies with Signals… If the process is not yet waiting for the signal, it won’t catch it.

The delay timer in B path seems to support this hypothesis.
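
A toy sketch of that race, for illustration only. It is plain Python, not Camunda code; the `SignalBus` class and its methods are made up, and it assumes (as guessed above) that a signal is delivered only to catchers that are already waiting and is never buffered.

```python
# Toy model only: SignalBus is a hypothetical class, not part of Camunda.
class SignalBus:
    """Delivers a signal to whoever is subscribed at the moment of the throw."""

    def __init__(self):
        self.subscribers = {}  # signal name -> list of waiting callbacks

    def subscribe(self, name, callback):
        self.subscribers.setdefault(name, []).append(callback)

    def throw(self, name):
        # Nothing is buffered: with no subscribers the signal is simply lost.
        callbacks = self.subscribers.pop(name, [])
        for callback in callbacks:
            callback()
        return len(callbacks)


bus = SignalBus()
caught = []

# Branch A: "A ended" is thrown before A2 has started waiting.
delivered_a = bus.throw("A ended")
bus.subscribe("A ended", lambda: caught.append("A2"))

# Branch B: the timer delay means B2 is already waiting when the throw happens.
bus.subscribe("B ended", lambda: caught.append("B2"))
delivered_b = bus.throw("B ended")

print(delivered_a, delivered_b, caught)  # 0 1 ['B2']
```

In this model A2 keeps waiting forever, which matches the token stuck at A2 in the original diagram.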

Try putting a “Set Variables” step at the beginning of the process (AEnded=False, BEnded=False), and replace the AEnded signal catch with an interrupting “Conditional” event (Condition: AEnded=True).

Add an event-based subprocess that starts at the same time as the process that catches the “A Ended” signal and sets AEnded=True. This should mean that as soon as A2 starts, it should see the AEnded=True, and move to A4.
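
The reason this should work can be sketched in a few lines. Again this is a toy model in plain Python, not Camunda code: `ProcessInstance`, `set_variable` and `wait_for` are hypothetical names, and the sketch assumes the condition is checked against the current variables as soon as A2 starts, as well as on later variable changes.

```python
# Toy model only: all names are hypothetical, this is not the Camunda API.
class ProcessInstance:
    def __init__(self):
        # The "Set Variables" step at the beginning of the process.
        self.variables = {"AEnded": False, "BEnded": False}
        self.waiting = []  # (condition, callback) pairs not yet satisfied
        self.log = []

    def set_variable(self, name, value):
        self.variables[name] = value
        self._evaluate()

    def wait_for(self, condition, callback):
        # Checked immediately on registration, so earlier changes are not missed.
        self.waiting.append((condition, callback))
        self._evaluate()

    def _evaluate(self):
        still_waiting = []
        for condition, callback in self.waiting:
            if condition(self.variables):
                callback()
            else:
                still_waiting.append((condition, callback))
        self.waiting = still_waiting


instance = ProcessInstance()

# "A ended" happens first and is recorded as state instead of a one-off event.
instance.set_variable("AEnded", True)

# A2 starts later; its condition is already true, so it moves on at once.
instance.wait_for(lambda v: v["AEnded"], lambda: instance.log.append("A2 -> A4"))

print(instance.log)  # ['A2 -> A4']
```

The difference from the signal version is that the variable persists, so the order of "set" and "start waiting" no longer matters.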

Let us know if that works

MedicalFlows · December 2, 2022, 5:15pm

Yes, race condition is a great description of what is happening.

Thank you for the suggested solution.

However the timer or conditional events are both technical workarounds. Additional elements are needed that are not part of the process being modeled.

In my case the modelers are expected to be “clinical modelers” (medical doctors with some IT training). It would be nice if they weren’t required to know about these workarounds. I do expect that a knowledgeable BPMN expert with programming experience would have to assist them to make their diagrams executable so the final version of the diagrams could use the workarounds if necessary.

It would be great if the engine would minimize the number of workarounds required.

In this simple diagram it could make sure that after every parallel gateway, the first elements on every branch are fully operational before continuing the process.

In the attached diagram this would mean that “Activity A2” is able to catch signals before the “A started” signal is thrown on the parallel branch. This is probably just one of the rules required.

I’m expecting that there were previous attempts to avoid presented scenarios in the engine. I would be interested in learning more about them.

GotnOGuts · December 2, 2022, 5:19pm

I believe the “message thrown before process waiting to accept” issue is solved in Camunda 8.
I’m not sure about the Signal issue.

You can explain it to your subject modellers by explaining that they need to model similar to a PA system.
If Dr. Brown is not in the building, and is paged to room 300, that page will never be answered.
When Dr Brown arrives, they don’t check to see if they have any missed pages (since obviously someone else would have handled the page that was missed).
If it’s critical for Dr Brown to check for any pages that came in while they were out, then that needs to be modelled.

MedicalFlows · December 2, 2022, 5:43pm

As programmers we would blame ourselves if we hand-coded this process and we didn’t handle the synchronization issue of the parallel branches.

I understand how we should take care of every edge case on a technical level. However, BPMN diagrams are a higher abstraction, and it would be great if the engine were able to handle as many common cases as possible.

There is a parallel gateway element and the signal is thrown as the 3rd element on the top branch. From the BPMN semantics point of view, it should be caught by the 1st element on the bottom branch.

Since both branches should run in “parallel”, modelers expect that by the time the 3rd element of the top branch is reached, the 1st element of the bottom branch will already be fully operational. This is what the “picture” says.

I’m not criticizing how the engine is handling this scenario. I’m just trying to learn if there is room for improvement.

p.s.
I’m visiting the “BPMN coverage” page of the Camunda Platform 8 Docs on a weekly basis. The plan is to move to Camunda 8 as soon as signal/conditional/escalation events are supported.

MedicalFlows · December 2, 2022, 6:09pm

This discussion got me thinking on how to solve this issue.

I’m already using the send/acknowledge pattern for some signals in my diagrams (side-by-side cut-out of different parts of a large diagram):

The idea: a throw signal element could be extended (via properties) to require it to be caught by at least one (or minimum of N) events. If nobody caught the thrown signal it would repeat for a specified cycle (e.g. R5/PT5S).

This wouldn’t cover all use cases but it would simplify modeling it with a send/acknowledge pattern.
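
A rough sketch of what I mean, in plain Python. Everything here is hypothetical (there is no such engine property today): `throw_with_retry` and its parameters are made-up names, `throw` is assumed to return how many catch events received the signal, and R5/PT5S is read as "up to 5 attempts, 5 seconds apart".

```python
import time


def throw_with_retry(throw, min_receivers=1, repeats=5, interval_s=5.0,
                     sleep=time.sleep):
    """Re-throw until one throw reaches min_receivers, at most `repeats` times.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, repeats + 1):
        if throw() >= min_receivers:
            return attempt
        if attempt < repeats:
            sleep(interval_s)  # wait before the next repetition
    return None


# Simulated throws: nobody is listening for the first two attempts.
receivers_per_attempt = iter([0, 0, 1])
waits = []  # records the delays instead of really sleeping

result = throw_with_retry(lambda: next(receivers_per_attempt),
                          sleep=waits.append)

print(result, waits)  # 3 [5.0, 5.0]
```

The `sleep` parameter is only there so the example can run without actually waiting 10 seconds.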

GotnOGuts · December 2, 2022, 7:40pm

But also remember that the concept of a signal is “Broadcast” with 0-or-more receivers

The PA system analogy is really appropriate here… You can play a message in an empty room. No receivers. The task as documented applied.

If you really need “Condition Occurred” logic that doesn’t cross process lines, then “Set Variable” and “Condition Catch” are the pieces that fit the goal the best.

MedicalFlows · December 2, 2022, 8:06pm

I agree with you that the conditional boundary event works. Here is an updated version of the diagram based on your suggestion:

My main concern is that it is technically correct but not as semantically clear as the one with signals. It also “hides” the logic into properties of the script task and boundary event.

I’m hoping that Zeebe signals will be able to process my use case faster so that the “Activity A2” will be ready sooner.

p.s.
In practice the tasks are assigned to doctors/nurses, who take much longer than 5 seconds to complete them.
They are a built-in human delay timer.

GotnOGuts · December 2, 2022, 11:29pm

Maybe I’m missing something…

Why signals?
I just realized that Signals are not intended to be scoped the same way that messages and conditions are – they notify ALL listening instances, not a specified one.
So if you had two instances of your process running, when the second one hits “A Ended” the first one should receive the signal and continue on to A4.

This can be likened to multiple buildings at a hospital all being connected to the same PA system. A code blue in Building B should not get people in Building A responding…
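
A toy illustration of that scoping, in plain Python rather than Camunda code. The `waiting` table and `broadcast` function are made up; the only assumption is the one stated above, that a signal reaches every instance currently listening for that name, whichever instance threw it.

```python
# Toy model only: hypothetical names, not the Camunda API.
# Which signal names each running process instance is currently waiting for.
waiting = {
    "instance-1": {"A ended"},  # parked at A2
    "instance-2": set(),        # still busy on its A branch, not waiting yet
}


def broadcast(signal_name):
    """Resume every instance waiting for signal_name, regardless of the thrower."""
    resumed = [pid for pid, names in waiting.items() if signal_name in names]
    for pid in resumed:
        waiting[pid].discard(signal_name)
    return resumed


# instance-2 reaches its "A ended" throw event.
resumed = broadcast("A ended")

print(resumed)  # ['instance-1']
```

Here the throw from instance-2 moves instance-1 forward, which is the "wrong building responding" case.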

MedicalFlows · December 3, 2022, 4:24pm

Thank you @GotnOGuts for being persistent about using conditional events instead of signals
I replaced the signals with variables/conditions for my diagram and they work as expected.

I never felt comfortable about signals being broadcast to the entire engine. It would be great if they could be limited to the current process instance.
Partially that can be accomplished by passing variables with a signal. The catch event could then decide whether the signal is meant for it.
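
Roughly what I have in mind, as a plain Python sketch. It is not Camunda code: `make_catch_event` and the `instanceId` payload key are hypothetical, and whether a catch event can reject a signal like this in the engine is an assumption on my part.

```python
# Toy model only: "instanceId" is a hypothetical payload key, not a Camunda API.
def make_catch_event(own_instance_id):
    """Return a catch handler that accepts only signals addressed to this instance."""
    def catch(payload):
        return payload.get("instanceId") == own_instance_id
    return catch


catch_in_instance_1 = make_catch_event("instance-1")

from_other = {"name": "A ended", "instanceId": "instance-2"}
from_self = {"name": "A ended", "instanceId": "instance-1"}

print(catch_in_instance_1(from_other))  # False
print(catch_in_instance_1(from_self))   # True
```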

I believe signals are easier to understand for non-tech BPMN users compared to using script tasks, variables and conditions.

Maybe a new “intermediate throw conditional event” element could bridge both worlds. It would be a standard “script task” for setting variables but with a throw version of the standard “Intermediate Conditional Catch Event” icon (black with white lines).
It would indicate that a conditional event is triggered and the users would look for the matching catch conditional icons with the same label.