Camunda event gateway getting stuck in a few cases

generatedForms.bpmn (6.8 KB)

Hi, I’d appreciate any help or feedback on this.

In the attached BPMN, I am using an event-based gateway to hold the business process while an external service task executes.

In a small number of the process instances we triggered, we noticed that tickets are getting stuck at the event gateway.
Here is the sequence of events:

The external job is triggered.
A Spring Boot app with the embedded external task client JAR polls the job.
When the service responds, the worker updates a variable to success/failure (the same variable is used in the conditional event attached to the event gateway).
The worker completes the external service task (imagine this and the previous step happening at almost the same timestamp, only milliseconds apart).
The event gateway is created.
The event gateway does not move ahead.
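Seen from the engine’s side, the variable update and the completion in the sequence above can end up as two independent REST requests, each executed in its own engine transaction. The sketch below only builds those two requests with the standard library and sends nothing. It assumes the worker updates the variable through the Camunda 7 `POST /process-instance/{id}/variables` endpoint; the engine URL, the ids, and the variable name `serviceStatus` are hypothetical.

```python
import json
import urllib.request

# Hypothetical engine URL; nothing in this sketch is actually sent.
ENGINE_REST = "http://localhost:8080/engine-rest"


def post(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request against the engine's REST API."""
    return urllib.request.Request(
        ENGINE_REST + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reported_sequence(process_instance_id: str, task_id: str,
                      worker_id: str, status: str) -> list:
    """The variable update and the completion as two separate requests."""
    # Set the process variable the conditional event depends on.
    update = post(
        f"/process-instance/{process_instance_id}/variables",
        {"modifications": {"serviceStatus": {"value": status, "type": "String"}}},
    )
    # Complete the external task; the token then moves towards the gateway.
    complete = post(f"/external-task/{task_id}/complete", {"workerId": worker_id})
    return [update, complete]


for request in reported_sequence("pi-42", "task-7", "forms-worker", "success"):
    print(request.get_method(), request.full_url)
```

If the two calls are issued close together or concurrently, nothing in this setup guarantees that the first one has committed before the second one starts.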

While this happens in a few cases, in about 90% of cases the same BPMN works as expected and proceeds to the later task in the workflow.

Is there anything I can do differently to keep my process instances from getting stuck at the event gateway?

The process engine won’t begin evaluating the events until the service task completes. Then it will evaluate based on the conditions changing (so if they don’t change, your process will sit waiting until they do).

You are also setting yourself up for potential optimistic locking exceptions if the service updates the variables while it is also sending back a complete message on a different path.

You could try swapping the event-based gateway for a simple data-based (exclusive) gateway, which won’t be evaluated until the service task has completed.
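For comparison, an exclusive gateway evaluates the conditions on its outgoing sequence flows exactly once, when the token arrives, and never waits. The following is a simplified model of that behaviour, not engine code; the flow targets and the `serviceStatus` variable are hypothetical. In Camunda 7 the first flow in definition order whose condition is true is taken, and if none is true and there is no default flow, an exception is raised instead of the token waiting.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Condition = Callable[[Mapping[str, object]], bool]


class NoOutgoingFlowError(RuntimeError):
    """Stands in for the exception the engine raises when no flow matches."""


def exclusive_gateway(variables: Mapping[str, object],
                      flows: Sequence[Tuple[str, Condition]],
                      default: Optional[str] = None) -> str:
    """Return the target of the first flow whose condition is true."""
    for target, condition in flows:
        if condition(variables):
            return target
    if default is not None:
        return default
    raise NoOutgoingFlowError("no outgoing sequence flow condition is true")


flows = [
    ("handleSuccess", lambda v: v.get("serviceStatus") == "success"),
    ("handleFailure", lambda v: v.get("serviceStatus") == "failure"),
]
print(exclusive_gateway({"serviceStatus": "success"}, flows))  # handleSuccess
```

Because the variables written with the task completion are already there when this gateway runs, there is nothing to wait for.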


Thanks for your reply, I appreciate it.
Yes, I agree: switching to an exclusive gateway would provide an immediate fix in this case. You are also right about the optimistic locking exception: I ran into it, and setting Async Before on the event gateway fixed it. (What are your thoughts on this?)

My follow-up question: switching to an exclusive gateway works in this case because the service task uses the external task implementation; since that is a wait state, the exclusive gateway is only reached once the task has completed.

If the same service task used a connector-based implementation and I wanted the process to hold until its response is received, do you think an event gateway would still be required? (I don’t want to add another user task to hold the flow.)

Lastly, when I tested this BPMN by setting the variable so that the expression becomes true (so the conditional event fires) at process instance creation, the process always progressed to the user task or the next task. So I am confused by the part where you mentioned “so if they don’t change, your process will sit waiting until they do”. In this case the change happened before the service task was even instantiated or the event gateway was triggered. I’m not sure about the order of actions the process engine takes.

Hello my dear! How are you? :smiley:

“OptimisticLocking” occurs whenever Camunda tries to update the same data (for example, the same variable) two or more times concurrently, from separate transactions; the transaction that commits last detects the conflict, and this exception is thrown.
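Camunda 7 detects this with a revision check: its runtime tables carry a `REV_` column, an update only applies if the revision is still the one the transaction read, and otherwise the engine throws an `OptimisticLockingException`. Below is a toy model of that check in plain Python; the error class and row type are illustrative, not engine classes.

```python
from dataclasses import dataclass


class OptimisticLockingError(Exception):
    """Model of the error raised when a row changed since it was read."""


@dataclass
class Row:
    value: object
    rev: int = 1  # revision counter, bumped on every successful update


def update(row: Row, new_value: object, expected_rev: int) -> None:
    # Like: UPDATE ... SET <value>, REV_ = REV_ + 1 WHERE ID_ = ? AND REV_ = ?
    if row.rev != expected_rev:
        raise OptimisticLockingError(
            f"expected revision {expected_rev}, found {row.rev}")
    row.value = new_value
    row.rev += 1


variable = Row("pending")
rev_seen_by_a = variable.rev  # transaction A reads the variable
rev_seen_by_b = variable.rev  # transaction B reads it concurrently
update(variable, "success", rev_seen_by_a)  # A commits first: rev 1 -> 2
try:
    update(variable, "failure", rev_seen_by_b)  # B still expects rev 1
except OptimisticLockingError as error:
    print("B is rolled back:", error)
```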

In the case from your example, the process will be stopped until its external task is complete (the worker does the “fetch and lock”), and when the token arrives at the gateway, the conditions are evaluated right away, at the same moment the instance arrives there, so it goes on with the flow.

But let’s say the variable you need to evaluate comes from some other subprocess, not from the worker connected to your event-based gateway: then the event-based gateway is worth using.

In that scenario, your worker finishes the task, the instance waits at the gateway until some subprocess updates the process variable that your conditional event evaluates, and then it proceeds along the flow whose condition evaluated to “true”.

But as GotnOGuts said, if this variable is updated by your external task, or even by an http-connector, the update originates at the end of the service task, so it makes more sense to use an exclusive gateway with the conditions directly on the sequence flows.

I hope I have contributed something.

Regards.
William Robert Alves


Regarding the solution you found for Optimistic Locking:

  • As mentioned above, this occurs when Camunda tries to concurrently update the same variable.

Maybe you managed to solve it with async before or after because of the “save points” hehehehe.
Camunda saves its current state to the database whenever it reaches a “breakpoint” (a wait state), that is, whenever it needs to wait for some action to happen… at that moment it commits the transaction and persists the data.

Examples of “breakpoints”: User tasks, Receive tasks, Event based gateways…

When you use Async Before or After, you are basically creating a save point: the data is persisted at that moment, before the next breakpoint is reached.
The link below explains database transactions even better.

A practical example: imagine a button in your application that triggers a message (message correlation) to Camunda, and this diverts your flow to a cancellation step…
If the service task right after the message fails, there was no save point in between, so your instance goes back to the last “save point” and the correlation is lost. So, as a good practice, I always mark message-receiving events in Camunda as Async After: that way we don’t “lose the correlations” that reached Camunda, and the incident is raised at the service task where the problem occurred, without the instance going back to the last save point.
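The difference Async After makes in that example can be modeled in a few lines. This is a deliberately simplified sketch, not the engine’s full behaviour: it assumes there is no other wait state between the message event and the failing service task, models the failure as a flag, and collapses the job retries (three by default in Camunda 7) into a single incident. The step name `cancelContract` is hypothetical.

```python
def correlate_message(steps_after_message, failing_step, async_after_message):
    """Model what survives when a step after a message catch event fails."""
    state = {"persisted": "waiting for message", "incident": None, "caller_error": None}
    if async_after_message:
        # Async After: the correlation is committed; the rest runs later as a job.
        state["persisted"] = "message received"
    for step in steps_after_message:
        if step == failing_step:
            if async_after_message:
                # The job fails, retries run out, and an incident is raised here.
                state["incident"] = step
            else:
                # One transaction: everything since the last wait state rolls
                # back, including the correlation, and the caller gets the error.
                state["caller_error"] = f"{step} failed"
            return state
    state["persisted"] = "completed"
    return state


print(correlate_message(["cancelContract"], "cancelContract", async_after_message=False))
print(correlate_message(["cancelContract"], "cancelContract", async_after_message=True))
```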

If my explanation wasn’t clear, feel free to ask; I’m available to help!

Regards.
William Robert Alves

The Gateways always wait until the task before them completes…
So if you have a REST connector it will still wait until a response is received (in an async model this would likely be just a 200/OK).
The question you didn’t ask (but I’m inferring) is: What do we do when we’re in an async model, and we need to wait for a callback?
Well… That’s what a receive task is for :slight_smile:
It will sit and wait for that callback to come in before moving on to the gateway.
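As a sketch of that callback, under stated assumptions: if the receive task waits for a message (the name `serviceCallback` is hypothetical), the async service’s callback handler can correlate it through the Camunda 7 REST endpoint `POST /message`, passing the result as process variables in the same request instead of a separate variable update. The engine URL and the `serviceStatus` variable are also assumptions, and the request is only built, not sent.

```python
import json
import urllib.request

ENGINE_REST = "http://localhost:8080/engine-rest"  # hypothetical engine URL


def build_callback_correlation(process_instance_id: str, status: str) -> urllib.request.Request:
    """Correlate the callback message and hand over the result in one call."""
    body = {
        "messageName": "serviceCallback",  # must match the receive task's message
        "processInstanceId": process_instance_id,
        # Sent with the correlation rather than in a separate request.
        "processVariables": {"serviceStatus": {"value": status, "type": "String"}},
    }
    return urllib.request.Request(
        f"{ENGINE_REST}/message",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_callback_correlation("pi-42", "success")
print(request.get_method(), request.full_url)
```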

As to the other question… I’ll admit I was mistaken in that scenario. I thought the conditions were only evaluated when the condition item changed (e.g., if there’s a variable in the condition, it is only evaluated on a variable update). If it always evaluates, then I can’t explain why it would get stuck at the event gateway, unless both conditions failed…
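Following what the original poster observed (variables already set when the token arrives let the process proceed), an event-based gateway with conditional events can be modeled roughly as below. This is a toy model under that assumption, not Camunda’s implementation: conditions are checked on arrival, and afterwards only a variable change re-triggers them. In this model the only way to stay stuck is for every condition to be false on arrival and for no later variable change to happen. Target names and `serviceStatus` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Condition = Callable[[Dict[str, object]], bool]


class EventGatewayModel:
    """Toy model of an event-based gateway followed by conditional events."""

    def __init__(self, events: List[Tuple[str, Condition]]):
        self.events = events
        self.waiting = False
        self.taken: Optional[str] = None

    def _evaluate(self, variables: Dict[str, object]) -> Optional[str]:
        for target, condition in self.events:
            if condition(variables):
                self.taken, self.waiting = target, False
                return target
        self.waiting = True  # nothing matched: the token stays at the gateway
        return None

    def arrive(self, variables: Dict[str, object]) -> Optional[str]:
        """Conditions are checked once when the token enters the gateway."""
        return self._evaluate(variables)

    def variable_changed(self, variables: Dict[str, object]) -> Optional[str]:
        """Afterwards, only a variable change re-triggers evaluation."""
        return self._evaluate(variables) if self.waiting else None


gateway = EventGatewayModel([
    ("handleSuccess", lambda v: v.get("serviceStatus") == "success"),
    ("handleFailure", lambda v: v.get("serviceStatus") == "failure"),
])
print(gateway.arrive({}))  # None: both conditions are false, so it waits
print(gateway.variable_changed({"serviceStatus": "success"}))  # handleSuccess
```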


I might be looking for a needle in a haystack (I will switch to an exclusive gateway to fix it temporarily), but the question still bugs me:
Camunda waits at the service task.
The external worker writes a variable that satisfies the condition of the conditional event at the event gateway.
The external task is completed via the Complete External Task API.
The event gateway is created. At this point, assuming the evaluation fires, the flow should proceed, but this is not always the case. Is it really a timing issue, since the whole series of steps may be happening within the same second (within milliseconds)?

I really appreciate your responses, and thanks to the community.

Hi @GotnOGuts!
It is actually the developer who decides, according to the need, when the conditional event is evaluated.

In the Modeler, alongside the “Condition” field, there are fields for the variable name and the “Variable events”, where we put the name of the variable and the type of event to “listen” for on that variable, so that the expression is evaluated.

In my case below, it listens only to the “CREATION” of the variable “notifyImplantadaParticipanteResponse”, and when that variable is created, it evaluates the expression. If nothing is specified, the condition is evaluated on every creation/update of variables, which also impacts performance (imagine 10k instances being re-evaluated with every variable change).
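In the BPMN XML, those Modeler fields end up as the `camunda:variableName` and `camunda:variableEvents` attributes of the `conditionalEventDefinition`. The sketch below parses a hypothetical fragment mirroring the setup described above (react only to the creation of `notifyImplantadaParticipanteResponse`; the element id and the condition expression are assumptions) and applies the filtering rule in a small function: no variable name means any variable counts, and no event list means create, update, and delete all trigger evaluation. It models the documented rule; it is not engine code.

```python
import xml.etree.ElementTree as ET

NS = {
    "bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL",
    "camunda": "http://camunda.org/schema/1.0/bpmn",
}

# Hypothetical fragment: only the creation of one variable triggers evaluation.
FRAGMENT = """
<bpmn:intermediateCatchEvent id="ResponseReceived"
    xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
    xmlns:camunda="http://camunda.org/schema/1.0/bpmn"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <bpmn:conditionalEventDefinition
      camunda:variableName="notifyImplantadaParticipanteResponse"
      camunda:variableEvents="create">
    <bpmn:condition xsi:type="bpmn:tFormalExpression">${notifyImplantadaParticipanteResponse != null}</bpmn:condition>
  </bpmn:conditionalEventDefinition>
</bpmn:intermediateCatchEvent>
"""


def triggers_evaluation(definition: ET.Element, var_name: str, event: str) -> bool:
    """Would this variable event make the engine re-check the condition?"""
    name = definition.get(f"{{{NS['camunda']}}}variableName")
    events = definition.get(f"{{{NS['camunda']}}}variableEvents")
    if name and name != var_name:
        return False  # filtered out: a different variable changed
    if events:
        return event in {e.strip() for e in events.split(",")}
    return True  # no event filter: create, update and delete all count


definition = ET.fromstring(FRAGMENT.strip()).find("bpmn:conditionalEventDefinition", NS)
print(triggers_evaluation(definition, "notifyImplantadaParticipanteResponse", "create"))  # True
print(triggers_evaluation(definition, "notifyImplantadaParticipanteResponse", "update"))  # False
```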

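[Screenshot: conditional event configured to listen to the creation of notifyImplantadaParticipanteResponse]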

The link below explains the subject better, but I have included a screenshot of the relevant part next.

[Screenshot: excerpt from the linked page about variable events]

Regards.
William Robert Alves

@myarlagadda Hello!

Take a look at my explanation above; maybe the configuration of your conditional event is causing this.

Imagine that your worker completes the task and sends the necessary variables to the process along with it… then, as soon as the instance is released from the “FetchAndLock”, the variables already exist and can be evaluated right away, even if this happens VERY FAST.

If you can share more details, such as screenshots of your configured conditional event and the variable you want to evaluate, I believe we can help you better.

Count on us!
William Robert Alves

Based on all your feedback, here is what I am thinking:
The external service task is created.
The ticket waits in the external service task.
A variable update is sent to Camunda (not yet committed to the DB).
Complete External Task is called.
When the complete call fires, imagine the system is trying to do a couple of things:
It is trying to commit to the DB the variable update it received while waiting in the service task.
It is trying to create the event gateway and commit it to the DB.
Now the evaluation happens, as per your statement above (at the same time the instance arrives there).
By the time the evaluation fires, the commit of the variable is still pending.
In this case, the issue occurs. This is completely theoretical for me at the moment (I will try to collect evidence of DB commits from our logs).
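The hypothesis can be made concrete with a toy model of two overlapping transactions. Assumptions, not verified against the engine: the variable update and the completion arrive as separate requests, each transaction only sees data committed before it started (a stand-in for the overlap window), and a variable change only re-evaluates a gateway that is already committed as waiting. Under those assumptions the instance ends up waiting whenever the update has not committed before the completion starts. The key names are hypothetical.

```python
def run(update_committed_first: bool) -> str:
    """Return where the token ends up in the committed state."""
    db: dict = {}  # committed state shared by all transactions

    # Request 1: the variable update, started while the task is still waiting.
    update_snapshot, update_writes = dict(db), {"serviceStatus": "success"}
    if update_committed_first:
        db.update(update_writes)

    # Request 2: Complete External Task. The token reaches the gateway in this
    # transaction and the conditions are evaluated once against its snapshot.
    complete_snapshot = dict(db)
    if complete_snapshot.get("serviceStatus") == "success":
        complete_writes = {"token": "nextTask"}
    else:
        complete_writes = {"token": "waitingAtGateway"}

    if not update_committed_first:
        # The update's variable event would re-evaluate a waiting gateway, but
        # its snapshot does not contain one yet, so nothing is triggered.
        if update_snapshot.get("token") == "waitingAtGateway":
            update_writes["token"] = "nextTask"
        db.update(update_writes)

    db.update(complete_writes)
    return db["token"]


print(run(update_committed_first=True))   # nextTask
print(run(update_committed_first=False))  # waitingAtGateway
```

In the second run the committed state ends with `serviceStatus` set and the token still waiting, which matches the symptom; whether the real engine interleaves the two requests this way is exactly what the DB commit logs would have to confirm.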

Is it possible for you to have a control variable that checks whether all the necessary statuses are met, updates to a different status at each step, and, once the necessary status is reached, stores that status in this variable… and then, in your conditional event, set “update” as the “Variable events”?

Imagine that you have an external task that does a “complete()”.
Then you have variables arriving from other places in Camunda.

Then you can check when all the necessary variables have been created, using “hasVariable()” or something similar… then update the value of the variable “conditionalStatus” to “true”, for example, and your conditional event will let the instance pass.
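A minimal sketch of that control-variable idea, in plain Python rather than a Camunda execution listener: `has_variable` plays the role of `hasVariable()`, `conditionalStatus` is created up front as `False` so that setting it to `True` is an update (matching a conditional event filtered on “update”), and the required variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class ControlledVariables:
    """Collects variables and flips one control flag once all have arrived."""

    def __init__(self, required: Iterable[str]):
        self.required = tuple(required)
        self.values: Dict[str, object] = {"conditionalStatus": False}  # created up front
        self.events: List[Tuple[str, str]] = []  # (variable name, "create"/"update")

    def has_variable(self, name: str) -> bool:
        return name in self.values

    def set_variable(self, name: str, value: object) -> None:
        self.events.append((name, "update" if self.has_variable(name) else "create"))
        self.values[name] = value
        ready = all(self.has_variable(r) for r in self.required)
        if name != "conditionalStatus" and ready and not self.values["conditionalStatus"]:
            # This single update is what the conditional event listens for.
            self.set_variable("conditionalStatus", True)


process = ControlledVariables(["serviceStatus", "documentId"])
process.set_variable("serviceStatus", "success")  # from the external task
process.set_variable("documentId", "DOC-1")       # from somewhere else
print(process.values["conditionalStatus"])  # True
print(process.events[-1])                    # ('conditionalStatus', 'update')
```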

Remember that you can create these control variables in your code or even in an execution listener, since you use Camunda 7.

In this setup, I would not use an event-based gateway, since your variables should already be updated by the time the decision is evaluated. You aren’t waiting around for the variables to be updated.

However, if you have some sort of inconsistent delay (the async model bites us here!), where the variable update sent to Camunda (not yet committed to the DB) could arrive AFTER the Complete External Task call… well, this could be (1) causing your OLE (optimistic locking exception) and (2) driving your plans for an event-based gateway.

Ideally, your service worker would collect its own variable updates (rather than sending them directly to Camunda) and include them in the “Complete” call back to Camunda.
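With the Camunda 7 REST API, that means sending the result in the `variables` field of `POST /external-task/{id}/complete`; with the Java external task client, the equivalent is passing a variables map to `externalTaskService.complete(externalTask, variables)`. Either way, the variables are written in the same transaction that completes the task, before the token can reach the gateway. The sketch below only builds the REST request; the engine URL, the ids, and `serviceStatus` are hypothetical.

```python
import json
import urllib.request

ENGINE_REST = "http://localhost:8080/engine-rest"  # hypothetical engine URL


def build_complete(task_id: str, worker_id: str, status: str) -> urllib.request.Request:
    """Complete the external task and hand over its result in one request."""
    body = {
        "workerId": worker_id,  # must be the worker that locked the task
        "variables": {"serviceStatus": {"value": status, "type": "String"}},
    }
    return urllib.request.Request(
        f"{ENGINE_REST}/external-task/{task_id}/complete",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_complete("task-7", "forms-worker", "success")
print(request.get_method(), request.full_url)
```

Since no separate variable update is in flight, there is nothing for the completion to race against.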