Updates of Rev_ for process instances in ACT_RU_EXECUTION

Hello,
I am a student trying to understand the behavior of Camunda and the PVM in general. At the moment I'm looking at locking mechanisms at the database level.

I noticed that the Rev_ (and only the Rev_) of the execution tree root (the tuple of the process instance) in the database table ACT_RU_EXECUTION gets increased whenever a child execution gets updated. During a concurrent execution of two child executions in separate transactions, this implicit update of the root's Rev_ inevitably leads to optimistic locking exceptions. The child executions have nothing in common apart from the parent and could be updated separately.
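
As far as I understand it, this is the classic versioned-update pattern; roughly something like the following sketch (my own illustration with plain JDBC, not the engine's actual persistence code):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RevisionUpdateSketch {

  // Bump the revision of one execution row, but only if nobody else has touched it
  // since we read it. executionId and expectedRev are hypothetical parameters.
  static void updateExecution(Connection connection, String executionId, int expectedRev)
      throws SQLException {
    try (PreparedStatement stmt = connection.prepareStatement(
        "UPDATE ACT_RU_EXECUTION SET REV_ = ? WHERE ID_ = ? AND REV_ = ?")) {
      stmt.setInt(1, expectedRev + 1);
      stmt.setString(2, executionId);
      stmt.setInt(3, expectedRev);

      // If another transaction already incremented REV_, no row matches, and the
      // engine reports this situation as an OptimisticLockingException and rolls back.
      if (stmt.executeUpdate() == 0) {
        throw new IllegalStateException("concurrent update detected for execution " + executionId);
      }
    }
  }
}
```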

What is the reason for that implicit update in the first place?

Thank you in advance
Jürgen

Hi Jürgen,

The reason is synchronization. Think about a joining parallel gateway with two incoming sequence flows. For the process engine, this means two concurrent executions need to be joined, and their parent continues on the outgoing sequence flow. Let's assume we have two transactions, each signalling one of the sequence flows. These transactions must run in a serial fashion, or else it cannot be determined which transaction is second and has to continue with the parent execution. For that purpose, updating the parent (and thereby provoking optimistic locking exceptions) is the way to enforce this serialization.

That said, I do not believe the engine behaves that way in every case. E.g. if you have a concurrent execution that simply goes from one user task to the next, I don't think the parent gets modified.

Cheers,
Thorben

Hey Thorben,
thank you for your answers.
I did not message here earlier because you helped me a lot and I was eager to continue my work. :slight_smile:

I've got a last couple of questions on that topic and I hope it is not a problem that they are a little bit off-topic.
The questions are regarding the asynchronous flags used to cut off repetitions.

[image: process model with Task 1 and Task 2 on parallel paths behind a parallel gateway]

I have this process model. I want to run Task 1 and Task 2 in parallel, so I configured both as non-exclusive and asynchronous before. Of course, I get optimistic locking exceptions. Now I want to “cut off” the work the repeating transaction has to do. I flag both tasks as asynchronous after and it works.

Questions now:
What is the difference between async after on the tasks and async before on the gateway? I would expect the async before on the gateway to be a transaction boundary for every transaction passing by. When a repetition occurs because of an optimistic locking exception, the transaction would jump back to that point. Instead, the repetition starts at the async before. Why?

Second question: Where did you hide the optimistic locking exceptions when using the async after on the tasks as described above? They should still occur because the problem of serialization is not solved. I assume that they are just ignored in the console logging. Is there a mechanism that detects that the repetition is just a transition between an activity flagged as async after and the place of the exception?

Thank you and greetings
Jürgen

Hi Jürgen,

Sorry for the late response, this somehow escaped my attention.

I think they should behave the same in this case with respect to what gets rolled back under which circumstances.

I’m having trouble following the explanation. Could you maybe rephrase it?

If I remember correctly they are logged in any case. Maybe the jobs are in fact executed sequentially due to timing behavior.

Do you mean a way to detect that no task was executed between transaction start and the time when the exception was raised? If that is so, then I can tell that there is no built-in mechanism.

Cheers,
Thorben

The explanation was not correct, sorry. I double-checked it and I can confirm that the behavior of the repetition is the same for an async before on the gateway and an async after on both tasks. Therefore, we can forget about my first question.

I am just wondering how the system can distinguish between a repetition between an asynchronous before on the task and the gateway, and a repetition between an asynchronous after on the task and the gateway. Both situations have the same “problem” with optimistic locking. However, while the first situation throws optimistic locking exceptions and performs a repetition, the second situation simply works. I think the solution is awesome because it enables a flexible configuration and the user is not confronted with that exception anymore.
However, if there is no mechanism to detect that the repetition is just a transition and there is nothing to do between the last transaction boundary and the point of failure, why is the system behaviour different?

Thank you and greetings
Jürgen

Not sure I understand this. We know that OptimisticLockingException is raised when two transactions update the same entities in parallel. By setting asynchronous boundaries, we can delegate the responsibility to deal with the exceptions to the job executor. The job executor handles these situations by performing retries. Then, there is also the exclusive flag. If this flag is set, the job executor makes sure to run no jobs in parallel that belong to the same process instance. This then ensures that we will not see any instances of OptimisticLockingException. So could it be that you see different behavior not because it is asyncAfter on the task vs asyncBefore on the gateway, but rather because the asyncAfter is configured as not exclusive while the asyncBefore configuration is?
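
By the way, if you want to check whether a job actually failed and is being retried, you can query the failed jobs of the instance via the Java API. A small sketch (processInstanceId stands for whatever instance you started):

```java
import java.util.List;

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.runtime.Job;

public class FailedJobsSketch {

  // List all jobs of the given process instance that failed with an exception,
  // together with their remaining retries and the exception message.
  public static void printFailedJobs(ProcessEngine processEngine, String processInstanceId) {
    List<Job> failedJobs = processEngine.getManagementService()
        .createJobQuery()
        .processInstanceId(processInstanceId)
        .withException()
        .list();

    for (Job job : failedJobs) {
      System.out.println(job.getId() + ": retries left = " + job.getRetries()
          + ", exception = " + job.getExceptionMessage());
    }
  }
}
```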

Hey Thorben,
it has nothing to do with the exclusive flag. In all of my examples I work with non-exclusive tasks because I want to achieve an efficient parallel execution of concurrent paths. Or better: I achieved an efficient parallel execution of concurrent paths with Camunda, but I don't know why.
I will try again to explain what i do not understand.

I have this model (model 1):
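
In case the image does not come through: model 1 roughly corresponds to this sketch with the fluent model API (just my own sketch; I use service tasks with a trivial expression here only so that the jobs run through to the joining gateway on their own, the actual task types do not matter for the question):

```java
import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;

public class Model1Sketch {

  // Parallel split, Task 1 and Task 2 both asyncBefore and non-exclusive,
  // then a joining parallel gateway and an end event.
  public static BpmnModelInstance build() {
    return Bpmn.createExecutableProcess("model1")
        .startEvent()
        .parallelGateway("fork")
          .serviceTask("task1").name("Task 1")
            .camundaExpression("${true}")
            .camundaAsyncBefore()
            .camundaExclusive(false)
          .parallelGateway("join")
        .moveToNode("fork")
          .serviceTask("task2").name("Task 2")
            .camundaExpression("${true}")
            .camundaAsyncBefore()
            .camundaExclusive(false)
          .connectTo("join")
        .moveToNode("join")
        .endEvent()
        .done();
  }
}
```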

Executing this model works as described above. At the tasks which are asynchronous before, the engine creates jobs. Afterwards, both paths are executed in parallel by different threads of the job executor. One of the paths can be executed successfully. At the end of the execution, the transaction increases the rev_ of the parent execution in the database. This leads to an optimistic locking exception when the second path has been executed and its transaction wants to commit. As a consequence, the second path will be repeated.

Now I have this model (model 2):
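
Model 2 is the same sketch as above, only with asynchronous after added to both tasks:

```java
import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;

public class Model2Sketch {

  // Same structure as model 1, but Task 1 and Task 2 are additionally flagged
  // asyncAfter, which adds a transaction boundary behind each task.
  public static BpmnModelInstance build() {
    return Bpmn.createExecutableProcess("model2")
        .startEvent()
        .parallelGateway("fork")
          .serviceTask("task1").name("Task 1")
            .camundaExpression("${true}")
            .camundaAsyncBefore()
            .camundaAsyncAfter()
            .camundaExclusive(false)
          .parallelGateway("join")
        .moveToNode("fork")
          .serviceTask("task2").name("Task 2")
            .camundaExpression("${true}")
            .camundaAsyncBefore()
            .camundaAsyncAfter()
            .camundaExclusive(false)
          .connectTo("join")
        .moveToNode("join")
        .endEvent()
        .done();
  }
}
```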


Once again, the engine creates jobs at the tasks which are asynchronous before. Afterwards, both paths are executed in parallel by different threads of the job executor. The asynchronous after on both tasks leads to the creation of new jobs and, more importantly, to transaction boundaries for both parallel transactions. The first execution arriving at the gateway can be finished successfully. At the end of the execution, the transaction increases the rev_ of the parent execution in the database.
So far it is very similar, and here is the point where I stop understanding. Afterwards, the second path arrives at the gateway and increases the rev_. In my opinion, this should lead to an optimistic locking exception and a repetition of the transaction starting at the asynchronous after of the second path. However, I see that there is no optimistic locking exception anymore.
First I thought that asynchronous after makes the jobs exclusive again. But then I tried this and observed that Task 3 and Task 4 are still executed in parallel (model 3):

At this point I came to this explanation:
Independently of the asynchronous after, the two parallel transactions still use optimistic locking to synchronize at the gateway. The repetition of the last transaction is only a transition and the join mechanism of the gateway. This repetition is too fast to be a significant problem for a user. Additionally, the repetition does not change the process execution (unlike multiple executions of tasks), and the user does not need to pay attention to it anymore. Therefore, the optimistic locking exception still occurs but is not shown on the console anymore.
But then again, there has to be some kind of mechanism to detect that between the last transaction boundary and the point of the optimistic locking exception there is nothing but a transition, and that the exception therefore does not need to show up. You already said that there is nothing like that. And at that point I am just clueless about what is happening when I use the asynchronous after in front of gateways…

I hope I could make my thoughts understandable.

Thank you for your passion
Jürgen

Hi Jürgen,

Thanks for the clarification with the models, that makes it easier to understand.

While it may appear that it works this way, I can confirm that there is no such logic :slight_smile:. Considering the second BPMN diagram you posted, the async-after jobs are potentially still executed in parallel by the job executor. If this leads to an optimistic locking exception, it is still thrown and logged, and the corresponding job is retried. What I rather think is that due to timing behavior the async-after jobs are in fact executed sequentially.
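
One way to check the actual ordering is the historic job log, assuming your history level writes it. A rough sketch that prints when each job ran and whether it failed:

```java
import java.util.List;

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.history.HistoricJobLog;

public class JobLogSketch {

  // Print the job log entries of one process instance in chronological order,
  // including failures, to see whether the async-after jobs really ran in parallel.
  public static void printJobLog(ProcessEngine processEngine, String processInstanceId) {
    List<HistoricJobLog> entries = processEngine.getHistoryService()
        .createHistoricJobLogQuery()
        .processInstanceId(processInstanceId)
        .orderByTimestamp().asc()
        .list();

    for (HistoricJobLog entry : entries) {
      System.out.println(entry.getTimestamp() + " job " + entry.getJobId()
          + " at activity " + entry.getActivityId()
          + (entry.isFailureLog() ? " FAILED: " + entry.getJobExceptionMessage() : ""));
    }
  }
}
```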

If you want to introspect the engine behavior a little bit, you can set the following loggers to level DEBUG (on Tomcat FINE) and have a look at what happens when: org.camunda.bpm.engine.jobexecutor, org.camunda.bpm.engine.cmd.

Cheers,
Thorben

I thought exactly the same. Asynchronous after jobs are exclusive by default. That is the reason why I tried the third model. If asynchronous after led to a sequential execution, Task 3 and Task 4 would be executed one after the other. That is not the case: Task 3 and Task 4 still run in parallel.
However… I never thought about the exclusiveness of the gateway. Task 3 and Task 4 are both non-exclusive. I assume that the default for a node is exclusive as long as the node is neither asynchronous before nor asynchronous after. The option to make a node non-exclusive only appears after one of the asynchronous modes has been activated. I tried this model (model 4):

From the perspective of the execution this model makes no sense. I added asynchronous after on the gateway just for the possibility to artificially make it non-exclusive. Suddenly, I get the optimistic locking exceptions as expected!

For now I would say that the execution of nodes behind asynchronous after depends on the exclusiveness of the following node. In the case of model 3 it is still parallel because Task 3 and Task 4 are non-exclusive. Model 2 seems to be a sequential execution because the default for the following gateway is exclusive. Finally, model 4 confirms this assumption because the exception reappears after setting the gateway non-exclusive. The exception confirms a parallel execution after the asynchronous after.
Do you think these statements are correct?

Do I have to configure the levels for the logs in logback.xml, or is there a configuration file in Wildfly? I looked through standalone.xml and logging.properties and did not find anything to change the level.

Thanks,
Jürgen

Now this surprises me :slight_smile:

I don’t think so. The way exclusive works is like so:

  1. Process execution reaches an asynchronous continuation
  2. The process engine inserts a new job into ACT_RU_JOB. That entry has a field EXCLUSIVE_. This field is set according to how the flag is configured on the activity at which the job is created.
  3. The job executor polls the table ACT_RU_JOB for jobs. If there are jobs where EXCLUSIVE_ is true, it ensures that no other job of the same process instance is executed in parallel

So unless there is a bug in step 2 (e.g. the wrong activity is considered), your explanation should not apply.
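
If you want to rule that out yourself, you could take a direct look at the job table while the instance is waiting on its jobs. A quick sketch with plain JDBC (column names as in the 7.8 schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JobTableSketch {

  // Print the jobs of one process instance together with their EXCLUSIVE_ flag,
  // to verify how the flag from the model ended up in ACT_RU_JOB (step 2 above).
  static void printJobs(Connection connection, String processInstanceId) throws SQLException {
    try (PreparedStatement stmt = connection.prepareStatement(
        "SELECT ID_, TYPE_, EXCLUSIVE_, RETRIES_ FROM ACT_RU_JOB WHERE PROCESS_INSTANCE_ID_ = ?")) {
      stmt.setString(1, processInstanceId);
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getString("ID_") + " (" + rs.getString("TYPE_") + ")"
              + " exclusive=" + rs.getBoolean("EXCLUSIVE_")
              + " retries=" + rs.getInt("RETRIES_"));
        }
      }
    }
  }
}
```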

Let’s first have a look at the logs and interpret them before we post a new theory. Otherwise I fear a bit that we may lose track.

I assume you use a shared process engine on Wildfly (e.g. the downloadable Camunda distribution for Wildfly). Is that correct?

Cheers,
Thorben

Good idea!

Yes. At the moment I am using the Camunda 7.8.0 distribution for Wildfly 8.2.1.Final.

Hi Jürgen,

Sorry for not getting back sooner. On Wildfly, you can configure logging in standalone.xml, see https://github.com/camunda/camunda-bpm-platform/blob/7.8.0/distro/wildfly/assembly/src/wildfly/standalone.xml#L80-L126.

Cheers,
Thorben