Resiliency

We are running Camunda using the greenfield stack, i.e. Camunda process engines embedded within Spring Boot containers, in a heterogeneous setup backed by PostgreSQL.

The microservices run in HA mode, i.e. multiple containers running the same microservice.

In a setup like this, what happens if a microservice container dies while executing a process? Does the process get picked up and executed automatically by one of the process engines running in the other microservice instances?

@dbasu Each wait state/async continuation point in your process will create a job in the Camunda database (ACT_RU_JOB table). There you can find columns with information about which worker (job executor) picked the job and when its lock expires.
You can adjust the default lock expiration time of jobs in your properties or yml file. If your expiration time is, say, 5 minutes, then every time a job is picked by a job executor it stays locked for 5 minutes before it becomes free for another job executor to take. If one of your pods takes a job and dies without completing it, another pod will pick it up and execute it again once the expiration time has passed.
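
For illustration only, here is a minimal sketch of inspecting those jobs through the Java API (the process instance id is a placeholder). The lock owner and lock expiration are only visible directly in the LOCK_OWNER_ / LOCK_EXP_TIME_ columns of ACT_RU_JOB; the public Job API exposes id, due date, retries and failure details:

```java
import java.util.List;

import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.runtime.Job;

public class JobInspector {

  // Lists the jobs (async continuations, timers, ...) that currently back a
  // running process instance. Which node has locked a job and until when is
  // stored in the LOCK_OWNER_ / LOCK_EXP_TIME_ columns of ACT_RU_JOB.
  public static void printJobs(ProcessEngine processEngine, String processInstanceId) {
    ManagementService managementService = processEngine.getManagementService();

    List<Job> jobs = managementService.createJobQuery()
        .processInstanceId(processInstanceId)
        .list();

    for (Job job : jobs) {
      System.out.printf("job=%s retries=%d duedate=%s%n",
          job.getId(), job.getRetries(), job.getDuedate());
    }
  }
}
```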

That's why it's very important to make your service tasks as idempotent as possible. If you are calling external services from your delegates, it's quite common for such a service to be called multiple times, because one of your pods died without completing a job it had already started, for example due to an OutOfMemoryError, or simply because one of your cluster nodes went down.
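
As a rough sketch only (the PaymentClient interface and its methods are made up), one common pattern is to derive an idempotency key, e.g. from the business key, and check with the downstream system before repeating the call:

```java
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

public class ChargePaymentDelegate implements JavaDelegate {

  // Hypothetical client for an external payment service.
  interface PaymentClient {
    boolean alreadyCharged(String idempotencyKey);
    void charge(String idempotencyKey, long amountCents);
  }

  private final PaymentClient paymentClient;

  public ChargePaymentDelegate(PaymentClient paymentClient) {
    this.paymentClient = paymentClient;
  }

  @Override
  public void execute(DelegateExecution execution) {
    // If the job is re-executed after a pod died mid-transaction, this guard
    // prevents charging the customer twice.
    String idempotencyKey = execution.getProcessBusinessKey();

    if (!paymentClient.alreadyCharged(idempotencyKey)) {
      paymentClient.charge(idempotencyKey, (Long) execution.getVariable("amountCents"));
    }
  }
}
```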

Look at the lock-time-in-millis property here, please:
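
In the Spring Boot starter this corresponds to camunda.bpm.job-execution.lock-time-in-millis in your application.yml/properties. As a sketch only (not the only way to do it), the same value can also be set programmatically through a process engine plugin:

```java
import org.camunda.bpm.engine.impl.cfg.AbstractProcessEnginePlugin;
import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.camunda.bpm.engine.impl.jobexecutor.JobExecutor;
import org.springframework.stereotype.Component;

@Component
public class LockTimePlugin extends AbstractProcessEnginePlugin {

  @Override
  public void postInit(ProcessEngineConfigurationImpl configuration) {
    // After engine initialization the job executor is available on the
    // configuration. Jobs acquired by one node stay locked to it for this
    // long before other nodes may pick them up again.
    JobExecutor jobExecutor = configuration.getJobExecutor();
    if (jobExecutor != null) {
      jobExecutor.setLockTimeInMillis(5 * 60 * 1000); // 5 minutes
    }
  }
}
```

Note that this is still an engine-wide setting: it applies to every job that engine's job executor acquires, not to a single process.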


Thanks, @Jean_Robert_Alves.

Is there a way for me to configure the lock time separately for different processes? The reason is that, since we are running Camunda using the greenfield stack, we define different processes and host them as HA in the setup, and they all share the same persistent store for workflow-related data. Individual processes can have their own NFRs for how quickly a task within the process must be acquired and processed.

It would be helpful to know whether such a configuration can be done at the process level rather than at the job-executor level, which I believe would impact all the process engines instead of an individual process.

Kindly seeking advice from the experts on the forum.

@Niall @Ingo_Richtsmeier @StephenOTT: apologies, as an act of desperation I might be crossing the forum rules by tagging specific people, but I am really looking for some kind advice on this soon.

Hi.

Perhaps the external task pattern may meet your requirement? With external tasks you can scale workers independently and set the lock duration at the granularity of a task, as sketched below.
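
As a hedged sketch (the REST base URL and topic name are placeholders), a dedicated worker built with the Camunda external task client sets its own lock duration per topic subscription, independently of the shared job executor configuration:

```java
import org.camunda.bpm.client.ExternalTaskClient;

public class InvoiceWorker {

  public static void main(String[] args) {
    ExternalTaskClient client = ExternalTaskClient.create()
        .baseUrl("http://localhost:8080/engine-rest") // placeholder REST endpoint
        .asyncResponseTimeout(20000)                  // long polling timeout
        .build();

    client.subscribe("invoice-processing")            // placeholder topic name
        .lockDuration(60000)                          // lock this topic's tasks for 1 minute
        .handler((externalTask, externalTaskService) -> {
          // do the actual work here, then report completion to the engine
          externalTaskService.complete(externalTask);
        })
        .open();
  }
}
```

Each worker can choose a lock duration that matches the NFRs of the tasks it handles, so different processes (or even different tasks within one process) no longer depend on a single engine-wide setting.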

regards

Rob