Thinking about the problems caused by process engine restart

as8457632 · March 25, 2020, 8:20am

Hello, suppose I need to restart the process engine (for example, to update part of the content) while a running service task is executing a job that sends a REST request. Say the job places a purchase order. The order is placed successfully, but the engine is restarted just as the REST request returns. The result is that the order exists, yet the process engine rolls back. The user sees the order, but the process can’t go any further. I would like to check something before restarting, for example whether any service task is currently running. If that is not possible, is there a better way to restart the engine? Thank you.

Webcyberrob · March 25, 2020, 10:50am

Hi,
I think it depends on how the engine is restarted… If it’s an orderly shutdown, normally systems suspend new requests, wait for current ones to complete, then shut down in a consistent state. If the shutdown is a crash, then things may be left in an inconsistent state, with respect to either the client or the server. In the case of Camunda, given that state is flushed to a DB in a transaction, it will usually fail to a consistent state. However, the state of in-flight tasks may be indeterminate, so the engine will typically retry from the last consistent checkpoint…

regards

Rob

as8457632 · March 25, 2020, 11:23am

“If it’s an orderly shutdown, normally systems suspend new requests, wait for current ones to complete, then shut down in a consistent state”
Thank you for your quick reply. I would like to know how to wait for the current requests to complete (my understanding of “complete” is that the current transaction has been flushed to the DB). Can the REST API do this? Thank you.

Webcyberrob · March 26, 2020, 12:06am

Hi,
With regard to a REST API, do you mean you have an external client interacting with the engine’s REST API, or are you referring to a service task in the engine making a call to an external REST API?

Either way, in general you need to think about three failure modes: lost request, lost response, and lost error response. Idempotency and eventual consistency are your friends here. The engine will typically shut down or fail to a consistent state internally. However, in-flight, incomplete actions are similar to the failure modes above: the engine just does not know whether it did or didn’t complete… Thus the engine will roll back to the last known good state, which may result in duplicate task execution…
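
A minimal sketch of the idempotency idea mentioned above, assuming the remote order service can accept a caller-supplied key. `OrderService`, `placeOrder`, and the key format are hypothetical names for illustration only; they are not part of Camunda or of any real order API. The point is that a retried service task after a rollback returns the existing order instead of creating a second one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the remote order service (not a Camunda class).
class OrderService {
    private final Map<String, String> ordersByKey = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Creates an order the first time a key is seen; any later call with
    // the same key returns the order that already exists.
    String placeOrder(String idempotencyKey, String item) {
        return ordersByKey.computeIfAbsent(idempotencyKey,
                k -> "order-" + created.incrementAndGet() + ":" + item);
    }

    int ordersCreated() {
        return created.get();
    }
}

class IdempotentOrderDemo {
    public static void main(String[] args) {
        OrderService service = new OrderService();
        // Assumed key: anything stable across retries of the same task,
        // for example process instance id plus activity id.
        String key = "processInstance-42/placeOrder";

        String first = service.placeOrder(key, "widget");
        // The engine restarts, rolls back, and runs the service task again:
        String retry = service.placeOrder(key, "widget");

        System.out.println("same order returned: " + first.equals(retry));
        System.out.println("orders created: " + service.ordersCreated());
    }
}
```

With a key like this, duplicate task execution after a rollback becomes harmless on the order side, although it requires cooperation from the called service.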

Regards
Rob

as8457632 · March 26, 2020, 2:15am

Thank you for your quick reply.
Idempotency and eventual consistency are indeed the best approach, but unfortunately, we can’t do that.
I would like to solve the case of shutting down the process engine normally. If the process has run from its start to the service task and the data has not yet been flushed to the database, how can I find out that there is a task in the current process cache that has not been flushed to the database?
If this is the case, I will prevent the engine from restarting.

Webcyberrob · March 26, 2020, 7:18am

Hi, in an orderly shutdown, the engine should wait a reasonable time for inflight tasks to complete…but you do have a race condition so it’s possible to exceed the wait timeout. There are two potential cases here, tasks run in a client thread, versus a job executor thread. Given that the job executor locks a job prior to execution, you may be able to infer that it was a crash stop, but that’s getting deep into internals…
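
The wait-with-timeout behaviour described above can be sketched with a plain `java.util.concurrent` thread pool. This is only an illustration of the general pattern, not Camunda’s job executor; `GracefulShutdown` and `drain` are hypothetical names. A `false` result corresponds to the “in doubt” case where the timeout was exceeded.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class GracefulShutdown {
    // Stops accepting new work, then waits up to timeoutMillis for
    // in-flight work. Returns true if everything finished in time,
    // false if the timeout was exceeded or the wait was interrupted.
    static boolean drain(ExecutorService pool, long timeoutMillis) {
        pool.shutdown(); // rejects new tasks; running tasks are not interrupted
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicBoolean finished = new AtomicBoolean(false);
        pool.submit(() -> {
            try {
                Thread.sleep(200); // stands in for a REST call
                finished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean clean = drain(pool, 2000);
        System.out.println("clean=" + clean + ", finished=" + finished.get());
        if (!clean) {
            pool.shutdownNow(); // last resort: interrupt whatever is still running
        }
    }
}
```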

I think the problem you are dealing with is you need a distributed transaction, so you may want to pursue patterns to effect this…

Rob

as8457632 · March 26, 2020, 8:58am

Thanks @Webcyberrob for the reply. It also provides two solutions: idempotency and distributed transactions. But I hope to find a simpler solution, because I think most people need to restart the process engine.

Webcyberrob · March 26, 2020, 10:24am

Hi, another concept you could look at is a listener, either task or execution. This thread may be of interest…

So you could effectively perform fine grained logging on your own via listeners… However even with this approach, you can probably only determine if something is in doubt as opposed to a definitive answer…

I’m still not 100% sure what problem you are trying to solve…

Rob

as8457632 · March 26, 2020, 1:22pm

When I want to stop the process engine, I want to do the following before I stop:

Wait for the executing task node to complete its operation
Suspend the process to prevent execution
The next time you start the engine, automatically continue the previous suspended process

Ingo_Richtsmeier · March 26, 2020, 1:31pm

Hi @as8457632,

if you stop the engine, this code will be executed: https://github.com/camunda/camunda-bpm-platform/blob/master/engine/src/main/java/org/camunda/bpm/engine/impl/ProcessEngineImpl.java#L156-L174.

It’s easier to inspect the code in an IDE than in Github, and you can follow the methods invoked during close().

Hope this helps, Ingo

as8457632 · March 26, 2020, 1:44pm

Thanks @Ingo_Richtsmeier for the quick reply.
This stop-engine code does not wait for the executing task node to complete.

Ingo_Richtsmeier · March 27, 2020, 3:57pm

Hi @as8457632,

well, it depends…

If you use Java Delegates or Connectors to invoke your services, then the engine will wait until the session is closed (and the tasks are completed).

If you use external tasks, then the worker cannot send the completion successfully while the engine is stopped. But after restarting, the external task is still locked (depending on the lock time) and your worker is responsible for retrying. This is the situation where idempotency helps a lot.
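
As a rough illustration of the lock-and-retry behaviour described above, here is a simplified model of a task lock with an expiry time. `LockedTask`, `fetchAndLock`, and `complete` are hypothetical names in this sketch and do not reproduce Camunda’s implementation; time is passed in explicitly so the example stays deterministic.

```java
// Simplified, hypothetical model of a lock with an expiry time.
class LockedTask {
    private String owner;      // worker currently holding the lock, or null
    private long lockedUntil;  // time (millis) at which the lock expires
    private boolean completed;

    // A worker may take the task if it is not completed and any
    // previous lock has expired.
    synchronized boolean fetchAndLock(String workerId, long now, long lockDurationMillis) {
        if (completed || (owner != null && now < lockedUntil)) {
            return false;
        }
        owner = workerId;
        lockedUntil = now + lockDurationMillis;
        return true;
    }

    // Only the current lock owner may complete the task.
    synchronized boolean complete(String workerId) {
        if (completed || !workerId.equals(owner)) {
            return false;
        }
        completed = true;
        return true;
    }
}

class LockExpiryDemo {
    public static void main(String[] args) {
        LockedTask task = new LockedTask();
        // Worker A locks the task and calls the external service, but the
        // engine is stopped before A can report completion.
        System.out.println("A locks at t=0: " + task.fetchAndLock("A", 0, 1000));
        // While the lock is still valid, nobody else can pick the task up.
        System.out.println("B locks at t=500: " + task.fetchAndLock("B", 500, 1000));
        // After the lock expires, the task is fetched and executed again,
        // which is why the called service should be idempotent.
        System.out.println("B locks at t=1500: " + task.fetchAndLock("B", 1500, 1000));
        System.out.println("B completes: " + task.complete("B"));
    }
}
```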

How do you invoke your services?

Cheers, Ingo

as8457632 · March 30, 2020, 6:14am

Thanks @Ingo_Richtsmeier.
I hoped it would work as you said:
“If you use Java Delegates or Connectors to invoke your services, then the engine will wait until the session is closed (and the tasks are completed).”
But in my testing, I found that the close method can return at any time, without waiting for the session to complete.
I don’t know whether I did something wrong that led to these test results.
I found a class, JobExecutorHelper, whose waitForJobExecutorToProcessAllJobs method can solve part of my problem.

huichunheo · January 4, 2024, 3:56am

Hi @Ingo_Richtsmeier,
may I know if you have any suggestion regarding @as8457632’s reply?
I also have some simple code to test:

@Override
public void execute(DelegateExecution execution) throws Exception {
    log.info("start");
    Thread.sleep(10000);
    log.info("end");
}

When I shut down my Spring Boot application, it shuts down immediately and doesn’t wait until the task has completed.

Thanks,
Ken

Ingo_Richtsmeier · January 4, 2024, 12:03pm

Hi @huichunheo,

out of curiosity, I gave it a try and it worked as expected. The JVM throws an exception about the interrupted thread sleep:

Caused by: java.lang.InterruptedException: sleep interrupted
	at java.base/java.lang.Thread.sleep(Native Method) ~[na:na]
	at com.camunda.consulting.simple_spring_boot_process_app.LongWaitingDelegate.execute(LongWaitingDelegate.java:17) ~[classes/:na]
	at org.camunda.bpm.engine.impl.bpmn.delegate.JavaDelegateInvocation.invoke(JavaDelegateInvocation.java:40) ~[camunda-engine-7.20.0-ee.jar:7.20.0-ee]

I don’t know if this is a good example for a long-running delegate, as a sleeping thread can be interrupted, and this exception gets propagated.
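
The effect of interrupting a sleeping thread can be reproduced outside the engine with a plain thread pool. This is a sketch under the assumption that shutdown interrupts the worker thread in a similar way to `ExecutorService.shutdownNow()`; `InterruptDemo` and `runAndStop` are hypothetical names and nothing here is Camunda API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class InterruptDemo {
    // Starts a task that sleeps for sleepMillis, then calls shutdownNow()
    // after stopAfterMillis. Returns "interrupted" if the sleep was cut
    // short and "completed" if the task finished before the shutdown.
    static String runAndStop(long sleepMillis, long stopAfterMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        Future<String> result = pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(sleepMillis); // same call as in the test delegate
                return "completed";
            } catch (InterruptedException e) {
                return "interrupted";
            }
        });
        try {
            started.await();               // make sure the task is running
            Thread.sleep(stopAfterMillis);
            pool.shutdownNow();            // interrupts the worker thread
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            return "error: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndStop(10_000, 100)); // shutdown arrives mid-sleep
        System.out.println(runAndStop(50, 500));     // task finishes first
    }
}
```

A delegate that blocks on an interruptible call such as `Thread.sleep` is therefore not a reliable test of whether shutdown waits for running work, since the interrupt ends the call early.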

Hope this helps, Ingo