Process failover Camunda

I have a JSON message that is passed to a Camunda workflow. Suppose a failover happens on that Camunda instance while the process is executing the workflow.

How does Camunda immediately pick up the messages affected by the failover?

Please let me know if there is anything (for example, a database table) that tracks the failover and re-triggers the failed messages.

Thanks,

To answer your question, let me assume that you have an engine node cluster connected to a common database. I will assume we’re talking about failover across the engine nodes rather than the database tier (let me know if that’s not the case). In addition, I will assume your process model has a receive task that processes the message.

A key concept to understand is that the engine nodes are essentially stateless. All state is persisted in the database. Hence the state of the database is the source of truth for an engine node.

OK, let’s assume that the message arrives and your load balancer routes it to one of the engine nodes. Essentially, you want to process the message and acknowledge its receipt. If the message processing logic results in a process instance variable, then the variable will be persisted in the database.

You now need to make a design decision as to how aggressively the variable is persisted in the database. If you mark the receive task as asynchronous after (the asyncAfter flag), then the process state and variable will be persisted to the database as soon as the receive task is complete. Hence, if this node now fails, all is fine for subsequent process logic, as the database has captured the state.

If you don’t mark the receive task as asynchronous after and you perform a lot more business logic in the context of the calling thread, then you increase the probability of the engine node failing whilst you’re processing the message. If the node fails before the next checkpoint (and thus before the next database flush), the state of the process in the database will reflect the state of the process before the message arrived. The process instance will effectively ‘roll back’ to the state before the message arrived. In this case you may need to resend the message.
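To make the difference between the two options concrete, here is a toy sketch in plain Java. It is not Camunda code: the map standing in for the database, the EngineNodeSketch class and the position values are all made up for illustration. It only models the idea that uncommitted work lives in the node's memory and is lost if the node dies before the next flush.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: the map stands in for the shared database and this class
// stands in for one stateless engine node. None of this is Camunda API.
class EngineNodeSketch {
    private final Map<String, String> database;

    EngineNodeSketch(Map<String, String> database) {
        this.database = database;
    }

    // Handles one incoming message.
    // asyncAfter: flush (checkpoint) right after the receive task completes.
    // failDuringLogic: simulate the node dying during the later business logic.
    void onMessage(String payload, boolean asyncAfter, boolean failDuringLogic) {
        // Uncommitted work exists only in this node's memory.
        Map<String, String> tx = new HashMap<>(database);
        tx.put("payload", payload);
        tx.put("position", "afterReceiveTask");
        if (asyncAfter) {
            commit(tx);
        }
        if (failDuringLogic) {
            // Anything not flushed so far is lost with the node.
            throw new IllegalStateException("engine node failed");
        }
        tx.put("position", "afterBusinessLogic");
        commit(tx);
    }

    // Copies the in-memory state into the shared "database".
    private void commit(Map<String, String> tx) {
        database.clear();
        database.putAll(tx);
    }

    public static void main(String[] args) {
        for (boolean asyncAfter : new boolean[] {false, true}) {
            Map<String, String> db = new HashMap<>();
            db.put("position", "waitingAtReceiveTask");
            try {
                new EngineNodeSketch(db).onMessage("{\"orderId\": 1}", asyncAfter, true);
            } catch (IllegalStateException e) {
                // The node is gone; only the database contents survive.
            }
            System.out.println("asyncAfter=" + asyncAfter + " -> " + db.get("position"));
        }
    }
}
```

Without the checkpoint, the database still shows the instance waiting at the receive task after the crash, so the sender has to deliver the message again. With the checkpoint, the message is safely recorded and another node can carry on from there.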

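Now comes the tricky part: message delivery in distributed systems. In my experience, you really should design for at-least-once message delivery, as it is extremely difficult to guarantee delivery that is both reliable and at most once (in other words, exactly once). That means the sender retries until the message is acknowledged, and the receiver has to tolerate duplicates.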
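One common way to live with at-least-once delivery is to make the receiver idempotent, for example by remembering the IDs of messages it has already applied. The sketch below is plain Java with hypothetical names (IdempotentReceiverSketch, messageId); it assumes each message carries a unique ID that stays the same across resends. In a real setup the set of processed IDs would need to be stored in the shared database, ideally in the same transaction as the process state, rather than in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: an in-memory idempotent receiver. The names are made up,
// and a real system would persist the processed IDs in the shared database.
class IdempotentReceiverSketch {
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> appliedPayloads = new ArrayList<>();

    // Returns true if the message was applied, false if it was a duplicate.
    boolean receive(String messageId, String payload) {
        // Set.add returns false when the ID is already present.
        if (!processedIds.add(messageId)) {
            return false;
        }
        appliedPayloads.add(payload);
        return true;
    }

    // Number of messages that actually took effect.
    int appliedCount() {
        return appliedPayloads.size();
    }

    public static void main(String[] args) {
        IdempotentReceiverSketch receiver = new IdempotentReceiverSketch();
        System.out.println(receiver.receive("msg-1", "{\"orderId\": 1}"));
        // The sender never saw an acknowledgement, so it resends the same message.
        System.out.println(receiver.receive("msg-1", "{\"orderId\": 1}"));
        System.out.println(receiver.appliedCount());
    }
}
```

With this in place, a resend after a failover is harmless: the second delivery of the same ID is acknowledged but has no further effect.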


