Active-Active Cluster methodology using PostgreSQL

Hi all,
PostgreSQL has a number of replication solutions, listed here: PostgreSQL different replication solutions. We need to choose one if we use PostgreSQL. For an active-active setup we want to be able to read and write on both databases (a cluster of two nodes), and we do not want to lose data. At the same time we want Camunda to work without any problems. What is the suggested approach? Are there any lessons learned from production installations?

Thanks.

Hi,
There is a lot of subtlety in this area. I could write an essay, so I will speak more briefly to a few points…

You describe a solution, an active/active DB tier, but what is your requirement? Do you want to (1) maximise throughput, (2) maximise availability, or (3) minimise data loss? These are not mutually exclusive, however it's useful to prioritise your requirements because you have to give something up…

If it's 1, a good approach is to vertically scale the DB tier until you are forced to shard…

If it's 3, you need to be clear on which data. There is the data representing the state of the process, maintained by the engine, and this may differ from the state of the business data the process operates on. The process engine transitions from one good state to another good state. This is why the engine requires read committed transaction semantics at the DB tier. If your process updates business data outside of the engine's DB transaction, e.g. via an API call, then there is a risk that the business data and the process state get out of sync. Thus on DB failure the engine will recover to a good state, however in-flight transitions could be lost. If your business data updates are idempotent, then the system will eventually recover consistency, because repeating a lost transition does not apply the same update twice.
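To make the idempotency point concrete, here is a minimal sketch in Python. It is not Camunda code: `BusinessStore`, `credit` and the request id format are hypothetical names invented for illustration, and the in-memory store stands in for whatever external system your process calls. The idea is only that each update carries a stable id, so a repeat after recovery is a no-op.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessStore:
    """Hypothetical in-memory stand-in for an external business system."""
    balances: dict = field(default_factory=dict)
    applied: set = field(default_factory=set)  # request ids already processed

    def credit(self, request_id: str, account: str, amount: int) -> bool:
        """Apply a credit at most once per request_id; repeats are no-ops."""
        if request_id in self.applied:
            return False  # duplicate delivery, e.g. the task was re-executed
        self.balances[account] = self.balances.get(account, 0) + amount
        self.applied.add(request_id)
        return True


store = BusinessStore()

# The task runs and updates the business data, then the DB fails before
# the engine commits the new process state.
first = store.credit("proc-42:credit-task", "acct-1", 100)

# After recovery the engine resumes from its last good state and runs the
# same task again with the same (assumed stable) request id.
second = store.credit("proc-42:credit-task", "acct-1", 100)
```

After both calls the account holds 100, not 200. In a real system the set of applied ids would need to be stored durably together with the business data, which this sketch does not attempt.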

If it's 2, then a single master with failover to a block-level replicated slave is probably the easiest. If you can use something like AWS RDS, then this is taken care of for you. You will still have an outage on failure, however the failover can be automated and takes in the order of 60 seconds. Hence the outage is often transparent to users…
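For the outage to go unnoticed, whatever calls the database has to ride out the failover window rather than fail immediately. A rough sketch of that, in Python with hypothetical names (`call_with_retry`, `flaky_query`) and assumed timings (a 90 second budget and a 2 second delay, chosen only to cover a failover of about 60 seconds):

```python
import time


def call_with_retry(operation, max_wait_s=90.0, delay_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Retry operation on ConnectionError until max_wait_s has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up once another wait would exceed the time budget.
            if clock() + delay_s > deadline:
                raise
            sleep(delay_s)


# Simulated database call: fails three times, as if the master is down
# and the replica has not been promoted yet, then succeeds.
attempts = {"count": 0}


def flaky_query():
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise ConnectionError("primary unavailable, failover in progress")
    return "ok"


# sleep is replaced with a no-op here so the demo finishes instantly.
result = call_with_retry(flaky_query, sleep=lambda seconds: None)
```

Only retry operations that are safe to repeat. A connection pool or driver may already offer similar behaviour, in which case that is preferable to hand-rolled retries.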

Thus a few points to think about and be clear on…

regards

Rob