Maybe GlusterFS on Kubernetes? https://archive.fosdem.org/2017/schedule/event/kubegluster/
Maybe not:
GlusterFS is latency dependent. Since self-heal checks are done when establishing the FD and the client connects to all the servers in the volume simultaneously, high latency (multi-zone) replication is not normally advisable.
https://joejulian.name/post/glusterfs-replication-dos-and-donts/
It looks like Azure can only do scheduled backups: https://docs.microsoft.com/en-us/azure/backup/backup-azure-linux-app-consistent
Google’s regional persistent disks provide synchronous replication between zones in a region, plus automatic failover: https://cloud.google.com/compute/docs/disks/high-availability-regional-persistent-disk. This could allow you to fail over to another zone in the same region while maintaining the latest state of the cluster.
AWS EBS doesn’t appear to offer synchronous replication across zones; like Azure, it relies on checkpoint backups.
Update: I talked with Sagy Volkov (ScaleIO / Red Hat) about this today at DevConf.cz. He has been running latency/throughput tests at Red Hat for storage on Kubernetes. He says that you can do multi-zone async replication with Rook + Ceph. It has good throughput but double the latency, and the latency is variable, depending on what is happening in the cloud provider’s infrastructure and where your workload is actually located. Two tests with the same configuration, deployed in the same zones two hours apart, can show different performance. As he put it: “I don’t say that performance is bad, but that it is crazy. You can’t predict it.”
He also said that performance testing should report not only the average throughput and latency, but also the 95th and 99th percentiles, especially if your application has an SLA.
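To illustrate why the percentiles matter, here is a minimal sketch in Python. The sample latencies are made up for illustration, and `percentile` and `summarize` are hypothetical helper names (nearest-rank method), not part of any benchmarking tool mentioned above.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the mean alongside the tail percentiles."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical measurements: 97 fast writes and 3 slow outliers (milliseconds).
latencies_ms = [2.0] * 97 + [250.0] * 3
summary = summarize(latencies_ms)
print(summary)
```

With this made-up data the mean lands somewhere between the fast and slow writes and describes neither, and the 95th percentile still looks fine; only the 99th percentile exposes the slow writes that would break an SLA.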
Further update: I’m in Jose Morales’ (formerly OpenShift, now VMware) talk on K8s application development. He pointed me to Portworx as a solution that can do zero-downtime DR:
Zero RPO DR
PX-DR offers RPO-zero failover across data centers in a metropolitan area in addition to HA within a single data center. Examples include, but are not limited to, Azure US East to AWS US East; Azure Germany Central to AWS Europe Frankfurt; Google Cloud asia-east2 to Azure East Asia; any AWS data center to Direct Connected Colo facility.
Continuous backup across the globe
For DR needs that span a country or globe, PX-DR offers continuous incremental-backups so that you can keep an up-to-date backup of your mission critical apps staged in case disaster strikes.
From this case study:
We are huge fans of open source software and before we found Portworx, we tested almost every free and open source product for running stateful containers and they couldn’t satisfy our high requirements in scalability, resilience, and security. We looked at Rook as a self-hosted Ceph cluster within Kubernetes, GlusterFS, OpenEBS, and Rancher Longhorn.
There is also an open source version: px-dev.
If I were going to try it, I would probably start by evaluating and testing px-dev and Google’s regional persistent disks (if Google Cloud were an option).
You have the same trade-off between performance and consistency that you have in any system. If you force synchronous commits to ensure consistency, you will pay a latency cost. If you commit asynchronously for performance, you have a window of inconsistency.
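That trade-off can be sketched with a toy model. This is only an illustration: `ReplicatedVolume` is a hypothetical class, not an API of Ceph, Portworx, or any cloud provider, and the 1 ms local / 10 ms cross-zone costs are assumed numbers.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVolume:
    """Toy model of a primary volume with one replica in another zone."""
    local_ms: float       # assumed cost of a local write
    remote_ms: float      # assumed cost of shipping a write to the other zone
    synchronous: bool
    primary: list = field(default_factory=list)
    replica: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # acknowledged, not yet replicated

    def write(self, block):
        """Apply a write and return the latency the caller observes, in ms."""
        self.primary.append(block)
        if self.synchronous:
            # Acknowledge only after the replica also has the block.
            self.replica.append(block)
            return self.local_ms + self.remote_ms
        # Acknowledge immediately; replication happens later.
        self.pending.append(block)
        return self.local_ms

    def replicate_pending(self):
        """Background replication catching up (only matters in async mode)."""
        self.replica.extend(self.pending)
        self.pending.clear()

    def fail_over(self):
        """Primary zone is lost: return acknowledged writes missing from the replica."""
        lost = list(self.pending)
        self.pending.clear()
        self.primary = list(self.replica)
        return lost


sync_vol = ReplicatedVolume(local_ms=1.0, remote_ms=10.0, synchronous=True)
async_vol = ReplicatedVolume(local_ms=1.0, remote_ms=10.0, synchronous=False)

sync_latency = [sync_vol.write(i) for i in range(5)]
async_latency = [async_vol.write(i) for i in range(3)]
async_vol.replicate_pending()  # replica catches up after the first three writes
async_latency += [async_vol.write(i) for i in range(3, 5)]

sync_lost = sync_vol.fail_over()    # nothing acknowledged is missing
async_lost = async_vol.fail_over()  # writes 3 and 4 never reached the replica

print("sync:", sync_latency, "lost:", sync_lost)
print("async:", async_latency, "lost:", async_lost)
```

In this model every synchronous write pays the cross-zone cost but nothing acknowledged is lost on failover, while the asynchronous writes are fast but the ones acknowledged since the last catch-up are gone. That gap is the window of inconsistency.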