etcd-io / raft

Raft library for maintaining a replicated state machine

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Extended Raft algorithm with witness support

joshuazh-x opened this issue · comments

The Raft algorithm requires an odd number of servers to maintain a quorum, meaning a minimum of three for a single point of failure. This isn't an issue for large systems but can be challenging for budget-limited customers needing fewer servers 1.

Efforts have been made in both scholarly circles 23 and commercial sectors 456 to resolve this issue. However, all existing research and implementations for small scale clusters (with two servers for example) either depend on another HA solution or necessitate a standalone server as a witness, adding to deployment complexity and potential performance bottlenecks.

I hereby propose extended Raft algorithm, a variant of Raft algorithm, which is designed for clusters with regular servers and a single witness, minimizing data traffic and access to witness while maintaining all key Raft properties. The witness in this algorithm is very suitable for implementation as a storage object with various options such as NFS, SMB, or cloud storage.

The extended Raft algorithm is backward compatible with Raft, meaning any cluster running with Raft can be seamlessly upgraded to support witness.

The correctness of the algorithm has been conclusively proven through a formal proof in https://github.com/joshuazh-x/extended-raft-paper. Besides that, we also validate the formal specification of the algorithm using TLC model checker.

Look forward to your suggestions and feedback.

@pav-kv @serathius @ahrtr @tbg @Elbehery @erikgrinaker @lemmy

Footnotes

  1. https://github.com/etcd-io/etcd/issues/8934#issuecomment-398175955

  2. Pâris, Jehan-François, and Darrell DE Long. "Pirogue, a lighter dynamic version of the Raft distributed consensus algorithm." 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC). IEEE, 2015.

  3. Yadav, Ritwik, and Anirban Rahut. "FlexiRaft: Flexible Quorums with Raft." The Conference on Innovative Data Systems Research (CIDR). 2023.

  4. https://github.com/tikv/tikv

  5. https://platform9.com/blog/transforming-the-edge-platform9-introduces-highly-available-2-node-kubernetes-cluster/

  6. https://www.spectrocloud.com/blog/two-node-edge-kubernetes-clusters-for-ha-and-cost-savings

+1 to witness support. My main concern would be creation of a test plan to ensure correctness. I think the TLC model checker will be crucial here.

Yes, indeed. We need tests to ensure implementation correctness besides algorithm itself. In our PoC work on etcd, we managed to make all existing tests (e2e, integration, robust) run on cluster with single witness. But additional tests would be needed to cover witness specific functionalities.