There are 19 repositories under the fault-tolerance topic.
These are the best resources for System Design on the Internet
Fault-tolerant job scheduler for Mesos that handles dependencies and ISO 8601-based schedules
Dkron - Distributed, fault-tolerant job scheduling system https://dkron.io
Highly-available Distributed Fault-tolerant Runtime
Service Discovery and Governance Platform for Microservice and Distributed Architecture
A list of papers about distributed consensus.
List of Elixir books
Asynchronous & Fault-tolerant PHP Framework for Distributed Applications.
A library for replicating your Python class between multiple servers, based on the Raft protocol
Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET
**No Longer Maintained** Official RAMCloud repo
Notes on Lindsey Kuper's lectures on Distributed Systems
Python Actor concurrency library
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
Must-read Papers for File System (FS)
A daemon that runs in the background on a Linux router or firewall, monitors the state of multiple internet uplinks/providers, and changes the routing accordingly. LAN/DMZ internet traffic is load balanced between the uplinks.
Implementation of the Raft distributed consensus algorithm among TCP peers on .NET / .NETStandard / .NETCore / dotnet
ZIO-native utilities for making resilient distributed systems
Lightweight Java SDK for proxyless service governance
Polly.Contrib.WaitAndRetry is an extension library for Polly containing helper methods for a variety of wait-and-retry strategies.
Lightweight Go SDK for proxyless service governance