There are 12 repositories under the reliability topic.
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A highly available flow-control and protection component for cloud-native microservices)
A curated list of Site Reliability and Production Engineering resources.
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
Compilation of public failure/horror stories related to Kubernetes
Chaos Engineering Toolkit & Orchestration for Developers
Hands on labs and code to help you learn, measure, and build using architectural best practices.
A free book about developing secure and robust systems software.
SYSTEM DESIGN IS NOT JUST FOR INTERVIEWS; IT CAN BE HOW YOU MANAGE YOUR LIFE. How is modern software designed? 🤔 Some design-level considerations for scalability, maintainability, eventual consistency, availability & reliability. 👨‍💻 Interview Prep. 👨‍💻
A curated list of Site Reliability and Production Engineering Tools
A hosted disposable-email Telegram bot: extremely privacy-friendly and proudly hosted for the community.
An always-on framework that performs end-to-end functional network testing for reachability, latency, and packet loss
PHP HI-REL SOCKET TCP/UDP/ICMP/Serial. A high-reliability PHP communication & control framework: SOCKET-TCP/UDP/ICMP/hardware Serial-RS232/RS422/RS485 AND MORE!
Notes on Site Reliability Engineering. Leave a 🌟 if you found this useful!
Chaos and resiliency testing tool for Kubernetes and OpenShift
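📚 🐣 A collection of essays on software practice. Any topic is welcome as long as the thinking and discussion are interesting and substantive, including system model analysis/quantitative analysis, a hitchhiker's guide to open source, software reliability design practices, the logic and execution of platform products… 🥤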
Simple, Erlang-inspired fault-tolerance framework for Rust Futures.
PowerShell scripts to ensure consistent and reliable build quality and configuration for your servers
A role-playing game for incident management training
Fast computation of Krippendorff's alpha agreement measure in Python.
A curated list of awesome Site Reliability and Production Engineering resources.
Transactional power-failsafe filesystem for microcontrollers
Guardian of Kubernetes and OpenShift clusters. Tool to monitor cluster health and signal/alert on failures.
Portable, independent, web-based, simple streaming YouTube video queues and playlists for music videos, audiobooks, etc.
A non-interactive daemon for host management
A general library for fatigue and reliability
Reliable distributed agreement service for the cloud
Reliability as Code: SRE automation at the tip of your fingers
OpenCossan is an open and free toolbox for uncertainty quantification and management.
A pub-sub system for the distributed web - my master's thesis @ IST
A complete list of all the ways a Client and Server communicate with each other in JavaScript and Node.