There are 17 repositories under the reliability topic.
A curated list of Site Reliability and Production Engineering resources.
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
Compilation of public failure/horror stories related to Kubernetes
Hands on labs and code to help you learn, measure, and build using architectural best practices.
Chaos Engineering Toolkit & Orchestration for Developers
It's just fascinating. How is modern software designed? 🤔 Some design-level considerations for scalability, maintainability, eventual consistency, availability & reliability. 👨‍💻 Interview Prep. 👨‍💻
A free book about developing secure and robust systems software.
A curated list of Site Reliability and Production Engineering Tools
Sample implementations for cloud design patterns found in the Azure Architecture Center.
A hosted disposable-email Telegram bot. Extremely privacy-friendly. Proudly hosted for the community.
Awesome-LLM-Robustness: a curated list of resources on uncertainty, reliability, and robustness in Large Language Models
An Open-Source Collection of 230+ Flash Cards to Help You Succeed in Your System Design Interview and More 💯
Easily run integration tests for your backends
Chaos and resiliency testing tool for Kubernetes with a focus on improving performance under failure conditions. A CNCF sandbox project.
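📚 🐣 A collection of essays on software practice. Any topic goes, as long as the thinking and discussion are interesting and substantive. Topics include system model analysis/quantitative analysis, a hitchhiker's guide to open source, software reliability design practice, the logic and execution of platform products… 🥤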
Notes on Site Reliability Engineering. Leave a 🌟 if you found this useful!
A role-playing game for incident management training
Simple, Erlang-inspired fault-tolerance framework for Rust Futures.
A general library for fatigue and reliability
Fast computation of Krippendorff's alpha agreement measure in Python.
PowerShell scripts to ensure consistent and reliable build quality and configuration for your servers
Transactional power-failsafe filesystem for microcontrollers
🛡️ A module for improving the reliability and fault-tolerance of your NestJS applications
A curated list of awesome Site Reliability and Production Engineering resources.
A non-interactive daemon for host management
Guardian of Kubernetes clusters. A tool to monitor cluster health and signal/alert on failures.
[AAAI22 Oral] Reliable Propagation-Correction Modulation for Video Object Segmentation