There are 13 repositories under the reliability-engineering topic.
A curated list of Site Reliability and Production Engineering resources.
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
A checklist for anyone practicing Site Reliability Engineering
Hands on labs and code to help you learn, measure, and build using architectural best practices.
Chaos Engineering Toolkit & Orchestration for Developers
A curated list of Site Reliability and Production Engineering Tools
This repository provides a design methodology and approach to building highly-reliable applications on Microsoft Azure for mission-critical workloads.
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.
A curated list of awesome Site Reliability and Production Engineering resources.
The Chaos Toolkit core library
A Terraform provider for Concourse
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three-parameter Weibull distribution.
A collection of templates ported from the SRE Workbook
No longer maintained: Puppet module for aptly
Code for the paper "Deep Cox Mixtures for Survival Regression", Machine Learning for Healthcare Conference 2021
Terraform provider for Nobl9
Terraform configuration to manage a Prometheus server running on AWS.
A Go application for generating billing data from Cloud Foundry events
Administration tool for GOV.UK PaaS
A service broker to provide Aiven Elasticsearch and InfluxDB services to Cloud Foundry users
Related resources for incident failure diagnosis research.
Sample applications of supported integrations by Last9 Products
:bookmark: Daily-updated reading list for designing High Scalability :cherries:, High Availability :fire:, High Stability :mount_fuji: back-end systems - Pull requests are very welcome :two_men_holding_hands: I hope you will find this project helpful :four_leaf_clover: Please help me share it with more people :heart: Thank you - 谢谢 - धन्यवाद - ধন্যবাদ - Спасибо - شكرا - Merci - Gracias - Danke - Cảm ơn! :bow:
Bootstrap a VPC with BOSH and Concourse to run PaaS
Technical documentation for GOV.UK PaaS
Documentation for Reliability Engineering services