exajobs / sre-collection

An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools for the field of Site Reliability Engineering (SRE).

Site Reliability Engineering Collection

An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools for the field of Site Reliability Engineering (SRE).

What is Site Reliability Engineering?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

Table of Contents

⬆ back to top

Culture

⬆ back to top

Education

⬆ back to top

Books

Back to top

Hiring

Back to top

Reliability

Back to top

Monitoring & Observability & Alerting

Back to top

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

Blogs

  • Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  • Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
  • High Scalability - Technical Blog Posts About Systems Architecture.
  • rachelbythebay - Technical Blog Posts.
  • Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
  • SysAdvent - One article for each day of December, ending on the 25th.
  • Stephen Thorne's Blog - Blog Posts About SRE.
  • Increment - A digital magazine about how teams build and operate software systems at scale.
  • GopherSRE - Blog Posts about Go and SRE.
  • Cindy Sridharan - Blog posts about distributed systems and their management.
  • Blameless Blog - Blog posts about SRE culture and practices.
  • Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.
  • Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.
  • FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
  • Rootly Blog - Incident management best practices and guides.
  • incident.io Blog - Guides, advice and resources on incident management and response.
  • Logit.io Blog - Resources on log management, SRE and DevOps.

Newsletters

  • DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  • KubeWeekly - A weekly newsletter for all things Kubernetes, curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  • SRE Weekly - Weekly Site Reliability Newsletter.
  • O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
  • ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

Twitter

SRE Tools

⬆ back to top

Contributing

You are most welcome to contribute to this Awesome Community list as well. Big thanks to all current contributors who have helped build this Awesome Community list.

License

CC0

To the extent possible under law, Exajobs has waived all copyright and related or neighboring rights to this work.

Back to top

About

An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools for the field of Site Reliability Engineering (SRE).

License:Apache License 2.0