djmdata / awesome-scalability-toolbox

My opinionated list of products and tools used for high-scalability projects

Architecture diagrams, API documentation

Lucidchart
Mermaid (diagrams from text)
Cloudcraft (AWS optimized)
Swagger
Cacoo
Creately
draw.io
Wording, definition syntax and units for RFC specification creation

Message queues

Kafka
RabbitMQ
ActiveMQ
ZeroMQ
nanomsg
phxqueue (from Tencent)
Disque (antirez) (Would be part of Redis 4.2)
HornetQ
IronMQ (cloud)

Load balancers, reverse proxy, accelerators, web servers

Varnish
HAProxy
nginx, nginx config
OpenResty
Tomcat
Træfik
Tarantool (mail.ru)
lighttpd
Katran (BPF-based L4 load balancer, Facebook)

Service mesh

Envoy L3/4/7 proxy (Lyft/Google, C++)
Envoy introduction
Learn Envoy
Rotor (xDS, Turbine Labs)
Envoy Java control plane
Istio service mesh controller
Istio introduction
linkerd L5 proxy (Finagle based, JVM)
linkerd introduction
Conduit (Rust, linkerd devs)

Structured and unstructured data storage

PostgreSQL
Postgres Pro (PostgreSQL)
JSON in PostgreSQL 10.x, 11.x, PostgreSQL 9.6 vs MongoDB 3.4
Why Uber Engineering Switched from Postgres to MySQL and follow-ups 1, 2, 3, 4, 5, 6, 7
Redis
MySQL
RocksDB (embedded key-value store by Facebook; basis of MyRocks, an InnoDB alternative)
Vitess (MySQL auto horizontal scaling)
MariaDB (MySQL)
Percona (MySQL)
MongoDB
Scylla (Cassandra on steroids)
Cassandra
CockroachDB
Aerospike
OrientDB (graph)
Database isolation levels
The Log-Structured Merge-Tree (LSM-Tree) whitepaper
B+ tree

Distributed consensus management, service discovery and configuration

Raft protocol
Paxos protocol
Paxos made simple
Paxos Made Live - An Engineering Perspective
Consul
etcd
Vault
Secure Production Identity Framework For Everyone (SPIFFE)
ZooKeeper

Containers

Docker
Awesome Docker list
Kubernetes
Container Network Interface
Mesosphere
Mesos
gVisor (sandbox runtime)
cAdvisor (container monitoring)
Weave Scope (monitoring)
SysDig (monitoring)

Jsonnet

jsonnet
jsonnet builds
Visual Studio Code plugin
Style guide (Databricks)

Kubernetes

Kompose (Docker Compose to k8s)
ksonnet
kubecfg
Kubespray (cluster setup)
Kubeadm (cluster setup)
kops (cluster setup)
kubectx & kubens (switch clusters and namespaces)
kube-ps1 (prompt info)
stern (pod and container tailing)
click (cli controller)
prow (GitHub hooks)
KubeGPU (Microsoft)
Skaffold
K8s management tools
Telepresence (fast dev environments)
Squash microservice debugger
AWS VPC Kubernetes CNI driver using IPvlan
Helm (package manager)
Kedge (Simple, Concise & Declarative Kubernetes Applications)
Contour (Ingress controller using Envoy)
Gimbal (Ingress load balancer to many clusters)
Cilium
Calico
Vault with Kubernetes and Video on improvements
Gitkube
Guide to Kubernetes networking (part 1), part 2
Kubernetes Security - Best Practice Guide
Weave Scope
Kubernetic (desktop UI client)
50 Useful Kubernetes Tools

RPC, Communication between system nodes

gRPC
Protocol Buffers
Thrift
Cap'n Proto
MessagePack
FlatBuffers
Motan
Aeron
ZeroMQ
SMF
QUIC

gRPC

gRPC status codes
gRPC 2 years in production
gRPC-Web client

Service monitoring, metrics collection / graphing

Grafana
Grafonnet-lib (generate dashboards for Grafana)
Graphite
Prometheus
Node Exporter - machine metrics (Prometheus)
ClickHouse (Yandex)
Druid (Imply)
Pinot (LinkedIn)
Architecture analysis of ClickHouse, Druid and Pinot
HTTP Analytics for 6M requests per second using ClickHouse
NetData
Vector (on-host monitoring)
okmeter
Datadog
TimescaleDB
KairosDB
Zabbix
PagerDuty
New Relic

Infrastructure information management

Osquery (Facebook)
Kolide Fleet (osquery)
Doorman (osquery)
OSSEC

Distributed request tracing

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (Google)
OpenTracing standard
OpenTracing and Jaeger introduction
OpenCensus (Google, tracing and stats)
TraceContext propagation format
Jaeger (Uber)
Zipkin
Lightstep
Skywalking

Load testing

Yandex.Tank (C++, Python, Go)
Overload (storage for Yandex.Tank results)
Gatling (Scala)
Locust (Python)
Vegeta (HTTP 1.1/2)
h2load (HTTP 1.1/2)
Selenium (Web UI)
Selenide (Web UI)

Log management

What you need to know about real-time logs
fluentd
Kafka
Logstash
Graylog2
syslog-ng
rsyslog
Splunk
GoAccess
Bookkeeper
LogDevice (Facebook)
Online solutions:
Loggly
Logentries
Papertrail
Scalyr
Sumo Logic
Humio

Feature Flags

Overview site
FF4J
Togglz (Java)
Unleash (simple)
LaunchDarkly (cloud provider)

Deployment tools

Ansible
Salt
Puppet
Chef
Teletraan

CI (Continuous Integration)

TeamCity
Jenkins
Jenkins X (for k8s apps)
Concourse

CDNs

Akamai
Fastly
Level3
Edgecast
Traffic Control (Self-hosted CDN)

AWS

awscli
awless
S3 Browser
CloudBerry S3 Explorer
Analyze AWS S3 and CloudFront logs + GoAccess

Networking

BPF and friends (Brendan Gregg)
XDP
BPFd (remote BPF by Google)
BCC (Tools for BPF-based Linux IO analysis, networking, monitoring, and more)
How to achieve low latency with 10Gbps Ethernet (Cloudflare)
BBR, the new kid on the TCP block
Making Linux TCP Fast
SYN packet handling in the wild (Cloudflare)
How TCP backlog works in Linux
Understanding TCP close states
Bind before connect
SYN cookies
On SO_REUSEADDR and SO_REUSEPORT
Monitoring and Tuning the Linux Networking Stack: Receiving Data
Monitoring and Tuning the Linux Networking Stack: Sending Data
MIT's TCP ex Machina: Computer-Generated Congestion Control
Introduction to modern network load balancing and proxying (Envoy)
BGP in 2017
CoreDNS
Knot DNS
Knot Resolver
Maglev: A Fast and Reliable Software Network Load Balancer
MaxMind GeoIP databases
IPVS
Open vSwitch
kTLS in Linux (TLS in kernel space 4.13+), white paper and Intro in Go
DPDK
FD.io
RIPE NCC network information
JLS2009: Generic receive offload
High-Speed Trading: Lines, Radios, and Cables – Oh My
Solving problem with Nagle's algorithm and delayed ACK using TCP_NODELAY
IPFS
S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing
Linux AnyIP
Listen on all ports for AnyIP range on the server
TCP Tracepoints (Linux 4.15/4.16+)
Kernel Connection Multiplexor (KCM) and more details
Blocking-resistant communication through domain fronting

SDN

Stratum
p4 language
p4 Runtime
OpenFlow
SAI (Switch Abstraction Interface)
ONOS
OpenNFP
OpenConfig

SRE (Site Reliability Engineering)

Google Site Reliability Engineering book
High Performance Browser Networking book
The Docker Book
Linux Performance tools and materials
U2F devices review
A self-service CA for OpenSSH
Optimizing web servers for high throughput and low latency (Dropbox)
Shipilev Close Encounters of The Java Memory Model Kind
Shipilev JVM Anatomy Park
On disk IO - part 1, part 2, part 3, part 4, part 5
Transparent Hugepages: measuring the performance impact
Introduction 2016 NUMA Deep Dive Series
Understanding PCIe Configuration for Maximum Performance
Netflix Serving 100 Gbps from an Open Connect Appliance
Aphyr Hermitage - info and testing of database isolation levels
A collection of postmortems
Jeff Dean's latency numbers plotted over time
Sakila test DB
Monitoring in the time of Cloud Native
Tyler McMullen - Load Balancing is Impossible
What every programmer should know about memory
What every programmer should know about floating point, floating point format explained, floating point GUI site, shorter explanation
Chaos Engineering information map
A Gentle Introduction to Erasure Codes
The PMCs of EC2: Measuring IPC
AWS EC2 Virtualization evolution
DNS zone visualization
How Netflix Tunes EC2
Write-Behind Logging
Cache-Oblivious Algorithms and Data Structures
Oracle Graal (Hotspot replacement)
Understanding How Graal Works - a Java JIT Compiler Written in Java
Understanding disk usage in Linux
On time and UTC

TLS/SSL

Sonar
TLS information
Mutual TLS (mTLS)
Mozilla server side TLS information
testssl.sh
Mozilla Observatory
HTTP security headers testing
Qualys SSL tests
High-Tech Bridge SSL test
HTTP security tools
HSTS preloading
SRI hash generator
Client side TLS test
DNS CAA helper
DNS over TLS

Authorization

The OAuth 2.0 Authorization Framework
JSON Web Token (JWT)
JSON Web Signature (JWS)
JSON Web Encryption (JWE)
JWT playground

Encryption, hashing

OpenSSL
BoringSSL (Google)
s2n (AWS)
LibreSSL (OpenBSD OpenSSL fork)
Cryptography Engineering: Design Principles and Practical Applications (book)
Introduction to Modern Cryptography, Second Edition (book)
Security Engineering, 2nd edition (book)
Crypto 101 (concepts, book)
Applied Cryptography Engineering
Ensuring Randomness with Linux's Random Number Generator
Should we MAC-then-encrypt or encrypt-then-MAC?
Authenticated Encryption: Relations among notions and analysis of the generic composition paradigm
How to choose an Authenticated Encryption mode
Awesome cryptography repository
Mind Your Keys? A Security Evaluation of Java Keystores
Hash-based message authentication code
Authenticated Encryption with Associated Data (AEAD)
AES-GCM (AEAD)
AES-GCM-SIV
GCM blockcipher mode
OCB blockcipher mode
ChaCha20 design (stream)
Poly1305 (MAC)
ChaCha20 and Poly1305 (AEAD)
Understanding RSA terms
Elliptic curve introduction
Elliptic Curve Cryptography: a gentle introduction
Safe elliptic curves
Curve25519
Fast Positive Hash
HighwayHash and SipHash (Google)
SipHash (original)
BLAKE2 (crypto)
xxHash
MurmurHash3
Dieharder: A Random Number Test Suite

Videos

Kafka 2017 Summit
CppCon 2017
@Scale 2017
Strange Loop 2017
FOSDEM 2018
Computer Architecture course taught at ETH Zürich in Fall 2017
GrafanaCon 2018
SREcon 2018
KubeCon + CloudNativeCon 2018

Tools

htop
gtop
k6 (load testing)

Misc

High Scalability/Availability/Stability articles list
Another GitHub repo
