Maven Module: docker-network-failure-simulator
Docker Hub: docker-network-failure-simulator
The aim of this project is to demonstrate how to simulate different types of network failure using Docker. There are many tools that provide an abstraction over basic Linux tools like iptables and tc:
- Chaos Monkey (https://github.com/Netflix/chaosmonkey)
- Pumba (https://github.com/alexei-led/pumba)
- Comcast (https://github.com/tylertreat/Comcast)
- Blockade (https://github.com/worstcase/blockade)
- Gremlins (https://github.com/qualimente/gremlins)
The problem with those tools is the abstraction itself: you don't really see how things work under the hood. I believe in the power of simple one-liners that demonstrate the concept of failure injection.
Traditional unit/integration/functional/load testing is performed under ideal environmental conditions.
As the microservices architecture gains popularity, it becomes crucial to also test how a system behaves under host, network, and application software failures.
Examples:
- Network delay (e.g. caused by peaks in activity)
- Dropped packets
- Corrupted packets
- GC pauses
- CPU saturation
- Virtual machine reboots
...
Container technology and Docker opened the way to set up the whole environment (e.g. with docker-compose) and manipulate its components in a fully controlled and repeatable way.
The fault-injection process can be codified and made resource-efficient: we don't need to overload a database to see whether the application uses timeouts or backpressure properly.
In our case the test subject is a simple container running ping, which makes it easy to see when network delay is added and removed:
docker run -it --name test-server --cap-add NET_ADMIN --rm alpine sh -c "apk add --update iproute2 && ping www.wp.pl"
Let's decipher the options:
- docker run -it - starts a new Docker container in interactive mode
- --name test-server - assigns a name to the newly started container
- --cap-add NET_ADMIN - grants the container the NET_ADMIN capability so it can manage its own network stack; this is required by the traffic-shaping tool
- --rm - deletes the container on exit
- alpine - the name of a lightweight Linux image
- sh -c "apk add --update iproute2 && ping www.wp.pl" - the command executed when the container starts up; it installs the iproute2 tools (which include tc) and runs ping
Here is the expected output:
D:\dev\git.repo\docker-network-failure-simulations>docker run -it --name test-server --cap-add NET_ADMIN --rm alpine sh -c "apk add --update iproute2 && ping www.wp.pl"
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/5) Installing libelf (0.8.13-r2)
(2/5) Installing libmnl (1.0.4-r0)
(3/5) Installing libnftnl-libs (1.0.7-r0)
(4/5) Installing iptables (1.6.0-r0)
(5/5) Installing iproute2 (4.7.0-r0)
Executing iproute2-4.7.0-r0.post-install
Executing busybox-1.25.1-r0.trigger
OK: 7 MiB in 16 packages
PING www.wp.pl (212.77.98.9): 56 data bytes
64 bytes from 212.77.98.9: seq=0 ttl=37 time=0.762 ms
64 bytes from 212.77.98.9: seq=1 ttl=37 time=0.710 ms
64 bytes from 212.77.98.9: seq=2 ttl=37 time=0.651 ms
...
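Before injecting any failure, you can inspect which queueing discipline is attached to the container's interface (an optional sanity check, not part of the original walkthrough; inside a container this typically shows noqueue):
docker exec -it test-server sh -c "tc qdisc show dev eth0"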
docker exec -it test-server sh -c "tc qdisc add dev eth0 root netem delay 100ms"
Here is the test-server console output:
64 bytes from 212.77.98.9: seq=7 ttl=37 time=0.638 ms
64 bytes from 212.77.98.9: seq=8 ttl=37 time=0.683 ms
64 bytes from 212.77.98.9: seq=9 ttl=37 time=0.825 ms
64 bytes from 212.77.98.9: seq=10 ttl=37 time=0.546 ms
64 bytes from 212.77.98.9: seq=11 ttl=37 time=100.503 ms <-- Delay is added
64 bytes from 212.77.98.9: seq=12 ttl=37 time=100.531 ms
64 bytes from 212.77.98.9: seq=13 ttl=37 time=100.666 ms
64 bytes from 212.77.98.9: seq=14 ttl=37 time=100.896 ms
docker exec -it test-server sh -c "tc qdisc change dev eth0 root netem delay 0ms"
Here is the test-server console output after setting the delay back to 0ms:
64 bytes from 212.77.98.9: seq=8 ttl=37 time=0.683 ms
64 bytes from 212.77.98.9: seq=9 ttl=37 time=0.825 ms
64 bytes from 212.77.98.9: seq=10 ttl=37 time=0.546 ms
64 bytes from 212.77.98.9: seq=11 ttl=37 time=100.503 ms
64 bytes from 212.77.98.9: seq=12 ttl=37 time=100.531 ms
64 bytes from 212.77.98.9: seq=13 ttl=37 time=100.666 ms
64 bytes from 212.77.98.9: seq=14 ttl=37 time=100.896 ms
64 bytes from 212.77.98.9: seq=15 ttl=37 time=100.675 ms
64 bytes from 212.77.98.9: seq=19 ttl=37 time=100.699 ms
64 bytes from 212.77.98.9: seq=20 ttl=37 time=100.523 ms
64 bytes from 212.77.98.9: seq=21 ttl=37 time=100.494 ms
64 bytes from 212.77.98.9: seq=22 ttl=37 time=0.661 ms <-- Delay is removed
64 bytes from 212.77.98.9: seq=23 ttl=37 time=0.665 ms
64 bytes from 212.77.98.9: seq=24 ttl=37 time=0.652 ms
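The same netem discipline can inject other failure types from the list above, e.g. dropped or corrupted packets. A few example one-liners (the percentages are arbitrary illustrative values, not part of the original walkthrough):
docker exec -it test-server sh -c "tc qdisc change dev eth0 root netem loss 20%" - drops 20% of outgoing packets
docker exec -it test-server sh -c "tc qdisc change dev eth0 root netem corrupt 5%" - introduces a single-bit error into 5% of outgoing packets
docker exec -it test-server sh -c "tc qdisc del dev eth0 root netem" - removes the netem qdisc entirely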
Reference: https://github.com/spotify/docker-client
Using the docker-client library referenced above, the same delay can be applied from Java. This may be useful if you want to start your environment with docker-compose and then apply different types of network failure to test the resilience of your microservices.
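Below is a minimal sketch of such a class, assuming the Spotify docker-client library (8.x) is on the classpath; the class name NetworkDelaySimulator and the exec helper are illustrative, not part of the library:

import com.spotify.docker.client.DefaultDockerClient;
import com.spotify.docker.client.DockerClient;
import com.spotify.docker.client.DockerClient.ExecCreateParam;
import com.spotify.docker.client.LogStream;
import com.spotify.docker.client.messages.ExecCreation;

public class NetworkDelaySimulator {

    public static void main(final String[] args) throws Exception {
        // Connects the same way the docker CLI does (DOCKER_HOST or the default socket)
        try (DockerClient docker = DefaultDockerClient.fromEnv().build()) {
            // Equivalent of: docker exec test-server tc qdisc add dev eth0 root netem delay 100ms
            exec(docker, "test-server", "tc", "qdisc", "add", "dev", "eth0", "root", "netem", "delay", "100ms");
            Thread.sleep(10000); // keep the delay active long enough to observe it in the ping output
            // Equivalent of: docker exec test-server tc qdisc change dev eth0 root netem delay 0ms
            exec(docker, "test-server", "tc", "qdisc", "change", "dev", "eth0", "root", "netem", "delay", "0ms");
        }
    }

    // Runs a command inside a running container, like "docker exec"
    private static void exec(final DockerClient docker, final String container, final String... command) throws Exception {
        final ExecCreation execCreation = docker.execCreate(
                container, command, ExecCreateParam.attachStdout(), ExecCreateParam.attachStderr());
        try (LogStream output = docker.execStart(execCreation.id())) {
            System.out.println(output.readFully());
        }
    }
}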