Chaos Testing in Kubernetes

This repo shows how to do Chaos Testing with applications deployed in Kubernetes.

Create a Kubernetes Cluster

  1. Install kops

    brew update && brew install kops
  2. Create an S3 bucket and setup KOPS_STATE_STORE:

    aws s3 mb s3://kubernetes-aws-io
    export KOPS_STATE_STORE=s3://kubernetes-aws-io
  3. Define an environment variable with the Availability Zones for the cluster:

    export AWS_AVAILABILITY_ZONES="$(aws ec2 describe-availability-zones --query 'AvailabilityZones[].ZoneName' --output text | awk -v OFS="," '$1=$1')"
  4. Create a cluster with 3 masters and 6 workers:

    kops create cluster \
      --name=chaos.k8s.local \
      --master-count=3 \
      --node-count=6 \
      --zones=$AWS_AVAILABILITY_ZONES \
      --yes
  5. Verify the cluster and check that the masters are spread across multiple AZs.

    $ kops validate cluster
    Using cluster from kubectl context: chaos.k8s.local
    
    Validating cluster chaos.k8s.local
    
    INSTANCE GROUPS
    NAME			ROLE	MACHINETYPE	MIN	MAX	SUBNETS
    master-us-east-1a	Master	m3.medium	1	1	us-east-1a
    master-us-east-1b	Master	m3.medium	1	1	us-east-1b
    master-us-east-1c	Master	c4.large	1	1	us-east-1c
    nodes			Node	t2.medium	6	6	us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f
    
    NODE STATUS
    NAME				ROLE	READY
    ip-172-20-102-194.ec2.internal	master	True
    ip-172-20-110-113.ec2.internal	node	True
    ip-172-20-148-247.ec2.internal	node	True
    ip-172-20-182-151.ec2.internal	node	True
    ip-172-20-216-171.ec2.internal	node	True
    ip-172-20-42-110.ec2.internal	node	True
    ip-172-20-54-242.ec2.internal	master	True
    ip-172-20-78-200.ec2.internal	master	True
    ip-172-20-82-162.ec2.internal	node	True
    
    Your cluster chaos.k8s.local is ready
  6. Check that the nodes are spread across multiple AZs:

    aws ec2 describe-instances --query 'Reservations[].Instances[].Placement.AvailabilityZone'
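
If kubectl is pointed at the new cluster, the same spread is visible from Kubernetes itself. A minimal alternative check (the zone label below was the convention for clusters of this era and may differ on newer Kubernetes versions):

    kubectl get nodes --label-columns=failure-domain.beta.kubernetes.io/zone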

Deploy the Application

Note
Make sure to install the specified version of Helm: Helm 2.9.0 is not able to install charts; see helm/helm#4001 for more details.
  1. Install the Helm CLI: brew install kubernetes-helm

  2. Switch to version 2.8.1: brew switch kubernetes-helm 2.8.1

  3. Install Helm in Kubernetes cluster: helm init

  4. Create Service Account and provide permissions to deploy the application:

    kubectl create serviceaccount --namespace kube-system tiller
      serviceaccount "tiller" created
    
    kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
      clusterrolebinding "tiller-cluster-rule" created
    
    kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
      deployment "tiller-deploy" patched
  5. Install the Helm chart: helm install --name myapp charts/myapp

  6. Access the application:

    curl http://$(kubectl get svc/myapp-webapp -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  7. Later, delete the Helm chart: helm delete --purge myapp
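
At any point, helm status and kubectl show what the chart actually deployed. A quick sanity check (resource names come from the chart and are assumptions here):

    helm status myapp
    kubectl get pods
    kubectl get svc myapp-webapp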

Chaos Toolkit

Install Chaos Toolkit

brew install python3
pip3 install chaostoolkit
chaos --version
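
The experiments in this repo kill pods in the cluster, which typically requires the Chaos Toolkit Kubernetes driver as well (an assumption; check the experiment files for the drivers they actually use):

    pip3 install chaostoolkit-kubernetes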

Experiment: Kill a Dependent Service (Greeting)

Setup

  1. Populate the WEBAPP_URL environment variable with the URL of the myapp-webapp service endpoint:

    export WEBAPP_URL="http://$(kubectl get svc/myapp-webapp -o jsonpath={.status.loadBalancer.ingress[0].hostname})/"
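
Optionally, confirm the steady state by hand before injecting any failure; this curl one-liner prints the HTTP status code and total response time:

    curl -o /dev/null -s -w "%{http_code} %{time_total}s\n" $WEBAPP_URL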

Run

  1. Run the experiment:

    chaos run experiments/greeting-kill.json
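
For reference, an experiment of this shape pairs a steady-state probe (the webapp answers within tolerance) with an action that kills the greeting pods. The sketch below is illustrative rather than the literal contents of experiments/greeting-kill.json; the label selector and the 3-second timeout are assumptions:

    {
      "title": "Kill the greeting service",
      "description": "myapp-webapp should stay within tolerance when myapp-greeting dies",
      "configuration": {
        "webapp_url": {"type": "env", "key": "WEBAPP_URL"}
      },
      "steady-state-hypothesis": {
        "title": "webapp responds within 3 seconds",
        "probes": [
          {
            "type": "probe",
            "name": "webapp-must-respond",
            "tolerance": 200,
            "provider": {
              "type": "http",
              "url": "${webapp_url}",
              "timeout": 3
            }
          }
        ]
      },
      "method": [
        {
          "type": "action",
          "name": "kill-greeting-pod",
          "provider": {
            "type": "python",
            "module": "chaosk8s.pod.actions",
            "func": "terminate_pods",
            "arguments": {"label_selector": "app=myapp-greeting"}
          }
        }
      ]
    }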

Diagnosis

The output of the chaos run command shows that the experiment ran but uncovered a weakness in the system: when the myapp-greeting service is killed, the myapp-webapp endpoint takes longer than the allowed 3 seconds to respond. This is outside the tolerance for the system to be observed as still in steady state.

TODO: How do I start the service back during rollback?
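
One possible answer: a Kubernetes Deployment recreates killed pods on its own, so the rollback may only need to wait until the deployment reports ready again. A sketch of a rollbacks block using the Chaos Toolkit process provider (the deployment name is an assumption):

    "rollbacks": [
      {
        "type": "action",
        "name": "wait-for-greeting-to-recover",
        "provider": {
          "type": "process",
          "path": "kubectl",
          "arguments": "rollout status deploy/myapp-greeting"
        }
      }
    ]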

Fix

How do we define a circuit-breaker?
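
One direction, anticipating the Istio section below: Istio's outlier detection ejects a failing backend from the load-balancing pool, which behaves like a circuit breaker. A sketch using the v1alpha3 DestinationRule API (the host and thresholds are assumptions, and depending on the Istio version installed the older v1alpha1 rules may apply instead):

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: myapp-greeting
    spec:
      host: myapp-greeting
      trafficPolicy:
        outlierDetection:
          consecutiveErrors: 3
          interval: 10s
          baseEjectionTime: 30s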

Experiment: Kill a Dependent Service (Name)

Experiment: Send 10 Concurrent Requests with Different Names

Experiment: Kill Nodes in one AZ

Experiment: Kill Nodes in two AZs

Experiment: Kill Masters in one AZ

Experiment: Kill All Masters

Cleanup

Delete the Helm chart:

helm delete --purge myapp

Istio

Install Istio

  1. Install Istio (complete details at Istio quick start):

    curl -L https://git.io/getLatestIstio | sh -
    cd istio-*
    export PATH=$PWD/bin:$PATH
    kubectl apply -f install/kubernetes/istio.yaml
  2. Inject Envoy proxy as sidecar in each pod:

    TODO: How to inject the Istio sidecar into an application deployed with a Helm chart? One option is sketched below.
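
A common pattern (a sketch, untested here): render the chart with helm template and pipe the manifests through istioctl kube-inject before applying them:

    helm template charts/myapp --name myapp | istioctl kube-inject -f - | kubectl apply -f -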

Experiment: Kill a Dependent Service

Setup

Run

Diagnosis

Fix

Gremlin

Install Gremlin

Experiment: Kill a Dependent Service

Setup

Run

Diagnosis

Fix

License: Apache License 2.0

