mohammedkassem / khaos

A lightweight kubernetes operator to test cluster resilience via chaos engineering 💣 ☸️

KHAOS

A lightweight kubernetes operator to test cluster and application resilience via chaos engineering 💣 ☸️

Abstract

Khaos is a straightforward Kubernetes operator made with kubebuilder and designed for executing Chaos Engineering activities.
Through the implementation of custom controllers and resources, Khaos facilitates the configuration and automation
of operations such as the targeted deletion of pods within a specified namespace, the removal of nodes from the cluster, the deletion of secrets and more.

Supported features

  • Delete pods
  • Delete cluster nodes
  • Delete secrets
  • Inject resource constraints in pods
  • Add or remove labels in pods
  • Exec commands inside pods (experimental).

Local Testing and Debugging

First of all clone the repository:

git clone https://github.com/stackzoo/khaos && cd khaos

The repo contains a Makefile with all that you need.
Inspect the make targets with the following command:

make help

Usage:
  make <target>

General
  help             Display this help.

Development
  manifests        Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
  generate         Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
  fmt              Run go fmt against code.
  vet              Run go vet against code.
  cluster-up       Create a kind cluster named "test-operator-cluster" with a master and 3 worker nodes.
  cluster-down     Delete the kind cluster named "test-operator-cluster".
  test             Run tests.
  lint             Run golangci-lint linter & yamllint
  lint-fix         Run golangci-lint linter and perform fixes

Build
  build            Build manager binary.
  run              Run a controller from your host.
  docker-build     Build docker image with the manager.
  docker-push      Push docker image with the manager.
  docker-buildx    Build and push docker image for the manager for cross-platform support

Deployment
  install          Install CRDs into the K8s cluster specified in ~/.kube/config.
  uninstall        Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
  deploy           Deploy controller to the K8s cluster specified in ~/.kube/config.
  undeploy         Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.

Build Dependencies
  kustomize        Download kustomize locally if necessary. If wrong version is installed, it will be removed before downloading.
  controller-gen   Download controller-gen locally if necessary. If wrong version is installed, it will be overwritten.
  envtest          Download envtest-setup locally if necessary.

You can spin up a local dev cluster with KinD via the following command:

make cluster-up

Install and list the operator CRDs with the following command:

make install && kubectl get crds

NAME                                       CREATED AT
commandinjections.khaos.stackzoo.io        2023-11-28T12:55:25Z
containerresourcechaos.khaos.stackzoo.io   2023-11-28T12:55:25Z
nodedestroyers.khaos.stackzoo.io           2023-11-28T12:55:25Z
poddestroyers.khaos.stackzoo.io            2023-11-28T12:55:25Z
podlabelchaos.khaos.stackzoo.io            2023-11-28T12:55:25Z
secretdestroyers.khaos.stackzoo.io         2023-11-28T12:55:25Z

In order to run the operator on your cluster (current context - i.e. whatever cluster kubectl cluster-info shows) run:

make run

In order to debug this project locally, I strongly suggest using vscode.

In vscode you need to create a .vscode/launch.json file similar to the following:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug Khaos Operator",
      "type": "go",
      "request": "launch",
      "mode": "auto",
      "program": "${workspaceFolder}/cmd/main.go",
      "args": []
    }
  ]
}

Examples

In order to test the following examples, you can use the local KinD cluster (see the Local Testing and Debugging section).
Once the cluster is up and running, proceed to create a new namespace called prod and apply an example deployment:

kubectl create namespace prod && kubectl apply -f examples/test-deployment.yaml

Now you can proceed with the examples!

DELETE PODS

Wait for all the pods in the prod namespace to be up and running and then apply the PodDestroyer manifest:

apiVersion: khaos.stackzoo.io/v1alpha1
kind: PodDestroyer
metadata:
  name: nginx-destroyer
spec:
  selector:
    matchLabels:
      app: nginx
  maxPods: 9
  namespace: prod

kubectl apply -f examples/pod-destroyer.yaml

Now you can observe two things:

  1. The pods in the prod namespace are terminated (and recreated by the ReplicaSet):
NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-7bf8c77b5b-5fvrc   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-5qcx4   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-6kmbd   0/1     ContainerCreating   0          6s
nginx-deployment-7bf8c77b5b-75bg6   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-bcbk5   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-f5wkh   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-gfdzl   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-gmhr2   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-gsprh   1/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-hvsff   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-v4j9v   0/1     ContainerCreating   0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-6kmbd   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-v4j9v   1/1     Running             0          7s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          32s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          33s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          33s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          33s

  2. The operator logs its reconciliation:
2023-11-28T14:07:18+01:00       INFO    Reconciling PodDestroyer: default/nginx-destroyer       {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
2023-11-28T14:07:18+01:00       INFO    Selector: {map[app:nginx] []}   {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
2023-11-28T14:07:18+01:00       INFO    MaxPods: 9      {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
2023-11-28T14:07:18+01:00       INFO    Namespace: prod {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}

Now we can inspect the status of our PodDestroyer custom resource:

kubectl get poddestroyer

NAME              AGE
nginx-destroyer   4m51s

kubectl get poddestroyer nginx-destroyer -o yaml

This will retrieve our resource in yaml format:

apiVersion: khaos.stackzoo.io/v1alpha1
kind: PodDestroyer
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"khaos.stackzoo.io/v1alpha1","kind":"PodDestroyer","metadata":{"annotations":{},"name":"nginx-destroyer","namespace":"default"},"spec":{"maxPods":9,"namespace":"prod","selector":{"matchLabels":{"app":"nginx"}}}}
  creationTimestamp: "2023-11-28T13:07:18Z"
  generation: 1
  name: nginx-destroyer
  namespace: default
  resourceVersion: "2009"
  uid: fbba6287-6f70-406b-821e-9000f097afc5
spec:
  maxPods: 9
  namespace: prod
  selector:
    matchLabels:
      app: nginx
status:
  numPodsDestroyed: 9

The status field numPodsDestroyed reports how many pods were successfully destroyed.
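
The README does not show how the controller picks its targets, so the following is only a sketch of the behaviour the manifest suggests: match pods by label and stop once maxPods have been chosen. The Pod type and the selectVictims and matches helpers are hypothetical, use only the standard library, and are not taken from the Khaos source.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every key/value pair in selector is present in labels.
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// selectVictims returns the names of at most maxPods pods that match selector.
// Taking the first matches in list order is an assumption made for this sketch.
func selectVictims(pods []Pod, selector map[string]string, maxPods int) []string {
	var victims []string
	for _, p := range pods {
		if len(victims) >= maxPods {
			break
		}
		if matches(p.Labels, selector) {
			victims = append(victims, p.Name)
		}
	}
	return victims
}

func main() {
	pods := []Pod{
		{Name: "nginx-a", Labels: map[string]string{"app": "nginx"}},
		{Name: "redis-a", Labels: map[string]string{"app": "redis"}},
		{Name: "nginx-b", Labels: map[string]string{"app": "nginx"}},
	}
	victims := selectVictims(pods, map[string]string{"app": "nginx"}, 1)
	// len(victims) plays the role of status.numPodsDestroyed in this sketch.
	fmt.Println(victims, len(victims))
}
```

With maxPods set to 9, as in the manifest above, up to nine matching pods would be selected, which is consistent with the numPodsDestroyed: 9 shown in the status.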

DELETE NODES

First, retrieve nodes info for your cluster:

kubectl get nodes

NAME                                  STATUS   ROLES           AGE   VERSION
test-operator-cluster-control-plane   Ready    control-plane   24m   v1.27.3
test-operator-cluster-worker          Ready    <none>          24m   v1.27.3
test-operator-cluster-worker2         Ready    <none>          24m   v1.27.3
test-operator-cluster-worker3         Ready    <none>          24m   v1.27.3

Now apply the following NodeDestroyer manifest:

apiVersion: khaos.stackzoo.io/v1alpha1
kind: NodeDestroyer
metadata:
  name: example-node-destroyer
spec:
  nodeNames:
    - test-operator-cluster-worker
    - test-operator-cluster-worker3

kubectl apply -f examples/node-destroyer.yaml

Now, once again, retrieve the node list from the kube-apiserver:

kubectl get nodes

NAME                                  STATUS   ROLES           AGE   VERSION
test-operator-cluster-control-plane   Ready    control-plane   25m   v1.27.3
test-operator-cluster-worker2         Ready    <none>          25m   v1.27.3

As you can see, the operator successfully removed the specified nodes.
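
The before and after node lists above amount to a set difference between the cluster's nodes and spec.nodeNames. A small sketch of that bookkeeping follows; the remainingNodes function is hypothetical and only models the outcome, it does not call the Kubernetes API or reproduce the operator's code.

```go
package main

import "fmt"

// remainingNodes returns the nodes left after removing every name in targets,
// plus the targets that did not match any existing node.
func remainingNodes(nodes, targets []string) (kept, missing []string) {
	seen := make(map[string]bool, len(targets))
	for _, t := range targets {
		seen[t] = false // false means "requested but not found yet"
	}
	for _, n := range nodes {
		if _, targeted := seen[n]; targeted {
			seen[n] = true
			continue
		}
		kept = append(kept, n)
	}
	for _, t := range targets {
		if !seen[t] {
			missing = append(missing, t)
		}
	}
	return kept, missing
}

func main() {
	nodes := []string{
		"test-operator-cluster-control-plane",
		"test-operator-cluster-worker",
		"test-operator-cluster-worker2",
		"test-operator-cluster-worker3",
	}
	targets := []string{"test-operator-cluster-worker", "test-operator-cluster-worker3"}
	kept, missing := remainingNodes(nodes, targets)
	fmt.Println(kept, missing)
}
```

Tracking the names that matched nothing is a design choice of this sketch; it makes a typo in nodeNames visible instead of silently ignored.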

DELETE SECRETS

First create a new kubernetes secret (empty secret is fine):

kubectl -n prod create secret generic test-secret

secret/test-secret created

Now apply the following SecretDestroyer manifest:

apiVersion: khaos.stackzoo.io/v1alpha1
kind: SecretDestroyer
metadata:
  name: example-secret-destroyer
spec:
  namespace: prod
  secretNames:
    - test-secret

kubectl apply -f examples/secret-destroyer.yaml

Try to list all the secrets in the prod namespace:

kubectl -n prod get secrets

No resources found in prod namespace.

The specified secret was successfully removed.
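
For readers curious how such a manifest maps onto Go types, here is a minimal sketch that decodes the JSON form of the SecretDestroyer shown above using only encoding/json. The struct names and layout are assumptions based on the two spec fields visible in the manifest; the real API types in the repository may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SecretDestroyerSpec mirrors the spec fields shown in the manifest above.
// Hypothetical type: not copied from the Khaos source.
type SecretDestroyerSpec struct {
	Namespace   string   `json:"namespace"`
	SecretNames []string `json:"secretNames"`
}

// SecretDestroyer is a trimmed-down, hypothetical view of the custom resource.
type SecretDestroyer struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec SecretDestroyerSpec `json:"spec"`
}

// manifest is the JSON equivalent of the YAML manifest from the example.
const manifest = `{
  "apiVersion": "khaos.stackzoo.io/v1alpha1",
  "kind": "SecretDestroyer",
  "metadata": {"name": "example-secret-destroyer"},
  "spec": {"namespace": "prod", "secretNames": ["test-secret"]}
}`

// parseSecretDestroyer decodes a JSON manifest into the sketch type.
func parseSecretDestroyer(data []byte) (SecretDestroyer, error) {
	var sd SecretDestroyer
	err := json.Unmarshal(data, &sd)
	return sd, err
}

func main() {
	sd, err := parseSecretDestroyer([]byte(manifest))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: delete %v from namespace %q\n",
		sd.Metadata.Name, sd.Spec.SecretNames, sd.Spec.Namespace)
}
```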

APPLY NEW CONTAINER RESOURCE LIMITS

Apply the following ContainerResourceChaos manifest:

apiVersion: khaos.stackzoo.io/v1alpha1
kind: ContainerResourceChaos
metadata:
  name: example-container-resource-chaos
  namespace: prod
spec:
  namespace: prod
  DeploymentName: nginx-deployment
  containerName: nginx
  maxCPU: "666m"
  maxRAM: "512Mi"

kubectl apply -f examples/container-resource-chaos.yaml

Now retrieve one of the pods in the prod namespace in YAML format and take a look at the resources:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-11-28T13:43:37Z"
  generateName: nginx-deployment-c54b8b4b4-
  labels:
    app: nginx
    pod-template-hash: c54b8b4b4
  name: nginx-deployment-c54b8b4b4-jvw4k
  namespace: prod
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-deployment-c54b8b4b4
    uid: a73e8483-a51b-4f43-806d-38b8976ee61d
  resourceVersion: "6128"
  uid: 6be9fe17-f6b8-418b-96a1-bdf70da8eb95
spec:
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    resources: # modified
      limits:
        cpu: 666m
        memory: 512Mi
      requests:
        cpu: 666m
        memory: 512Mi
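
To make the units in maxCPU and maxRAM concrete: "666m" means 666 millicores (0.666 of a CPU core) and "512Mi" means 512 × 2^20 bytes. The sketch below works through that arithmetic with the standard library only. The parseMilliCPU and parseMemoryBytes helpers are hypothetical and handle just a small subset of the Kubernetes quantity syntax; they are not how the operator itself parses these values.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseMilliCPU converts a CPU quantity such as "666m" or "0.5" into millicores.
// Only these two forms are handled in this sketch.
func parseMilliCPU(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	cores, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(cores * 1000)), nil
}

// parseMemoryBytes converts a memory quantity with a Ki, Mi or Gi suffix
// (or a plain byte count) into bytes.
func parseMemoryBytes(q string) (int64, error) {
	units := []struct {
		suffix string
		factor int64
	}{{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}}
	for _, u := range units {
		if strings.HasSuffix(q, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return strconv.ParseInt(q, 10, 64)
}

func main() {
	cpu, err := parseMilliCPU("666m")
	if err != nil {
		panic(err)
	}
	mem, err := parseMemoryBytes("512Mi")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpu=%d millicores, memory=%d bytes\n", cpu, mem)
}
```

Note that in the pod output above both requests and limits carry the same values.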

MODIFY POD LABELS

Apply the following PodLabelChaos manifest:

apiVersion: khaos.stackzoo.io/v1alpha1
kind: PodLabelChaos
metadata:
  name: podlabelchaos-test
spec:
  deploymentName: nginx-deployment
  namespace: prod
  labels:
    chaos: "true"
  addLabels: true

kubectl apply -f examples/pod-label-chaos.yaml

Now retrieve one of the pods in the prod namespace in YAML format and take a look at the labels:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-11-28T15:27:22Z"
  generateName: nginx-deployment-6bb89bf6cd-
  labels:
    app: nginx
    chaos: "true"
    pod-template-hash: 6bb89bf6cd
  name: nginx-deployment-6bb89bf6cd-52j42
  namespace: prod
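
The effect of the labels and addLabels fields can be sketched as a simple map operation. The README only demonstrates addLabels: true; treating addLabels: false as "remove these keys" is an assumption based on the feature list, and applyLabelChaos is a hypothetical helper rather than the operator's code.

```go
package main

import "fmt"

// applyLabelChaos returns a copy of current with the chaos labels added
// (add == true) or removed (add == false). The input maps are not modified.
func applyLabelChaos(current, chaos map[string]string, add bool) map[string]string {
	result := make(map[string]string, len(current)+len(chaos))
	for k, v := range current {
		result[k] = v
	}
	for k, v := range chaos {
		if add {
			result[k] = v
		} else {
			delete(result, k)
		}
	}
	return result
}

func main() {
	podLabels := map[string]string{"app": "nginx", "pod-template-hash": "6bb89bf6cd"}
	chaos := map[string]string{"chaos": "true"}

	withChaos := applyLabelChaos(podLabels, chaos, true)
	fmt.Println(withChaos)

	// Assumed behaviour for addLabels: false.
	fmt.Println(applyLabelChaos(withChaos, chaos, false))
}
```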

Operator Installation

This repo contains a GitHub Action that publishes the operator OCI image to the GitHub container registry whenever a new release tag is pushed to the main branch.
To install the operator as a pod in the cluster, you can use the deploy make target:

make deploy IMG=ghcr.io/stackzoo/khaos:0.0.4

This command will install all the required CRDs and RBAC manifests and then start the operator as a pod:

kubectl get pods -n khaos-system

NAME                                       READY   STATUS    RESTARTS   AGE
khaos-controller-manager-8887957bf-5b8g9   1/1     Running   0          107s

Note

If you encounter RBAC errors, you may need to grant yourself cluster-admin privileges or be logged in as admin.

License: Apache License 2.0