A lightweight Kubernetes operator to test cluster and application resilience via chaos engineering 💣 ☸️
Khaos is a straightforward Kubernetes operator, built with kubebuilder, for running chaos engineering experiments.
Through custom controllers and custom resources, Khaos lets you configure and automate operations such as
deleting targeted pods in a given namespace, removing nodes from the cluster, deleting secrets, and more. Supported operations:
- Delete pods
- Delete cluster nodes
- Delete secrets
- Inject resource constraints in pods
- Add or remove labels in pods
- Exec commands inside pods (experimental)
First, clone the repository:
git clone https://github.com/stackzoo/khaos && cd khaos
The repo contains a Makefile with everything you need.
Inspect the make targets with the following command:
make help
Usage:
  make <target>

General
  help            Display this help.

Development
  manifests       Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
  generate        Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
  fmt             Run go fmt against code.
  vet             Run go vet against code.
  cluster-up      Create a kind cluster named "test-operator-cluster" with a master and 3 worker nodes.
  cluster-down    Delete the kind cluster named "test-operator-cluster".
  test            Run tests.
  lint            Run golangci-lint linter & yamllint
  lint-fix        Run golangci-lint linter and perform fixes

Build
  build           Build manager binary.
  run             Run a controller from your host.
  docker-build    Build docker image with the manager.
  docker-push     Push docker image with the manager.
  docker-buildx   Build and push docker image for the manager for cross-platform support

Deployment
  install         Install CRDs into the K8s cluster specified in ~/.kube/config.
  uninstall       Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
  deploy          Deploy controller to the K8s cluster specified in ~/.kube/config.
  undeploy        Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.

Build Dependencies
  kustomize       Download kustomize locally if necessary. If wrong version is installed, it will be removed before downloading.
  controller-gen  Download controller-gen locally if necessary. If wrong version is installed, it will be overwritten.
  envtest         Download envtest-setup locally if necessary.
You can spin up a local dev cluster with KinD via the following command:
make cluster-up
Install and list the operator CRDs with the following command:
make install && kubectl get crds
NAME                                       CREATED AT
commandinjections.khaos.stackzoo.io        2023-11-28T12:55:25Z
containerresourcechaos.khaos.stackzoo.io   2023-11-28T12:55:25Z
nodedestroyers.khaos.stackzoo.io           2023-11-28T12:55:25Z
poddestroyers.khaos.stackzoo.io            2023-11-28T12:55:25Z
podlabelchaos.khaos.stackzoo.io            2023-11-28T12:55:25Z
secretdestroyers.khaos.stackzoo.io         2023-11-28T12:55:25Z
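Each CRD above is named <plural>.<group>. Kubernetes requires a CustomResourceDefinition's name to follow that form. The minimal Go sketch below splits such a name into its resource plural and API group, which may be handy when scripting against the installed CRDs. The helper splitCRDName is hypothetical and is not part of Khaos.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCRDName splits a CRD name of the form <plural>.<group>
// (e.g. "poddestroyers.khaos.stackzoo.io") into its two parts.
// Hypothetical helper, for illustration only.
func splitCRDName(name string) (plural, group string, ok bool) {
	plural, group, ok = strings.Cut(name, ".")
	if !ok || plural == "" || group == "" {
		return "", "", false
	}
	return plural, group, true
}

func main() {
	crds := []string{
		"commandinjections.khaos.stackzoo.io",
		"poddestroyers.khaos.stackzoo.io",
	}
	for _, name := range crds {
		plural, group, ok := splitCRDName(name)
		if !ok {
			fmt.Println("malformed CRD name:", name)
			continue
		}
		fmt.Printf("resource=%s group=%s\n", plural, group)
	}
}
```

The plural part is also a resource name that kubectl accepts, e.g. kubectl get poddestroyers.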
To run the operator from your host against the cluster in your current kubectl context (i.e. whatever cluster kubectl cluster-info shows), run:
make run
To debug this project locally, I strongly suggest using VS Code.
In VS Code, create a .vscode/launch.json file similar to the following:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug Khaos Operator",
      "type": "go",
      "request": "launch",
      "mode": "auto",
      "program": "${workspaceFolder}/cmd/main.go",
      "args": []
    }
  ]
}
To try the following examples, you can use the local KinD cluster (see the Local Testing and Debugging section).
Once the cluster is up and running, create a new namespace called prod and apply an example deployment:
kubectl create namespace prod && kubectl apply -f examples/test-deployment.yaml
Now you can proceed with the examples!
DELETE PODS
Wait for all the pods in the prod namespace to be up and running, then apply the PodDestroyer manifest:
apiVersion: khaos.stackzoo.io/v1alpha1
kind: PodDestroyer
metadata:
  name: nginx-destroyer
spec:
  selector:
    matchLabels:
      app: nginx
  maxPods: 9
  namespace: prod
kubectl apply -f examples/pod-destroyer.yaml
Now you can observe two things:
- The pods in the prod namespace are being terminated (and recreated by the ReplicaSet), as you can see by watching them, for example with kubectl -n prod get pods -w:
NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-7bf8c77b5b-5fvrc   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-5qcx4   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-6kmbd   0/1     ContainerCreating   0          6s
nginx-deployment-7bf8c77b5b-75bg6   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-bcbk5   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-f5wkh   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-gfdzl   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-gmhr2   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-gsprh   1/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-hvsff   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-v4j9v   0/1     ContainerCreating   0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-6kmbd   1/1     Running             0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-zxxv7   0/1     Terminating         0          6s
nginx-deployment-7bf8c77b5b-v4j9v   1/1     Running             0          7s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          32s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          33s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          33s
nginx-deployment-7bf8c77b5b-gsprh   0/1     Terminating         0          33s
- The operator prints logs from its reconciliation loop:
2023-11-28T14:07:18+01:00 INFO Reconciling PodDestroyer: default/nginx-destroyer {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
2023-11-28T14:07:18+01:00 INFO Selector: {map[app:nginx] []} {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
2023-11-28T14:07:18+01:00 INFO MaxPods: 9 {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
2023-11-28T14:07:18+01:00 INFO Namespace: prod {"controller": "poddestroyer", "controllerGroup": "khaos.stackzoo.io", "controllerKind": "PodDestroyer", "PodDestroyer": {"name":"nginx-destroyer","namespace":"default"}, "namespace": "default", "name": "nginx-destroyer", "reconcileID": "1e16a7d2-825a-4b46-b4e5-ac1228bc1c36"}
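Conceptually, the PodDestroyer selects the pods in spec.namespace whose labels match spec.selector.matchLabels and presumably deletes at most spec.maxPods of them. The Go sketch below models only that selection step, using plain structs. The Pod type and the selectVictims function are hypothetical stand-ins and do not reproduce the controller's actual code, which talks to the Kubernetes API. The matching rule is the standard matchLabels behaviour: every key/value pair in the selector must be present on the pod.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, minimal stand-in for a Kubernetes pod.
type Pod struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// matchesLabels reports whether every key/value pair in selector is present
// in labels (matchLabels semantics: a logical AND over all pairs).
func matchesLabels(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectVictims returns the names of at most maxPods pods in namespace that
// match the selector. Names are sorted only to make the output deterministic.
func selectVictims(pods []Pod, namespace string, selector map[string]string, maxPods int) []string {
	var victims []string
	for _, p := range pods {
		if p.Namespace == namespace && matchesLabels(p.Labels, selector) {
			victims = append(victims, p.Name)
		}
	}
	sort.Strings(victims)
	if maxPods >= 0 && len(victims) > maxPods {
		victims = victims[:maxPods]
	}
	return victims
}

func main() {
	pods := []Pod{
		{Name: "nginx-a", Namespace: "prod", Labels: map[string]string{"app": "nginx"}},
		{Name: "nginx-b", Namespace: "prod", Labels: map[string]string{"app": "nginx"}},
		{Name: "redis-a", Namespace: "prod", Labels: map[string]string{"app": "redis"}},
		{Name: "nginx-c", Namespace: "staging", Labels: map[string]string{"app": "nginx"}},
	}
	fmt.Println(selectVictims(pods, "prod", map[string]string{"app": "nginx"}, 9))
}
```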
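Note that the PodDestroyer itself lives in the default namespace, because its manifest sets no metadata.namespace. The spec.namespace field selects the target namespace, prod.
Now we can inspect the status of our PodDestroyer custom resource: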
kubectl get poddestroyer
NAME              AGE
nginx-destroyer   4m51s
kubectl get poddestroyer nginx-destroyer -o yaml
This retrieves our resource in YAML format:
apiVersion: khaos.stackzoo.io/v1alpha1
kind: PodDestroyer
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"khaos.stackzoo.io/v1alpha1","kind":"PodDestroyer","metadata":{"annotations":{},"name":"nginx-destroyer","namespace":"default"},"spec":{"maxPods":9,"namespace":"prod","selector":{"matchLabels":{"app":"nginx"}}}}
  creationTimestamp: "2023-11-28T13:07:18Z"
  generation: 1
  name: nginx-destroyer
  namespace: default
  resourceVersion: "2009"
  uid: fbba6287-6f70-406b-821e-9000f097afc5
spec:
  maxPods: 9
  namespace: prod
  selector:
    matchLabels:
      app: nginx
status:
  numPodsDestroyed: 9
The status field (numPodsDestroyed) tells you how many pods have been successfully destroyed.
DELETE NODES
First, retrieve the node list for your cluster:
kubectl get nodes
NAME                                  STATUS   ROLES           AGE   VERSION
test-operator-cluster-control-plane   Ready    control-plane   24m   v1.27.3
test-operator-cluster-worker          Ready    <none>          24m   v1.27.3
test-operator-cluster-worker2         Ready    <none>          24m   v1.27.3
test-operator-cluster-worker3         Ready    <none>          24m   v1.27.3
Now apply the following NodeDestroyer manifest:
apiVersion: khaos.stackzoo.io/v1alpha1
kind: NodeDestroyer
metadata:
  name: example-node-destroyer
spec:
  nodeNames:
    - test-operator-cluster-worker
    - test-operator-cluster-worker3
kubectl apply -f examples/node-destroyer.yaml
Now, once again, retrieve the node list from the kube-apiserver:
kubectl get nodes
NAME                                  STATUS   ROLES           AGE   VERSION
test-operator-cluster-control-plane   Ready    control-plane   25m   v1.27.3
test-operator-cluster-worker2         Ready    <none>          25m   v1.27.3
As you can see, the operator successfully removed the specified nodes.
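The expected effect of the NodeDestroyer above is a set difference between the cluster's nodes and spec.nodeNames. The hypothetical Go sketch below computes which nodes should survive, which can help you predict the result before applying a manifest. The remainingNodes function is illustrative and is not Khaos code. In this sketch, names that match no node are simply ignored; how Khaos itself treats unknown names is not covered here.

```go
package main

import "fmt"

// remainingNodes returns the nodes that are not listed in toDelete,
// preserving their original order.
func remainingNodes(nodes, toDelete []string) []string {
	del := make(map[string]struct{}, len(toDelete))
	for _, n := range toDelete {
		del[n] = struct{}{}
	}
	var kept []string
	for _, n := range nodes {
		if _, found := del[n]; !found {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	nodes := []string{
		"test-operator-cluster-control-plane",
		"test-operator-cluster-worker",
		"test-operator-cluster-worker2",
		"test-operator-cluster-worker3",
	}
	toDelete := []string{"test-operator-cluster-worker", "test-operator-cluster-worker3"}
	fmt.Println(remainingNodes(nodes, toDelete))
	// [test-operator-cluster-control-plane test-operator-cluster-worker2]
}
```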
DELETE SECRETS
First, create a new Kubernetes secret (an empty secret is fine):
kubectl -n prod create secret generic test-secret
secret/test-secret created
Now apply the following SecretDestroyer manifest:
apiVersion: khaos.stackzoo.io/v1alpha1
kind: SecretDestroyer
metadata:
  name: example-secret-destroyer
spec:
  namespace: prod
  secretNames:
    - test-secret
kubectl apply -f examples/secret-destroyer.yaml
Try to list all the secrets in the prod namespace:
kubectl -n prod get secrets
No resources found in prod namespace.
The specified secret was successfully removed.
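Controllers may reconcile the same resource many times, so a deletion step is usually written to be idempotent: a secret that is already gone counts as success rather than as an error. With client-go this is commonly done by checking apierrors.IsNotFound. The self-contained Go sketch below models that pattern with an in-memory store. The secretStore type, the errNotFound sentinel and destroySecrets are hypothetical stand-ins for the Kubernetes API, and this is not necessarily how Khaos implements it.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// secretStore is a hypothetical in-memory stand-in for the Secrets API:
// namespace -> secret name -> present.
type secretStore map[string]map[string]bool

// remove deletes a secret, returning an error wrapping errNotFound if it is absent.
func (s secretStore) remove(namespace, name string) error {
	if !s[namespace][name] {
		return fmt.Errorf("secret %s/%s: %w", namespace, name, errNotFound)
	}
	delete(s[namespace], name)
	return nil
}

// destroySecrets deletes every named secret and treats "already gone" as success,
// so running it again (as a reconcile loop may) does not fail.
// It returns how many secrets were actually deleted on this pass.
func destroySecrets(store secretStore, namespace string, names []string) (int, error) {
	deleted := 0
	for _, name := range names {
		err := store.remove(namespace, name)
		switch {
		case err == nil:
			deleted++
		case errors.Is(err, errNotFound):
			// Already deleted on a previous pass: nothing to do.
		default:
			return deleted, err
		}
	}
	return deleted, nil
}

func main() {
	store := secretStore{"prod": {"test-secret": true}}
	first, err := destroySecrets(store, "prod", []string{"test-secret"})
	fmt.Println(first, err) // 1 <nil>
	second, err := destroySecrets(store, "prod", []string{"test-secret"})
	fmt.Println(second, err) // 0 <nil>
}
```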
APPLY NEW CONTAINER RESOURCE LIMITS
Apply the following ContainerResourceChaos manifest:
apiVersion: khaos.stackzoo.io/v1alpha1
kind: ContainerResourceChaos
metadata:
  name: example-container-resource-chaos
  namespace: prod
spec:
  namespace: prod
  DeploymentName: nginx-deployment
  containerName: nginx
  maxCPU: "666m"
  maxRAM: "512Mi"
kubectl apply -f examples/container-resource-chaos.yaml
Now retrieve one of the pods in the prod namespace in YAML format (kubectl -n prod get pod <pod-name> -o yaml) and take a look at the resources:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-11-28T13:43:37Z"
  generateName: nginx-deployment-c54b8b4b4-
  labels:
    app: nginx
    pod-template-hash: c54b8b4b4
  name: nginx-deployment-c54b8b4b4-jvw4k
  namespace: prod
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-deployment-c54b8b4b4
    uid: a73e8483-a51b-4f43-806d-38b8976ee61d
  resourceVersion: "6128"
  uid: 6be9fe17-f6b8-418b-96a1-bdf70da8eb95
spec:
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    resources: # modified
      limits:
        cpu: 666m
        memory: 512Mi
      requests:
        cpu: 666m
        memory: 512Mi
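The maxCPU and maxRAM values use Kubernetes resource quantity notation. 666m means 666 millicores (0.666 of a CPU), and 512Mi means 512 × 2^20 = 536,870,912 bytes. The Go sketch below is a deliberately simplified, hypothetical parser that handles only a few common forms: integers, the m suffix for CPU, and the Ki/Mi/Gi binary suffixes for memory. The real parser is resource.ParseQuantity in k8s.io/apimachinery, which supports many more forms.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// milliCPU converts a CPU quantity such as "666m" or "2" into millicores.
// Simplified sketch: integer values with an optional "m" suffix only.
func milliCPU(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	cores, err := strconv.ParseInt(q, 10, 64)
	if err != nil {
		return 0, err
	}
	return cores * 1000, nil
}

// memoryBytes converts a memory quantity such as "512Mi" into bytes.
// Simplified sketch: integer values with a Ki/Mi/Gi suffix, or plain bytes.
func memoryBytes(q string) (int64, error) {
	units := []struct {
		suffix string
		factor int64
	}{
		{"Ki", 1 << 10},
		{"Mi", 1 << 20},
		{"Gi", 1 << 30},
	}
	for _, u := range units {
		if strings.HasSuffix(q, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return strconv.ParseInt(q, 10, 64)
}

func main() {
	cpu, err := milliCPU("666m")
	if err != nil {
		panic(err)
	}
	mem, err := memoryBytes("512Mi")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpu=%dm memory=%d bytes\n", cpu, mem) // cpu=666m memory=536870912 bytes
}
```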
MODIFY POD LABELS
Apply the following PodLabelChaos manifest:
apiVersion: khaos.stackzoo.io/v1alpha1
kind: PodLabelChaos
metadata:
  name: podlabelchaos-test
spec:
  deploymentName: nginx-deployment
  namespace: prod
  labels:
    chaos: "true"
  addLabels: true
kubectl apply -f examples/pod-label-chaos.yaml
Now retrieve one of the pods in the prod namespace in YAML format and take a look at the labels:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-11-28T15:27:22Z"
  generateName: nginx-deployment-6bb89bf6cd-
  labels:
    app: nginx
    chaos: "true"
    pod-template-hash: 6bb89bf6cd
  name: nginx-deployment-6bb89bf6cd-52j42
  namespace: prod
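The feature list mentions both adding and removing labels. The label operation can therefore be pictured as follows: when addLabels is true, spec.labels is merged into the pod labels; otherwise, those keys are deleted. The behaviour for addLabels: false is an assumption. The applyLabelChaos function below is a hypothetical Go sketch of that idea and is not Khaos code.

```go
package main

import "fmt"

// applyLabelChaos returns a copy of current with the chaos labels applied.
// When add is true the labels are merged in (overwriting existing values);
// when add is false those keys are removed. Assumed semantics, for illustration.
// The input map is never modified.
func applyLabelChaos(current, chaos map[string]string, add bool) map[string]string {
	out := make(map[string]string, len(current)+len(chaos))
	for k, v := range current {
		out[k] = v
	}
	for k, v := range chaos {
		if add {
			out[k] = v
		} else {
			delete(out, k)
		}
	}
	return out
}

func main() {
	pod := map[string]string{"app": "nginx", "pod-template-hash": "6bb89bf6cd"}
	chaos := map[string]string{"chaos": "true"}

	labeled := applyLabelChaos(pod, chaos, true)
	fmt.Println(labeled) // map[app:nginx chaos:true pod-template-hash:6bb89bf6cd]

	restored := applyLabelChaos(labeled, chaos, false)
	fmt.Println(restored) // map[app:nginx pod-template-hash:6bb89bf6cd]
}
```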
This repo contains a GitHub Action that publishes the operator OCI image to the GitHub Container Registry (ghcr.io) when new release tags are pushed to the main branch.
To install the operator as a pod in the cluster, use the deploy make target:
make deploy IMG=ghcr.io/stackzoo/khaos:0.0.4
This command installs all the required CRDs and RBAC manifests and then starts the operator as a pod:
kubectl get pods -n khaos-system
NAME                                       READY   STATUS    RESTARTS   AGE
khaos-controller-manager-8887957bf-5b8g9   1/1     Running   0          107s
Note
If you encounter RBAC errors, you may need to grant yourself cluster-admin privileges or be logged in as admin.