litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

Home Page: https://litmuschaos.io

(feat): Add chaos experiments to test the functionalities of coreDNS components

imrajdas opened this issue

Request for the chaos experiments of coreDNS components. Here is the list of failure cases:

  • Pod Delete
    • Chaos injection can be done by deleting the coreDNS pod in the kube-system namespace using the litmus or PowerfulSeal lib
  • Spike of HTTP requests
    • Chaos Injection:
      • Create an external pod, e.g. nginx
      • Create a service by exposing the nginx pod
      • Send HTTP requests to the nginx service in a loop until the chaos duration elapses
  • ndots problem
  • Caching Failure
    • This can be done easily by scaling the coreDNS deployment to "0" replicas and then scaling it back to the desired number:
      kubectl scale deployment.apps/coredns -n kube-system --replicas=0
      kubectl scale deployment.apps/coredns -n kube-system --replicas=2
  • Delay in the HTTP requests
    • Chaos Injection can be done using Pumba network chaos lib
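The "Spike of HTTP requests" case above boils down to a request loop bounded by the chaos duration. The following is a minimal sketch of that loop, not the Litmus implementation: the function names `fetch_status` and `run_http_spike` and the injectable `fetch` and `clock` parameters are assumptions made for illustration, and the loop is single-threaded, so a real spike would likely need several workers.

```python
import time
import urllib.request
from typing import Callable, Dict


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_http_spike(
    url: str,
    chaos_duration_s: float,
    fetch: Callable[[str], int] = fetch_status,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, int]:
    """Request `url` back to back until `chaos_duration_s` has elapsed.

    Returns counters of successful and failed requests. `fetch` and
    `clock` are hypothetical hooks so the loop can be exercised
    without a cluster.
    """
    deadline = clock() + chaos_duration_s
    counts = {"ok": 0, "failed": 0}
    while clock() < deadline:
        try:
            status = fetch(url)
            counts["ok" if 200 <= status < 400 else "failed"] += 1
        except OSError:
            # urllib's URLError/HTTPError, DNS resolution failures and
            # socket timeouts are all subclasses of OSError.
            counts["failed"] += 1
    return counts
```

Inside the cluster this could be called as `run_http_spike("http://nginx.default.svc.cluster.local", 60)`, where the service name is a placeholder for whatever the exposed nginx service is actually called. A rising `failed` count during the run would suggest that name resolution or the service itself is degrading under load.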

Common Checks:

  • Deploy an external application and service, then check the liveness of the application by calling the service
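One way to picture this common check is a probe that first resolves the service name through DNS and then calls the service over HTTP, so a coreDNS failure can be told apart from an application failure. This is a rough sketch under assumptions: `resolve_host`, `http_ok` and `check_liveness` are hypothetical helper names, and the thread does not say how Litmus performs the check.

```python
import socket
import urllib.request
from typing import Callable, Dict
from urllib.parse import urlsplit


def resolve_host(host: str) -> bool:
    """Return True if `host` resolves via the configured DNS."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # Covers URLError/HTTPError, refused connections and timeouts.
        return False


def check_liveness(
    url: str,
    resolve: Callable[[str], bool] = resolve_host,
    probe: Callable[[str], bool] = http_ok,
) -> Dict[str, bool]:
    """Check DNS resolution first, then the service itself."""
    host = urlsplit(url).hostname or ""
    dns_ok = resolve(host)
    # Skip the HTTP call when the name does not resolve at all.
    return {"dns": dns_ok, "http": dns_ok and probe(url)}
```

Running `check_liveness` before, during and after chaos injection against the external application's service URL gives a simple pass/fail signal; a result with `dns` set to `False` points at coreDNS rather than at the application.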

Issue added to the 1.0 project board

@rajdas98 -- a one-liner against each of the chaos experiments would be very helpful!

This issue is a blanket feature request for different coreDNS scenarios/experiments that will span multiple milestones. The first step towards this is the coredns-pod-delete scenario, available here: https://github.com/litmuschaos/litmus/tree/master/experiments/coredns/pod_delete

@rajdas98 which experiments are we planning to consider next?

@ksatchit We can pick the caching failure experiment.

The coreDNS pod-delete experiment was added in PR #1029

Litmus has since expanded its fault suite and the probe types it supports, which lets users validate the app-native constraints and health checks of any application workload. This largely removes the need for separate workload-specific experiments.