Chaos Testing
davidjumani opened this issue · comments
Introduce chaos testing as a way to test the stability and resiliency of Edge.
For example, Edge has an external dependency on the Kubernetes control plane. In larger environments, we need to be able to handle the "natural" chaos of the Kubernetes control plane (e.g. apiserver unavailability or load). Similarly, we may see periodic node pressure, Pod churn, etc. that cause the Pods of our internal components (e.g. redis) to be frequently recreated. In both scenarios, we need to ensure our system can handle disruptions without affecting dataplane integrity or risking costly outages for our customers.
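As a rough illustration of what such a test could assert, here is a minimal, self-contained Go sketch of the apiserver-unavailability scenario. Everything in it (`ControlPlane`, `Proxy`, `RunChaosScenario`, the `v1`/`v2` config strings) is hypothetical and only models the idea; it is not Edge's actual code or test framework. It assumes the desired behavior is that the dataplane keeps serving its last known good config while the control plane is down, then picks up changes once it recovers.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable simulates the Kubernetes API server being unreachable.
var ErrUnavailable = errors.New("control plane unavailable")

// ControlPlane is a hypothetical stand-in for the Kubernetes control plane.
type ControlPlane struct {
	Available bool
	Config    string
}

// Fetch returns the current config, or an error while the fault is injected.
func (c *ControlPlane) Fetch() (string, error) {
	if !c.Available {
		return "", ErrUnavailable
	}
	return c.Config, nil
}

// Proxy is a hypothetical dataplane that caches the last good config.
type Proxy struct {
	lastGood string
}

// Sync updates the cached config; on failure the cache is left untouched.
func (p *Proxy) Sync(cp *ControlPlane) error {
	cfg, err := cp.Fetch()
	if err != nil {
		return err
	}
	p.lastGood = cfg
	return nil
}

// Serve returns the config the dataplane is currently serving.
func (p *Proxy) Serve() string { return p.lastGood }

// RunChaosScenario injects a control plane outage and records what the
// dataplane serves before, during, and after the disruption.
func RunChaosScenario() (before, during, after string) {
	cp := &ControlPlane{Available: true, Config: "v1"}
	p := &Proxy{}

	_ = p.Sync(cp)
	before = p.Serve()

	// Inject the fault: the control plane goes down while config changes.
	cp.Available = false
	cp.Config = "v2"
	_ = p.Sync(cp) // expected to fail; the proxy must keep serving
	during = p.Serve()

	// Recover: the dataplane should converge on the new config.
	cp.Available = true
	_ = p.Sync(cp)
	after = p.Serve()
	return before, during, after
}

func main() {
	before, during, after := RunChaosScenario()
	fmt.Println(before, during, after)
}
```

A real chaos test would replace the in-memory fault with an actual disruption (for instance blocking access to the apiserver or deleting component Pods) and check dataplane traffic instead of a cached string, but the shape of the assertion stays the same: stable during the outage, converged after recovery.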
This can follow a pattern similar to the one used by platform.
Testing for Kubernetes API server unavailability has been added in #9563.