krkn-chaos / krkn

Chaos and resiliency testing tool for Kubernetes with a focus on improving performance under failure conditions. A CNCF sandbox project.

Hook to Cerberus to understand the recovery time

chaitanyaenr opened this issue

When running chaos scenarios, it is important to know how long the cluster as a whole takes to become healthy again after failure injection, so that we can find the areas to improve. Today, Kraken queries Cerberus for a pass/fail signal but does not track how long recovery takes; we are tracking the Cerberus metrics manually to work that out.

It would be nice to have Kraken query Cerberus after each chaos scenario, with a timeout, and dump a JSON file with the recovery timing.

For example: track the time the cluster takes to recover after a zone outage.
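
A rough sketch of what this could look like, not Kraken's actual implementation. It assumes Cerberus serves its go/no-go signal as the plain text `True` or `False` over HTTP. The function names (`cerberus_is_healthy`, `measure_recovery`, `report_json`), the JSON field names, and the default timeout and polling interval are all hypothetical, chosen only for illustration.

```python
import json
import time
import urllib.request


def cerberus_is_healthy(url, request_timeout=5.0):
    """Return True if the endpoint answers with the text "True".

    Assumption: Cerberus publishes its go/no-go signal as plain text.
    """
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return resp.read().decode().strip() == "True"
    except OSError:
        # An unreachable endpoint is treated as "not healthy yet".
        return False


def measure_recovery(check, timeout=600.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have passed.

    `clock` and `sleep` are injectable so the loop can be tested
    without waiting in real time.
    """
    start = clock()
    while True:
        healthy = check()
        elapsed = clock() - start
        if healthy:
            return {"recovered": True, "recovery_seconds": round(elapsed, 3)}
        if elapsed >= timeout:
            # Gave up: the cluster did not report healthy within the timeout.
            return {"recovered": False, "recovery_seconds": None}
        sleep(interval)


def report_json(result, scenario):
    """Serialize the timing result together with a (hypothetical) scenario name."""
    return json.dumps({"scenario": scenario, **result}, indent=2)
```

After a scenario finishes, Kraken could call something like `measure_recovery(lambda: cerberus_is_healthy(cerberus_url))` and write the output of `report_json(result, "zone_outage")` to a file, where `cerberus_url` is whatever address the Cerberus instance is configured to listen on. Using a monotonic clock avoids skew from wall-clock adjustments during the run; the measured duration is only as precise as the polling interval.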