cncf / tag-app-delivery

📨🚚CNCF App Delivery TAG

Home page: https://cncf.io/projects

krkn sandbox submission review

joshgav opened this issue.

krkn is a chaos testing project proposed to the CNCF sandbox in cncf/sandbox#44. This issue tracks discussions and reviews of krkn to inform the decision on accepting it into the CNCF sandbox.

We've asked @psuriset and team to join an upcoming TAG meeting to discuss the following about the project:

  • the value the project offers end users
  • the project's high-level technical architecture
  • the project's near-term roadmap
  • the state of the project's community and governance
  • how the project compares with existing projects

krkn will present to the TAG at our general meeting on 10/18. Agenda/notes here: https://docs.google.com/document/d/1OykvqvhSG4AxEdmDMXilrupsX2n1qCSJUWwTc3I7AOs/edit#heading=h.5676wfk2ybjv

Thank you @psuriset and team for presenting krkn to us last week. The following are notes from the presentation. The TAG believes krkn is a good fit for the CNCF sandbox!

Value Props

  • finds unexpected problems by injecting unexpected failure scenarios; needed by the Red Hat performance & scale team to ensure maximum performance of clusters and apps
  • emphasis on performance: SLAs and SLOs
  • Chaos AI and the chaos recommender increase chaos test coverage

Architecture

  • Components include krkn, Cerberus, the chaos recommender, Chaos AI, and the telemetry collector
  • client-side tool: it doesn't run inside the cluster, so it can't become a victim of its own actions
  • calls APIs to inject chaos
  • for supported scenarios, has built-in checks verifying that the failure was handled successfully
  • users configure PromQL queries that define success
  • Cerberus: utility that aggregates health into a single go/no-go signal
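As an illustration of the last two points, the sketch below shows one way PromQL-defined success criteria could be reduced to a single go/no-go signal. It is a hypothetical sketch, not krkn's or Cerberus's actual code or configuration format: the `SloCheck` class, the `evaluate` function, the check names, and the thresholds are all assumptions. It also takes query results that have already been fetched from Prometheus rather than calling the Prometheus API itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SloCheck:
    """Hypothetical success criterion: a PromQL query and the maximum value it may return."""
    name: str
    promql: str
    max_value: float


def evaluate(checks: List[SloCheck], results: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Collapse per-check results into a single go/no-go signal.

    `results` maps each check name to the scalar value its query returned.
    A missing result counts as a failure so that absent data cannot pass silently.
    """
    failures = [
        check.name
        for check in checks
        if results.get(check.name) is None or results[check.name] > check.max_value
    ]
    return (not failures, failures)


# Illustrative checks: the metric names are standard kube-apiserver and
# kube-state-metrics series, but the thresholds are arbitrary examples.
CHECKS = [
    SloCheck(
        name="api_p99_latency_seconds",
        promql="histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))",
        max_value=1.0,
    ),
    SloCheck(
        name="not_ready_nodes",
        promql='sum(kube_node_status_condition{condition="Ready",status="false"})',
        max_value=0.0,
    ),
]

if __name__ == "__main__":
    go, failures = evaluate(CHECKS, {"api_p99_latency_seconds": 0.4, "not_ready_nodes": 1.0})
    print("GO" if go else f"NO-GO: {failures}")  # prints "NO-GO: ['not_ready_nodes']"
```

In practice the values in `results` would come from running each `promql` expression against Prometheus after the chaos scenario. Treating a missing result as a failure is a deliberate choice here: "no data" blocks a go signal instead of passing silently.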

Chaos AI, Chaos Recommender, and Telemetry Collector

  • Why? To improve and increase chaos test coverage
  • can watch telemetry from the application or other components and create appropriate chaos test cases
  • developed by IBM
  • Recommender: based on static rules
  • Chaos Recommender is already part of the project; Chaos AI is still in development but will become part of the project
  • Chaos AI will include a mechanism to continually train a model based on actual telemetry and observation
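To make "based on static rules" concrete, here is a minimal sketch of a rule-based recommender that maps observed telemetry to candidate chaos scenarios. Everything in it is hypothetical: the metric keys, thresholds, scenario names, and the `recommend` function are illustrative assumptions, not the Chaos Recommender's real rules or interface.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical static rule: (predicate over one pod's telemetry, scenario to recommend).
Rule = Tuple[Callable[[Dict[str, float]], bool], str]

# Illustrative thresholds and scenario names, not krkn's actual rule set.
RULES: List[Rule] = [
    (lambda t: t.get("cpu_utilization", 0.0) > 0.8, "cpu-hog"),
    (lambda t: t.get("memory_utilization", 0.0) > 0.8, "memory-hog"),
    (lambda t: t.get("network_mbps", 0.0) > 100.0, "network-latency"),
]


def recommend(
    telemetry: Dict[str, Dict[str, float]], rules: List[Rule] = RULES
) -> Dict[str, List[str]]:
    """Map each pod name to the scenarios whose rule fires for its telemetry.

    Pods that match no rule are left out of the result.
    """
    recommendations: Dict[str, List[str]] = {}
    for pod, metrics in telemetry.items():
        matched = [scenario for predicate, scenario in rules if predicate(metrics)]
        if matched:
            recommendations[pod] = matched
    return recommendations


if __name__ == "__main__":
    observed = {
        "api-7f9": {"cpu_utilization": 0.92, "network_mbps": 250.0},
        "db-0": {"memory_utilization": 0.3},
    }
    print(recommend(observed))  # prints {'api-7f9': ['cpu-hog', 'network-latency']}
```

Static rules like these are transparent and easy to audit, but they are fixed; by contrast, the notes describe Chaos AI as continually training a model on actual telemetry and observation.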

Roadmap

  • implement chaos tests for more known scenarios, for example a Kafka cluster in K8s or DNS
  • want to learn from more users via CNCF
  • want to create visualizations and reports from tests

Community

  • Other contributors: IBM (AI)
  • Users: universities (using it and providing feedback) and financial services institutions (FSIs) such as banks

Questions

  • What do you mean by "focus on performance"?
    • uses kube-burner
    • provides some recommended default SLOs to test against
  • Contrast with LitmusChaos and others
    • runs outside the cluster
    • covers more performance use cases
    • AI capability automates creating test cases
  • How do you anticipate users using this? In a pipeline, ad-hoc?
    • recommended for use in a continuous chaos system
    • use it in a test environment first

Closing as this is now complete, thanks all.