gardener / test-infra

Test machinery for orchestration of integration/e2e/smoke style tests

Use Telemetry Controller Under Test

vlerenc opened this issue

What would you like to be added:
@dkistner has implemented a "telemetry controller" that keeps track of control plane availability. It would make sense to have it observe clusters under reconciliation/maintenance/test and report this metric, so that we are alerted to poor shoot cluster control plane availability and can eventually break the release/transport if KPIs are not met. Or should this be part of the specific Gardener tests instead?

Why is this needed:
We sometimes miss issues here and lack recurring test results for this most important metric (it is the only metric relevant to our SLO).

@schrodit do you recall what was the issue with the telemetry controller when you tried it a long time ago?

If I remember correctly, we tested it on dev for some time, also with metrics persisted in Elasticsearch.
But there were 2 issues:

  1. For a single test or a bunch of tests, the metric was not really useful. This is because we only create a cluster, test some stuff, and then delete the cluster, which did not give useful metrics. We only had one useful test, where the k8s version upgrade is tested, but even there no one really looked at the metrics.
  2. The more useful metric would come from having it run during a complete Gardener update. But this was not possible with the current Concourse + testmachinery implementation, as we would need to start the testmachinery before the actual deployment starts and stop it after all shoots (or most of them) are reconciled.
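
To make the gating idea from this issue concrete, here is a minimal sketch of how availability over an observation window (for example, a complete Gardener update) could be computed from health-probe samples and compared against a KPI. Everything in it is an assumption for illustration: the `Probe` type, the `availability` and `gate` functions, and the threshold value are hypothetical and do not reflect the telemetry controller's actual data model or Gardener's real SLO.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Probe:
    """Hypothetical health-probe sample; not the telemetry controller's real schema."""
    timestamp: float  # seconds since some reference point
    healthy: bool     # True if the control plane answered the health check


def availability(probes: Iterable[Probe], start: float, end: float) -> Optional[float]:
    """Return the fraction of healthy probes within [start, end], or None if there are none."""
    in_window = [p for p in probes if start <= p.timestamp <= end]
    if not in_window:
        return None
    return sum(p.healthy for p in in_window) / len(in_window)


def gate(probes: Iterable[Probe], start: float, end: float, threshold: float) -> bool:
    """Return True if availability in the window meets the (assumed) KPI threshold.

    A window without any probes fails the gate, since nothing was observed.
    """
    value = availability(list(probes), start, end)
    return value is not None and value >= threshold
```

The window boundaries are passed in explicitly because, as noted above, the observation would have to start before the deployment and end after the shoots are reconciled; how those boundaries are obtained is left open here.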