gardener / gardener

Homogeneous Kubernetes clusters at scale on any infrastructure using hosted control planes.

Home Page: https://gardener.cloud


[Flaky Test] HA Multi zones tests timeout frequently

MichaelEischer opened this issue

How to categorize this issue?

/area testing
/kind flake

Which test(s)/suite(s) are flaking:

The tests run by the ProwJob `pull-gardener-e2e-kind-ha-multi-zone`, more specifically `[It] Shoot Tests Shoot with workers Create, Update, Delete [Shoot, default, basic, simple]`.

CI link:

https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/9449/pull-gardener-e2e-kind-ha-multi-zone/1779763882739372032
https://testgrid.k8s.io/gardener-gardener#ci-gardener-e2e-kind-ha-multi-zone, for example https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-ha-multi-zone/1779501481980858368

Reason for failure:

Apparently, there are too many machines, so the `EveryNodeReady` condition never becomes true and the shoot reconciliation times out:

{"level":"info","ts":"2024-04-14T18:03:00.661Z","logger":"shoot-test.test","msg":"Shoot is not yet reconciled","shoot":{"name":"e2e-default","namespace":"garden-local"},"reason":"condition type EveryNodeReady is not true yet, had message too many worker nodes are registered. Exceeding maximum desired machine count (4/3) with reason NodesScalingDown"}

I noticed this flaky test as part of #9449, but the test runs on Testgrid also contain the exact same error message.
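For illustration, the kind of check behind the reported message can be sketched as a simple comparison of the registered node count against the worker pool's maximum desired machine count. This is a minimal, hypothetical sketch (the function name and structure are illustrative assumptions, not Gardener's actual health-check implementation):

```python
def check_node_count(registered_nodes: int, max_desired: int) -> tuple[bool, str]:
    """Return (healthy, message) for a worker pool.

    Hypothetical sketch: the EveryNodeReady-style condition fails while
    more nodes are registered than the maximum desired machine count,
    e.g. while surplus nodes are still scaling down.
    """
    if registered_nodes > max_desired:
        return False, (
            "too many worker nodes are registered. Exceeding maximum "
            f"desired machine count ({registered_nodes}/{max_desired})"
        )
    return True, "all worker nodes are within the desired machine count"


# The failing run above reported 4 registered nodes against a maximum of 3:
healthy, msg = check_node_count(4, 3)
print(healthy, "-", msg)
```

Under this reading, the flake is a timing issue: the test times out while the condition is still false because the surplus node (4/3) has not finished scaling down yet.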

The Gardener project currently lacks enough active contributors to adequately respond to all issues.
This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Mark this issue as rotten with /lifecycle rotten
  • Close this issue with /close

/lifecycle stale