kubernetes / kubernetes

Production-Grade Container Scheduling and Management

Home Page: https://kubernetes.io

Cpu Resources Container limits test is flaky for Windows on containerd 1.7

jiechen0826 opened this issue

Which jobs are flaking?

https://storage.googleapis.com/k8s-triage/index.html?sig=windows&text=should%20not%20exceed%20limit%20by%20%3E%205%25

Which tests are flaking?

Kubernetes e2e suite: [It] [sig-windows] [Feature:Windows] Cpu Resources [Serial] Container limits should not be exceeded after waiting 2 minutes

Failed error:

[FAILED] Pod cpu-resources-test-windows-183/cpulimittest-091cc328-6d0d-4984-ac14-c6c63edea2dd reported usage is 0.525644525, but it should not exceed limit by > 5%

Since when has it been flaking?

Since migration to containerd 1.7.x

Testgrid link

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-1-28-windows-serial-slow/1782880261759832064

Reason for failure (if possible)

No response

Anything else we need to know?

The expected CPU usage is set to 0.5, but the actual usage can be around 0.52-0.6, exceeding the limit by more than the 5% threshold.
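
For illustration only (this is not the actual e2e test code), a minimal Go sketch of the kind of 5% tolerance check described above, using the 0.5-core limit and the usage value from the failure message:

```go
package main

import "fmt"

// exceedsLimit reports whether the observed CPU usage (in cores) is more than
// tolerance (e.g. 0.05 for 5%) above the configured limit (in cores).
func exceedsLimit(usageCores, limitCores, tolerance float64) bool {
	return usageCores > limitCores*(1+tolerance)
}

func main() {
	limit := 0.5 // CPU limit from the pod spec, in cores

	// Usage values in the range reported above (around 0.52-0.6).
	for _, usage := range []float64{0.50, 0.525644525, 0.60} {
		fmt.Printf("usage=%.3f exceeds 5%% over limit=%.1f: %v\n",
			usage, limit, exceedsLimit(usage, limit, 0.05))
	}
}
```

With a 0.5 limit the 5% threshold is 0.525, so a reported usage of 0.5256 already fails the check.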

Relevant SIG(s)

/sig windows

@jiechen0826: The label(s) sig/sig-windows cannot be applied, because the repository doesn't have them.

In response to this:

/sig sig-windows

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/sig windows

/triage accepted

I've reached out to some folks to see if the variance in the job limit is expected.

Jobs can potentially exceed the limit for a variety of reasons. If they do, they get put into a deficit, so over time they would appear to be at the correct limit. Since we only run for a short period of time, we are likely bumping into a scenario where the job happens to go over the limit before the deficit evens it out.

I would be in favor of softening the 5% check in this case, since it appears we are only slightly over it occasionally. We can add a comment in the code mentioning that it evens out over time, roughly along the lines of the sketch below.
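
A hedged sketch of what that softening could look like (the exact tolerance and wording are up to the test owners; the 10% value here is purely illustrative, and this is not the actual test code):

```go
package main

import "fmt"

const (
	cpuLimitCores = 0.5

	// Windows enforces CPU limits over scheduling intervals: a container can
	// briefly run above its limit and is then put into a deficit, so a short
	// sampling window can report usage slightly over the limit even though it
	// evens out over a longer run.
	relaxedTolerance = 0.10 // hypothetical value; the current check uses 0.05
)

// withinRelaxedLimit reports whether the sampled usage stays within the
// relaxed tolerance above the configured limit.
func withinRelaxedLimit(usageCores float64) bool {
	return usageCores <= cpuLimitCores*(1+relaxedTolerance)
}

func main() {
	// The usage value from the flake report passes with the relaxed check.
	fmt.Println(withinRelaxedLimit(0.525644525)) // true
}
```

The point of the in-code comment is to record why the tolerance is wider than 5%, so the next person does not tighten it back and reintroduce the flake.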