kubernetes / kubernetes

Production-Grade Container Scheduling and Management

Home Page:https://kubernetes.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CPUStats.UsageCoreNanoSeconds is inaccurate on windows - 1/10 of the real value.

qiutongs opened this issue · comments

What happened?

The CPU utilization metrics from Kubelet summary API CPUStats.UsageCoreNanoSeconds is not accurate on Windows. The value is only 1/10 of the real value which I have other VM level metrics as a proof.

summary API CPU utilization metric

image

VM CPU utilization metric

image

What did you expect to happen?

The CPUStats.UsageCoreNanoSeconds should be accurate on Windows too.

How can we reproduce it (as minimally and precisely as possible)?

$ kubectl get --raw "/api/v1/nodes/<NODE>/proxy/stats/summary"

T1

  "cpu": {
   "time": "2023-12-06T21:23:48Z",
   "usageNanoCores": 8000000,
   "usageCoreNanoSeconds": 20985800000000
  },

T2

  "cpu": {
   "time": "2023-12-06T21:24:38Z",
   "usageNanoCores": 8000000,
   "usageCoreNanoSeconds": 20986320000000
  },

(20986320000000 - 20985800000000)ns / 50s = 0.01 s/s
This matches the magnitude of the metric chart.

Anything else we need to know?

This problem doesn't exist on Linux.

Call stack

  • metric is set here by kubelet -
    func (c *StatsClient) createRootContainerInfo() (*cadvisorapiv2.ContainerInfo, error) {
    nodeMetrics, err := c.client.getNodeMetrics()
    if err != nil {
    return nil, err
    }
    var stats []*cadvisorapiv2.ContainerStats
    stats = append(stats, &cadvisorapiv2.ContainerStats{
    Timestamp: nodeMetrics.timeStamp,
    Cpu: &cadvisorapi.CpuStats{
    Usage: cadvisorapi.CpuUsage{
    Total: nodeMetrics.cpuUsageCoreNanoSeconds,
  • the data collected by using windows perf counters -
    cpuValue, err := cpuCounter.getData()
    cpuCores := ProcessorCount()
    if err != nil {
    klog.ErrorS(err, "Unable to get cpu perf counter data")
    return
    }
  • query here -
    cpuQuery = "\\Processor(_Total)\\% Processor Time"

Kubernetes version

$ kubectl version
1.27

Cloud provider

GKE

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Could usageNanoCores be wrong?

This code caught my attention:

cachePeriodSeconds := uint64(defaultCachePeriod / time.Second)
perfCounterUpdatePeriodSeconds := uint64(perfCounterUpdatePeriod / time.Second)
cpuUsageNanoCores := ((p.cpuUsageCoreNanoSecondsCache.latestValue - p.cpuUsageCoreNanoSecondsCache.previousValue) * perfCounterUpdatePeriodSeconds) / cachePeriodSeconds

usageNanoCores is computed in cri_stats_provider.go, like this:

nanoSeconds := newStats.Timestamp - cachedStats.Timestamp
if nanoSeconds <= 0 {
return nil, fmt.Errorf("zero or negative interval (%v - %v)", newStats.Timestamp, cachedStats.Timestamp)
}
usageNanoCores := uint64(float64(newStats.UsageCoreNanoSeconds.Value-cachedStats.UsageCoreNanoSeconds.Value) /
float64(nanoSeconds) * float64(time.Second/time.Nanosecond))

So wouldn't something like this be equivalent (just going by looking at the code)?

cpuUsageNanoCores := (p.cpuUsageCoreNanoSecondsCache.latestValue - p.cpuUsageCoreNanoSecondsCache.previousValue) / float64(perfCounterUpdatePeriod) * float64(time.Second/time.Nanosecond)