google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev


Missing cAdvisor network metrics when using gVisor

karanthukral opened this issue · comments

Description

After upgrading gVisor from release 20210322.0 to 20210720, we noticed that we were missing some cAdvisor Prometheus container metrics (e.g. container_network_receive_packets_total, container_network_receive_bytes_total) for all pods/containers running on gVisor.

During our debugging we tried to narrow down which gVisor change could have caused this issue. We went through the gVisor changelog while also running multiple versions of gVisor to find the release where the metrics disappeared. We found that metrics stopped being collected as of release 20210518.0; 20210510.0 was the last release from which we were able to successfully receive metrics via cAdvisor. Looking at the changelog between those versions, we came across the following commit, where the sandbox was updated to use the pod cgroup instead of the (1st) container cgroup it had been using. Given that cAdvisor is built to deliver per-container metrics, but the mentioned commit moves the sandbox (and its metrics) to the pod's cgroup, cAdvisor stops emitting metrics for all gVisor containers.

Comparing the sandbox config for my pod between the working and broken versions of gVisor, the only changes that stood out were the cgroup changes. Examples of the sandbox config:

working

"sandbox": {
    "id": "e7b82bbf3ee2901aaec811ec59d807edaf6c6967884103ac418cfa3b032481da",
    "pid": 342475,
    "cgroup": {
      "name": "/kubepods/burstable/pod308d8927-7e25-443c-8dad-d390f7023b0e/e7b82bbf3ee2901aaec811ec59d807edaf6c6967884103ac418cfa3b032481da",
      "parents": null,
      "own": {
        "blkio": true,
        "cpu": true,
        "cpuset": true,
        "devices": true,
        "freezer": true,
        "hugetlb": true,
        "memory": true,
        "net_prio": true,
        "perf_event": true,
        "pids": true,
        "rdma": true,
        "systemd": true
      }
    },
    "originalOomScoreAdj": -999
}

not working

"sandbox": {
    "id": "71e675350a4c07b79cae482e650c1beeb1e23488112d3b751838a9dc8e17399a",
    "pid": 359425,
    "cgroup": {
      "name": "/kubepods/burstable/pod2baf1dc0-3715-4153-a8c2-b46225e219cc",
      "parents": null,
      "own": {
        "devices": true,
        "rdma": true
      }
    },
    "originalOomScoreAdj": -999
}
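
The key difference is that in the broken version the sandbox cgroup is the pod-level cgroup, with no per-container path component and almost no owned controllers. As a quick check on the node, one can list the pod's cgroup directory for per-container subdirectories; a minimal Go sketch, assuming the memory controller and the pod path from the example above:

package main

import (
    "fmt"
    "os"
)

func main() {
    // Pod cgroup path taken from the "not working" example above; adjust to your pod.
    podCgroup := "/sys/fs/cgroup/memory/kubepods/burstable/pod2baf1dc0-3715-4153-a8c2-b46225e219cc"
    entries, err := os.ReadDir(podCgroup)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    found := false
    for _, e := range entries {
        if e.IsDir() {
            found = true
            fmt.Println(e.Name()) // per-container cgroup directories, if any
        }
    }
    if !found {
        fmt.Println("no per-container cgroup directories under the pod cgroup")
    }
}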

Steps to reproduce

In order to reproduce the bug, you need a running k8s cluster with cAdvisor metrics enabled as part of the kubelet. We rely on Prometheus to scrape the metrics, but to simply test this issue you can run a curl command against the kubelet on the node where the gVisor pod is running. These steps assume you are able to authenticate requests against the kubelet on that node.

  1. Run the 20210518.0 or a newer release of the gVisor containerd shim and runsc
  2. Deploy a pod using gVisor on k8s that is able to receive and/or respond to network calls. I was using sample-golang-notes
  3. Run GET requests against your running app
  4. Run curl -H "Authorization: Bearer $TOKEN" -k https://KUBELET_IP/metrics/cadvisor and search/grep for the container_network_receive_bytes_total or container_network_receive_packets_total metrics associated with your namespace.

You should see the mentioned metrics missing from the response. If you downgrade your gVisor release to 20210510.0 or below and follow the same steps, you should see the mentioned cAdvisor metrics come through again.
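
If you prefer to check programmatically, here is a minimal Go sketch equivalent to the curl command in step 4. KUBELET_IP and TOKEN are read from the environment and stand in for the same placeholders as above; skipping TLS verification mirrors curl -k:

package main

import (
    "crypto/tls"
    "fmt"
    "io"
    "net/http"
    "os"
    "strings"
)

func main() {
    // KUBELET_IP and TOKEN are the same placeholders as in the curl example above.
    url := "https://" + os.Getenv("KUBELET_IP") + "/metrics/cadvisor"
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Authorization", "Bearer "+os.Getenv("TOKEN"))

    // Equivalent of curl -k: the kubelet typically serves a self-signed certificate.
    client := &http.Client{Transport: &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    }}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    // Print only the network metrics that go missing on the affected releases.
    for _, line := range strings.Split(string(body), "\n") {
        if strings.HasPrefix(line, "container_network_receive_bytes_total") ||
            strings.HasPrefix(line, "container_network_receive_packets_total") {
            fmt.Println(line)
        }
    }
}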

runsc version

> runsc --version
runsc version release-20210518.0
spec: 1.0.2

docker version (if using docker)

No response

uname

Linux NODE 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux

kubectl (if using Kubernetes)

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:52:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

repo state (if built from source)

No response

runsc debug logs (if available)

No response

Thanks for the detailed report and diagnosis!

cAdvisor uses libcontainer to get network stats. libcontainer queries sysfs for the virtual device stats directly, so I don't see how the change in cgroup assignment could affect that.

Here is where libcontainer queries sysfs. Just to double check, these values do get updated for gVisor:

$ cat /sys/class/net/vethf9a693cd/statistics/tx_bytes 
7174013552
$ cat /sys/class/net/vethf9a693cd/statistics/rx_bytes 
16450025701
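
For reference, this is roughly what reading those counters from sysfs looks like in Go; a minimal sketch, assuming the veth name from the output above (this is not libcontainer's actual code):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strconv"
    "strings"
)

// readNetStat reads a single counter such as rx_bytes or tx_bytes for an interface
// from /sys/class/net/<iface>/statistics/<stat>.
func readNetStat(iface, stat string) (uint64, error) {
    data, err := os.ReadFile(filepath.Join("/sys/class/net", iface, "statistics", stat))
    if err != nil {
        return 0, err
    }
    return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
    // The interface name is the veth from the example above; substitute your own.
    for _, stat := range []string{"rx_bytes", "rx_packets", "tx_bytes", "tx_packets"} {
        v, err := readNetStat("vethf9a693cd", stat)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            continue
        }
        fmt.Printf("%s: %d\n", stat, v)
    }
}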

I'll take a closer look into how cAdvisor/libcontainer finds the interface name and container information to see if it's getting somehow confused with the change in cgroup assignment.

cAdvisor uses an inotify watcher under /sys/fs/cgroup to detect when containers are created and deleted. Because runsc creates cgroups for the sandbox only (containers are inside the sandbox and not visible to the host), cAdvisor doesn't discover these containers and doesn't report on them. By simply creating these directories, cAdvisor starts reporting correct metrics for them.
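
A minimal sketch of that idea (the helper name and the controller list are illustrative, not the actual runsc change): create an empty per-container cgroup directory under the pod/sandbox cgroup so the inotify watcher sees the container appear.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// createContainerCgroups creates <controller>/<podCgroup>/<containerID> under
// /sys/fs/cgroup for each listed controller. The directories can stay empty;
// their creation is what cAdvisor's inotify watcher reacts to.
func createContainerCgroups(podCgroup, containerID string, controllers []string) error {
    for _, ctrl := range controllers {
        dir := filepath.Join("/sys/fs/cgroup", ctrl, podCgroup, containerID)
        if err := os.MkdirAll(dir, 0o755); err != nil {
            return fmt.Errorf("creating %s: %w", dir, err)
        }
    }
    return nil
}

func main() {
    // Values taken from the "not working" example above; must run as root on the node.
    err := createContainerCgroups(
        "kubepods/burstable/pod2baf1dc0-3715-4153-a8c2-b46225e219cc",
        "71e675350a4c07b79cae482e650c1beeb1e23488112d3b751838a9dc8e17399a",
        []string{"cpu", "memory", "blkio", "pids"},
    )
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}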

Here is an example with a fix that I will submit shortly:

> crictl pods | grep play
09f32de61c545       46 minutes ago      Ready               play                                          default             0                   gvisor

> kubectl get --raw "/api/v1/nodes/gke-test121-gvisor-aac9211d-fy44/proxy/metrics/cadvisor" | grep 09f32de61c545 | head
container_cpu_load_average_10s{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 0 1632788410403
container_cpu_system_seconds_total{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 0 1632788410403
container_cpu_user_seconds_total{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 0 1632788410403
container_file_descriptors{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 0 1632788410403
container_last_seen{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 1.632788426e+09 1632788426008
container_memory_cache{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 0 1632788410403
container_memory_failcnt{container="",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play"} 0 1632788410403
container_memory_failures_total{container="",failure_type="pgfault",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play",scope="container"} 0 1632788410403
container_memory_failures_total{container="",failure_type="pgfault",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play",scope="hierarchy"} 0 1632788410403
container_memory_failures_total{container="",failure_type="pgmajfault",id="/kubepods/burstable/pod0bf0aa65-592e-44f2-a118-6c85d48951e3/09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",image="k8s.gcr.io/pause:3.2",name="09f32de61c54566c0ddba0e84883e160ee3c41bc33da63ddd2cba78e32d785d7",namespace="default",pod="play",scope="container"} 0 1632788410403


Great find and thanks for taking the time to explain! 👏