google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Disk usage metrics for containerd

ribbybibby opened this issue

When switching from docker to containerd as my container runtime in Kubernetes, I noticed that container_fs_usage_bytes metrics were no longer being exported for my containers.

It looks like disk usage metrics aren't implemented for containerd, as noted by this comment: https://github.com/google/cadvisor/blob/v0.38.6/container/containerd/handler.go#L164-L165.

Disk usage is a pretty important metric to monitor, so I think, if possible, this should be added.

I have experienced the same issue.

It looks like disk usage metrics aren't implemented for containerd, as noted by this comment: https://github.com/google/cadvisor/blob/v0.38.6/container/containerd/handler.go#L164-L165.

There is some discussion of these metrics in containerd/containerd#678. I assume containerd provides this information.

A PR was submitted: #2872.

PR #2872 was closed, in favor of #2956, which was merged and subsequently reverted in #2964. The result is that these metrics are not available.

Is someone working on an alternative approach?

Is there any timeline for a fix of this issue?

+1 on looking for any update or timeline regarding this issue - these metrics are pretty important for observability and workload behavior.

Adding onto this ticket since we're blocked on switching to the containerd CRI without these metrics. We have alerting around ephemeral file system usage that would break if cAdvisor doesn't collect these from containerd.

@bobbypage Do you have an update on this? Best I can follow is that there is a possibly-working version in the containerd-cri branch after #2966 was merged. However, it might be incomplete based on #2936 (comment)?

Alternatively it seems like work has gone into not using cadvisor for container stats and k8s 1.23 has an alpha feature-gate which uses the cri stats provider (PodAndContainerStatsFromCRI). Is the plan to put momentum into that instead? If so, do you know when it would go beta?

Enabling the PodAndContainerStatsFromCRI feature-gate does not seem to work either; at least with containerd 1.6.4 the stats are still missing.

It appears that this won't be addressed any time soon, as KEP-2371 moves most of the stats collection out of cadvisor into the CRI interface. Is there an interim solution for users that need these stats?

The workaround for now is to use the containerd-cri branch (https://github.com/google/cadvisor/tree/containerd-cri) which has a special patch to export containerd disk metrics. The following image can be used: gcr.io/cadvisor/cadvisor:v0.45.0-containerd-cri which is built from that branch and contains the patch.

Is that branch being actively maintained? Do you know if it still works normally with other runtimes?

Is that branch being actively maintained?

Yes, it is maintained; we just pushed the latest v0.45.0 changes to this branch. The separate branch is needed because getting disk usage metrics on containerd requires importing the CRI API into cAdvisor. However, we can't import the CRI API into cAdvisor, because cAdvisor is imported by k8s and k8s itself includes the CRI API, which results in a circular dependency. So the workaround for now is to have this separate branch, which includes the CRI API (see #2872 (comment) for that discussion).
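For anyone trying to picture the circular dependency described above, here is a minimal sketch. It is not how Go or the k8s build detects the problem; it only models the two projects as nodes in a graph. Treating the CRI API as part of the "k8s" node, and the names `find_cycle`, `main_branch` and `cri_branch`, are assumptions made for illustration.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in graph.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we found a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Simplified model (assumption): the CRI API is treated as part of "k8s".
main_branch = {"k8s": ["cadvisor"], "cadvisor": []}
# With the patch, cAdvisor would import the CRI API, i.e. depend on "k8s".
cri_branch = {"k8s": ["cadvisor"], "cadvisor": ["k8s"]}

print(find_cycle(main_branch))
print(find_cycle(cri_branch))
```

In this model the first graph has no cycle and the second one does, which is why the patch can live on a standalone branch but not in the copy of cAdvisor that k8s imports.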

Do you know if it still works normally with other runtimes?

Yes, it will work with other runtimes as well, but if containerd is not used there is no benefit of using it.

Ah hmm. So I take it the circular dependency prohibits this branch from being embedded in the kubelet, and there's no easy path towards doing so? Running a standalone deployment of cadvisor isn't particularly palatable, as asking our users to retool their monitoring stacks to make use of that would be a non-trivial amount of work. I'm honestly surprised that we got this far into the dockershim deprecation with cadvisor missing feature parity for one of the most popular replacement runtimes.

@brandond are you referring to the fact that most folks are using the existing /cadvisor/metrics endpoint on the kubelet? If that's the case, then yes, unfortunately we aren't able to bring this patch back into the kubelet due to the circular dependency issue. The KEP https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md aims to solve this issue long term.

Would it be possible to list a config somewhere that only gathers the disk metrics? GKE controls our normal cadvisor, so running a minimal "container disk metrics only" daemonset seems like a simple workaround.

kubectl get --raw "/api/v1/nodes/(node-name)/proxy/stats/summary" (from the kubelet) gives ephemeral-storage information for each pod, but sadly it's not available as a Prometheus metric...
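As a rough illustration of bridging that gap, the sketch below turns the JSON returned by the stats summary endpoint into Prometheus text-format lines. It assumes the summary contains a `pods` list whose entries carry `podRef` and `ephemeral-storage.usedBytes`; the metric name `pod_ephemeral_storage_used_bytes` and the function name are hypothetical, not something the kubelet or cAdvisor exports.

```python
import json


def summary_to_prometheus(summary_json):
    """Render per-pod ephemeral-storage usage as Prometheus text-format lines."""
    summary = json.loads(summary_json)
    # Hypothetical metric name, chosen only for this sketch.
    lines = ["# TYPE pod_ephemeral_storage_used_bytes gauge"]
    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        used = pod.get("ephemeral-storage", {}).get("usedBytes")
        if used is None:
            continue  # no usage value reported for this pod
        # Pod and namespace names are DNS labels, so no label escaping is done.
        lines.append(
            'pod_ephemeral_storage_used_bytes{namespace="%s",pod="%s"} %d'
            % (ref.get("namespace", ""), ref.get("name", ""), used)
        )
    return "\n".join(lines) + "\n"


# Hand-written sample in the assumed shape of the summary response.
sample = json.dumps({
    "pods": [
        {"podRef": {"name": "web-0", "namespace": "default"},
         "ephemeral-storage": {"usedBytes": 4096}},
        {"podRef": {"name": "db-0", "namespace": "default"}},
    ]
})

print(summary_to_prometheus(sample))
```

The input would be the output of the kubectl command above, saved to a file or piped in; serving the result over HTTP for Prometheus to scrape is left out to keep the sketch short.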

Refreshing my memory on this issue, I realised we didn't link to the exporter @ribbybibby wrote to address this: https://github.com/utilitywarehouse/kube-summary-exporter. We have been running it for nearly 2 years now.

Hi @bobbypage/@team, is this comment still valid? Should we expect a similar release tag for 47.1 as well, and for the rest of the releases until this is fixed via the enhancement/KEP? Can you please confirm whether you would recommend using these containerd-cri tags for the fix? It seems like it is the only workaround available.
I could also see the implementation; please correct me if I'm wrong.

Does containerd-cri work? I did a replace in my go.mod and it still did not work. I see the spec sets HasFilesystem to false: https://github.com/google/cadvisor/blob/containerd-cri/container/containerd/handler.go#L287-L289

Any updates on this? containerd is the default and recommended runtime for GKE, but there is still no support for kubernetes_filesystem_usage?

It appears that, at least on containerd://1.6.6 and the v0.45.0-containerd-cri tag, the `container_fs_*` metrics are also just wrong.

container_fs_usage_bytes at least seems to be reporting the root device free space for every pod on the node, as opposed to each container's/pod's usage. Does anyone have a reference deployment manifest to use for containerd + that containerd-cri tag?
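One way to check for the symptom described above is to look at a scrape of the metrics endpoint and see whether every pod reports the same container_fs_usage_bytes value. The sketch below does that on Prometheus text-format input. The helper names are hypothetical, the label parsing is deliberately simplified (it assumes label values contain no `}` or escaped quotes), and it looks for a `pod` label first and `container_label_io_kubernetes_pod_name` second, which is an assumption about how the scrape is labelled.

```python
import re
from collections import defaultdict

# Matches one container_fs_usage_bytes sample line; simplified on purpose.
SAMPLE_RE = re.compile(
    r'^container_fs_usage_bytes\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def fs_usage_by_pod(metrics_text):
    """Collect container_fs_usage_bytes values per pod from exposition text."""
    usage = defaultdict(list)
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # comment line or a different metric
        labels = dict(LABEL_RE.findall(match.group("labels")))
        pod = labels.get("pod") or labels.get(
            "container_label_io_kubernetes_pod_name", ""
        )
        if pod:
            usage[pod].append(float(match.group("value")))
    return dict(usage)


def looks_node_wide(usage):
    """True if several pods all report one identical value (the suspect case)."""
    values = {v for vals in usage.values() for v in vals}
    return len(usage) > 1 and len(values) == 1


# Hand-written sample showing the suspect pattern.
suspect = """\
# TYPE container_fs_usage_bytes gauge
container_fs_usage_bytes{container="app",device="/dev/sda1",pod="web-0"} 5.3e+10
container_fs_usage_bytes{container="app",device="/dev/sda1",pod="db-0"} 5.3e+10
"""

print(looks_node_wide(fs_usage_by_pod(suspect)))
```

Identical values across pods do not prove the metric is wrong, but on a node with varied workloads they are a strong hint that a device-level number is being reported instead of per-container usage.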

Any hope of this being implemented soon?