dotdc / grafana-dashboards-kubernetes

A set of modern Grafana dashboards for Kubernetes.

[bug] Failed to display node metrics

fcecagno opened this issue · comments

Describe the bug

This is the way variables are configured on k8s-views-nodes.json:

...
node = label_values(kube_node_info, node)
instance = label_values(node_uname_info{nodename=~"(?i:($node))"}, instance)

In OKE, kube_node_info looks like this:

{__name__="kube_node_info", container="kube-state-metrics", container_runtime_version="cri-o://1.25.1-111.el7", endpoint="http", instance="10.244.0.40:8080", internal_ip="10.0.107.39", job="kube-state-metrics", kernel_version="5.4.17-2136.314.6.2.el7uek.x86_64", kubelet_version="v1.25.4", kubeproxy_version="v1.25.4", namespace="monitoring", node="10.0.107.39", os_image="Oracle Linux Server 7.9", pod="monitoring-kube-state-metrics-6fcd4d745c-txg2k", pod_cidr="10.244.1.0/25", provider_id="ocid1.instance.oc1.sa-saopaulo-1.xxx", service="monitoring-kube-state-metrics", system_uuid="d6462364-95bf-4122-a3ab-xxx"}

And node_uname_info looks like this:

node_uname_info{container="node-exporter", domainname="(none)", endpoint="http-metrics", instance="10.0.107.39:9100", job="node-exporter", machine="x86_64", namespace="monitoring", nodename="oke-cq2bxmvtqca-nsdfwre7l3a-seqv6owhq3a-0", pod="monitoring-prometheus-node-exporter-n6pzv", release="5.4.17-2136.314.6.2.el7uek.x86_64", service="monitoring-prometheus-node-exporter", sysname="Linux", version="#2 SMP Fri Dec 9 17:35:27 PST 2022"}

In this example, node=10.0.107.39, but querying node_uname_info{nodename=~"(?i:($node))"} returns nothing, because nodename doesn't match the node's internal IP address.
As a result, no node metrics are displayed.
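The mismatch can be seen directly in Prometheus. These are illustrative queries based on the label sets above (the :9100 node-exporter port is taken from the instance label shown):

```
# $node resolves to the internal IP from kube_node_info:
node_uname_info{nodename=~"(?i:(10.0.107.39))"}   # no results: nodename is the OKE-generated hostname
node_uname_info{instance="10.0.107.39:9100"}      # matches: instance is <internal IP>:9100
```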

How to reproduce?

No response

Expected behavior

No response

Additional context

Modifying the filter https://github.com/dotdc/grafana-dashboards-kubernetes/blob/master/dashboards/k8s-views-nodes.json#L3747-L3772 to use node_uname_info{instance="$node:9100"} fixes the issue.
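A sketch of what the adjusted variable queries could look like (the :9100 port is an assumption taken from this cluster's node-exporter and may differ in other deployments):

```
node     = label_values(kube_node_info, node)
instance = label_values(node_uname_info{instance="$node:9100"}, instance)
```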

Hi @fcecagno, thank you for reporting this, will look at it this week!

Hi @dotdc , I'm pretty sure I need to play with relabeling on node-exporter to make it work properly. I'll post here when I find the solution.
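For reference, one possible relabeling approach is sketched below (untested; assumes the kube-prometheus-stack Helm chart, whose node-exporter ServiceMonitor accepts relabelings). It copies the Kubernetes node name from service-discovery metadata onto every node-exporter series, so a dashboard could match on that label instead of nodename. In OKE the Kubernetes node name equals the internal IP, so it would line up with kube_node_info's node label:

```yaml
# values.yaml sketch (assumption: recent kube-prometheus-stack chart layout)
prometheus-node-exporter:
  prometheus:
    monitor:
      relabelings:
        # Copy the Kubernetes node name from pod discovery metadata
        # onto every scraped node-exporter series as a "node" label.
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: node
          action: replace
```

With this in place, a variable query like label_values(node_uname_info{node="$node"}, instance) would resolve without depending on the nodename/IP convention.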

Hey,
I am seeing the same problem. I think modifying the filter is the best solution. In other Prometheus dashboard templates I've seen a filter like this being used:
label_values(node_uname_info{job="node-exporter", sysname!="Darwin"}, instance)

Modifying the labels may be a bit complicated for people using predefined monitoring solutions like kube-prometheus-stack

Regards

Modifying the labels may be a bit complicated for people using predefined monitoring solutions like kube-prometheus-stack

That's precisely my case, and I didn't succeed in modifying the labels, so I believe updating to a more generic filter might be a better solution.

🎉 This issue has been resolved in version 1.1.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀