prometheus / node_exporter

Exporter for machine metrics

Home Page: https://prometheus.io/

panic: runtime error: slice bounds out of range [11:10]

jicki opened this issue · comments

Host operating system: output of uname -a

uname -a
Linux 10-1-1-31.orin.pd.sz 5.10.104-tegra #1 SMP PREEMPT Tue Jan 24 15:09:44 PST 2023 aarch64 aarch64 aarch64 GNU/Linux

node_exporter version: output of node_exporter --version

Starting node_exporter" version="(version=1.6.1, branch=HEAD, revision=4a1b77600c1873a8233f3ffb55afcedbb63b8d84)

node_exporter command line flags

node_exporter log output

ts=2023-10-16T02:56:12.623Z caller=node_exporter.go:180 level=info msg="Starting node_exporter" version="(version=1.6.1, branch=HEAD, revision=4a1b77600c1873a8233f3ffb55afcedbb63b8d84)"
ts=2023-10-16T02:56:12.624Z caller=node_exporter.go:181 level=info msg="Build context" build_context="(go=go1.20.6, platform=linux/arm64, user=root@586879db11e5, date=20230717-12:11:23, tags=netgo osusergo static_build)"
ts=2023-10-16T02:56:12.624Z caller=node_exporter.go:183 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unprivileged user, root is not required."
ts=2023-10-16T02:56:12.626Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2023-10-16T02:56:12.627Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2023-10-16T02:56:12.627Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:110 level=info msg="Enabled collectors"
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=arp
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=bcache
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=bonding
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=btrfs
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=cgroups
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=conntrack
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=cpu
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=cpufreq
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=diskstats
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=dmi
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=drbd
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=edac
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=entropy
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=ethtool
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=fibrechannel
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=filefd
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=filesystem
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=hwmon
ts=2023-10-16T02:56:12.627Z caller=node_exporter.go:117 level=info collector=infiniband
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=ipvs
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=loadavg
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=mdadm
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=meminfo
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=mountstats
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=netclass
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=netdev
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=netstat
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=nfs
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=nfsd
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=nvme
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=os
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=powersupplyclass
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=pressure
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=processes
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=rapl
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=schedstat
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=selinux
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=sockstat
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=softnet
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=stat
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=sysctl
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=tapestats
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=textfile
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=thermal_zone
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=time
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=timex
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=udp_queues
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=uname
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=vmstat
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=xfs
ts=2023-10-16T02:56:12.628Z caller=node_exporter.go:117 level=info collector=zfs
ts=2023-10-16T02:56:12.629Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9100
ts=2023-10-16T02:56:12.629Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9100
panic: runtime error: slice bounds out of range [11:10]

goroutine 75 [running]:
github.com/prometheus/procfs/sysfs.filterOfflineCPUs(0x40001ed400?, 0x4000109bf0)
        /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:181 +0x214
github.com/prometheus/procfs/sysfs.FS.SystemCpufreq({{0xffffee403363?, 0x9?}})
        /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:209 +0x1c8
github.com/prometheus/node_exporter/collector.(*cpuFreqCollector).Update(0x0?, 0x0?)
        /app/collector/cpufreq_linux.go:51 +0x38
github.com/prometheus/node_exporter/collector.execute({0x72858b, 0x7}, {0x83f4a8, 0x4000101de0}, 0x0?, {0x83efc8, 0x400003f9c0})
        /app/collector/collector.go:161 +0x60
github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1({0x72858b?, 0x0?}, {0x83f4a8?, 0x4000101de0?})
        /app/collector/collector.go:152 +0x3c
created by github.com/prometheus/node_exporter/collector.NodeCollector.Collect
        /app/collector/collector.go:151 +0x98


Are you running node_exporter in Docker?

Running node_exporter as a Kubernetes DaemonSet.

Could you please look at the /sys/devices/system/cpu directory and the /sys/devices/system/cpu/offline file on the node where node_exporter panics?
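For reference, a small standalone Go program like the sketch below can dump the same sysfs inputs the collector reads on that node: the cpu<N> directories and the offline range list (e.g. "1-3,5"). This is only an inspection sketch, not the procfs code itself.

package main

import (
        "fmt"
        "os"
        "path/filepath"
        "strings"
)

func main() {
        // The cpufreq collector walks the /sys/devices/system/cpu/cpu<N> entries.
        cpus, err := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*")
        if err != nil {
                panic(err)
        }
        fmt.Printf("cpu directories: %d\n", len(cpus))

        // The "offline" file holds a kernel CPU range list such as "1-3,5",
        // or is empty when every CPU is online.
        data, err := os.ReadFile("/sys/devices/system/cpu/offline")
        if err != nil {
                panic(err)
        }
        fmt.Printf("offline CPUs: %q\n", strings.TrimSpace(string(data)))
}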

This is a bug in the github.com/prometheus/procfs package in v0.10.0:
https://github.com/prometheus/procfs/blob/dd377c72009a3d077169f6f48c4027713ceeff5e/sysfs/system_cpu.go#L173-L186

func filterOfflineCPUs(offlineCpus *[]uint16, cpus *[]string) error {
        for i, cpu := range *cpus {
                cpuName := strings.TrimPrefix(filepath.Base(cpu), "cpu")
                cpuNameUint16, err := strconv.Atoi(cpuName)
                if err != nil {
                        return err
                }
                if binSearch(uint16(cpuNameUint16), offlineCpus) {
                        *cpus = append((*cpus)[:i], (*cpus)[i+1:]...)
                }
        }

        return nil
}

If there are many offline CPUs, *cpus = append((*cpus)[:i], (*cpus)[i+1:]...) shortens the slice on every removal while the loop keeps indexing over the original range, so a later index can exceed the current length and the slice expression goes out of bounds. This bug has been fixed in later versions of github.com/prometheus/procfs, so upgrading node_exporter picks up the fix.
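For illustration, a single-pass filter avoids the problem entirely: collect the online entries into a new (or reused) slice instead of deleting from *cpus while ranging over it. This is a hedged sketch of the idea, not necessarily the exact upstream patch; the map-based offline lookup stands in for the original binSearch helper.

package main

import (
        "fmt"
        "path/filepath"
        "strconv"
        "strings"
)

// filterOfflineCPUs keeps only the CPUs that are not in the offline set.
// Nothing is removed from the slice being iterated, so no index can outrun
// the data the way it does in procfs v0.10.0.
func filterOfflineCPUs(offline map[uint16]bool, cpus []string) ([]string, error) {
        kept := cpus[:0] // reuse the backing array
        for _, cpu := range cpus {
                name := strings.TrimPrefix(filepath.Base(cpu), "cpu")
                n, err := strconv.Atoi(name)
                if err != nil {
                        return nil, err
                }
                if !offline[uint16(n)] {
                        kept = append(kept, cpu)
                }
        }
        return kept, nil
}

func main() {
        cpus := []string{"cpu0", "cpu1", "cpu2", "cpu3"}
        online, err := filterOfflineCPUs(map[uint16]bool{1: true, 3: true}, cpus)
        if err != nil {
                panic(err)
        }
        fmt.Println(online) // [cpu0 cpu2]
}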

Will there be an update to the prom/node-exporter Docker image to resolve this?

When is a new release expected with a rebuild that fixes this issue?

I am running into the same issue with release v1.6.1 on a Jetson Orin device (using Prometheus-kube-stack). When will the updated version be available?

goroutine 73 [running]:
github.com/prometheus/procfs/sysfs.filterOfflineCPUs(0x40002c2e00?, 0x4000105bf0)
        /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:181 +0x214
github.com/prometheus/procfs/sysfs.FS.SystemCpufreq({{0xfffff3572dd5?, 0x9?}})
        /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:209 +0x1c8
github.com/prometheus/node_exporter/collector.(*cpuFreqCollector).Update(0x0?, 0x0?)
        /app/collector/cpufreq_linux.go:51 +0x38
github.com/prometheus/node_exporter/collector.execute({0x72858b, 0x7}, {0x83f4a8, 0x40000433a0}, 0x0?, {0x83efc8, 0x40000b7200})
        /app/collector/collector.go:161 +0x60
github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1({0x72858b?, 0x0?}, {0x83f4a8?, 0x40000433a0?})
        /app/collector/collector.go:152 +0x3c
created by github.com/prometheus/node_exporter/collector.NodeCollector.Collect
        /app/collector/collector.go:151 +0x98

As a workaround, we can use the master tag image for now.

@uqix, thank you for the suggestion. Because I am using the prometheus-kube-stack Helm chart, using a master tag version triggered a validation error (I am not an expert in Helm and don't know how to get around that), so I am using v1.5.0 for the moment.