stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

No cpu frequency information in grafana

sjpb opened this issue

Ticket: https://stackhpc.atlassian.net/browse/DEV-1017

It looks like cpu and cpufreq are already listed in environments/common/inventory/group_vars/all/prometheus.yml, so the configuration itself does not seem to be the problem.

Seen on alaska (on arcus), reproduced on smslabs

Reproduced on arcus. This kernel module doesn't exist:

[root@dev-compute-0 rocky]# ls /lib/modules/$(uname -r)/kernel/arch/x86/kernel/cpu/cpufreq/

Tried installing cpupowerutils and searching for the module elsewhere:

yum install cpupowerutils
[root@dev-compute-0 rocky]# find /lib/modules -type f -iname "*freq*"
/lib/modules/4.18.0-348.el8.0.2.x86_64/kernel/drivers/cpufreq/acpi-cpufreq.ko.xz
/lib/modules/4.18.0-348.el8.0.2.x86_64/kernel/drivers/cpufreq/amd_freq_sensitivity.ko.xz
/lib/modules/4.18.0-348.23.1.el8_5.x86_64/kernel/drivers/cpufreq/acpi-cpufreq.ko.xz
/lib/modules/4.18.0-348.23.1.el8_5.x86_64/kernel/drivers/cpufreq/amd_freq_sensitivity.ko.xz
[root@dev-compute-0 rocky]# modprobe acpi-cpufreq
modprobe: ERROR: could not insert 'acpi_cpufreq': No such device
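
A "No such device" error from modprobe usually means the module exists but found nothing to bind to. A quick follow-up is to check whether the kernel exposes any cpufreq data under sysfs at all, since that is where per-CPU frequency files such as scaling_cur_freq live. The following is a minimal sketch, not part of the appliance; the function name check_cpufreq_sysfs and its optional root argument are hypothetical and only there to make the check easy to try against a test directory.

```shell
#!/bin/sh
# Sketch: report, per CPU, whether cpufreq/scaling_cur_freq is readable.
# The optional first argument overrides the sysfs root (hypothetical, for testing).
check_cpufreq_sysfs() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    found=0
    # cpu[0-9]* matches cpu0, cpu1, ... but not the cpufreq/ or cpuidle/ directories
    for dir in "$cpu_root"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        if [ -r "$dir/cpufreq/scaling_cur_freq" ]; then
            # scaling_cur_freq is reported by the kernel in kHz
            echo "$name: $(cat "$dir/cpufreq/scaling_cur_freq") kHz"
            found=1
        else
            echo "$name: no readable cpufreq/scaling_cur_freq"
        fi
    done
    # Succeed only if at least one CPU exposes a frequency
    [ "$found" -eq 1 ]
}
```

Running `check_cpufreq_sysfs` with no arguments on a compute node checks the real sysfs tree; a non-zero exit status means no CPU exposed a current frequency.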

The ohpc dashboard uses the node_cpu_scaling_frequency_hertz metric.

The checks below are based on https://superuser.com/questions/1624080/why-there-is-no-cpufreq-under-sys-devices-system-cpu-cpu0

[rocky@cpuinfo-compute-0 ~]$ curl http://localhost:9100/metrics | grep node_cpu_scaling_frequency_hertz
<nothing>
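
To make the exporter check above repeatable across nodes, the grep can be wrapped so that it counts actual sample lines and ignores the `# HELP` and `# TYPE` comment lines. This is only a sketch; the function name count_metric_samples is hypothetical, and it assumes the metrics text arrives on standard input in the usual Prometheus text format.

```shell
#!/bin/sh
# Sketch: count sample lines for one metric in Prometheus text-format input (stdin).
count_metric_samples() {
    metric="$1"
    # A sample line starts with the metric name followed by '{' (labels) or a space.
    # grep -c exits non-zero when the count is 0, so normalise the exit status.
    grep -c -E "^${metric}(\{| )" || true
}
```

For example, `curl -s http://localhost:9100/metrics | count_metric_samples node_cpu_scaling_frequency_hertz` should print 0 on a node in the state shown above, where the grep returned nothing.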

[rocky@cpuinfo-compute-0 ~]$ cat /boot/config-$(uname -r) | grep CONFIG_CPU_FREQ
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

[rocky@cpuinfo-compute-0 ~]$ cat /boot/config-$(uname -r) | grep CONFIG_X86_ACPI_CPUFREQ
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ_CPB=y

[rocky@cpuinfo-compute-0 ~]$ cat /boot/config-$(uname -r) | grep CONFIG_X86_INTEL_PSTATE
CONFIG_X86_INTEL_PSTATE=y
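
The three kernel config checks above can be condensed into one helper that reports whether an option is built in, a module, or not set. This is a sketch under the assumption that the config file uses the standard `OPTION=value` layout seen in the output above; the function name kconfig_state is hypothetical.

```shell
#!/bin/sh
# Sketch: report the state of one kernel config option from a config file.
kconfig_state() {
    config_file="$1"
    option="$2"
    # Match only "OPTION=..." exactly, so CONFIG_X86_ACPI_CPUFREQ does not
    # also pick up CONFIG_X86_ACPI_CPUFREQ_CPB.
    value=$(sed -n "s/^${option}=//p" "$config_file" | head -n 1)
    case "$value" in
        y)  echo "$option: built in" ;;
        m)  echo "$option: module" ;;
        "") echo "$option: not set" ;;
        *)  echo "$option: $value" ;;
    esac
}
```

With the config shown above, `kconfig_state "/boot/config-$(uname -r)" CONFIG_X86_ACPI_CPUFREQ` would report the option as a module, and CONFIG_X86_INTEL_PSTATE as built in.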