[Bug]: Temperature alerts break in v1.45
kmlucy opened this issue
Bug description
Several of my temperature alerts break in v1.45. The temperature shown on the chart the alert reads from is normal, but the value the alert itself reports is an order of magnitude too high.
v1.44.3 works as expected; v1.45.3 has this issue.
Expected behavior
Alerts get correct values from charts
Steps to reproduce
- Create alerts as shown below
- Update to v1.45
Installation method
docker
System info
Linux orca 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
/etc/os-release:PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
/etc/os-release:NAME="Debian GNU/Linux"
/etc/os-release:VERSION_ID="12"
/etc/os-release:VERSION="12 (bookworm)"
/etc/os-release:VERSION_CODENAME=bookworm
/etc/os-release:ID=debian
Netdata build info
Packaging:
Netdata Version ____________________________________________ : v1.45.3
Installation Type __________________________________________ : oci
Package Architecture _______________________________________ : x86_64
Package Distro _____________________________________________ : unknown
Configure Options __________________________________________ : dummy-configure-command
Default Directories:
User Configurations ________________________________________ : /etc/netdata
Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
Permanent Databases ________________________________________ : /var/lib/netdata
Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
Static Web Files ___________________________________________ : /usr/share/netdata/web
Log Files __________________________________________________ : /var/log/netdata
Lock Files _________________________________________________ : /var/lib/netdata/lock
Home _______________________________________________________ : /var/lib/netdata
Operating System:
Kernel _____________________________________________________ : Linux
Kernel Version _____________________________________________ : 6.1.0-20-amd64
Operating System ___________________________________________ : Debian GNU/Linux
Operating System ID ________________________________________ : debian
Operating System ID Like ___________________________________ : unknown
Operating System Version ___________________________________ : 12 (bookworm)
Operating System Version ID ________________________________ : 12
Detection __________________________________________________ : /host/etc/os-release
Hardware:
CPU Cores __________________________________________________ : 24
CPU Frequency ______________________________________________ : 2267000000
RAM Bytes __________________________________________________ : 16779579392
Disk Capacity ______________________________________________ : 124130968092672
CPU Architecture ___________________________________________ : x86_64
Virtualization Technology __________________________________ : none
Virtualization Detection ___________________________________ : none
Container:
Container __________________________________________________ : docker
Container Detection ________________________________________ : dockerenv
Container Orchestrator _____________________________________ : none
Container Operating System _________________________________ : Debian GNU/Linux
Container Operating System ID ______________________________ : debian
Container Operating System ID Like _________________________ : unknown
Container Operating System Version _________________________ : 12 (bookworm)
Container Operating System Version ID ______________________ : 12
Container Operating System Detection _______________________ : /etc/os-release
Features:
Built For __________________________________________________ : Linux
Netdata Cloud ______________________________________________ : YES
Health (trigger alerts and send notifications) _____________ : YES
Streaming (stream metrics to parent Netdata servers) _______ : YES
Back-filling (of higher database tiers) ____________________ : YES
Replication (fill the gaps of parent Netdata servers) ______ : YES
Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
Contexts (index all active and archived metrics) ___________ : YES
Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
Machine Learning ___________________________________________ : YES
Database Engines:
dbengine ___________________________________________________ : YES
alloc ______________________________________________________ : YES
ram ________________________________________________________ : YES
none _______________________________________________________ : YES
Connectivity Capabilities:
ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
static (Netdata internal web server) _______________________ : YES
h2o (web server) ___________________________________________ : YES
WebRTC (experimental) ______________________________________ : NO
Native HTTPS (TLS Support) _________________________________ : YES
TLS Host Verification ______________________________________ : YES
Libraries:
LZ4 (extremely fast lossless compression algorithm) ________ : YES
ZSTD (fast, lossless compression algorithm) ________________ : YES
zlib (lossless data-compression library) ___________________ : YES
Brotli (generic-purpose lossless compression algorithm) ____ : NO
protobuf (platform-neutral data serialization protocol) ____ : YES (system)
OpenSSL (cryptography) _____________________________________ : YES
libdatachannel (stand-alone WebRTC data channels) __________ : NO
JSON-C (lightweight JSON manipulation) _____________________ : YES
libcap (Linux capabilities system operations) ______________ : NO
libcrypto (cryptographic functions) ________________________ : YES
libyaml (library for parsing and emitting YAML) ____________ : YES
Plugins:
apps (monitor processes) ___________________________________ : YES
cgroups (monitor containers and VMs) _______________________ : YES
cgroup-network (associate interfaces to CGROUPS) ___________ : YES
proc (monitor Linux systems) _______________________________ : YES
tc (monitor Linux network QoS) _____________________________ : YES
diskspace (monitor Linux mount points) _____________________ : YES
freebsd (monitor FreeBSD systems) __________________________ : NO
macos (monitor MacOS systems) ______________________________ : NO
statsd (collect custom application metrics) ________________ : YES
timex (check system clock synchronization) _________________ : YES
idlejitter (check system latency and jitter) _______________ : YES
bash (support shell data collection jobs - charts.d) _______ : YES
debugfs (kernel debugging metrics) _________________________ : YES
cups (monitor printers and print jobs) _____________________ : NO
ebpf (monitor system calls) ________________________________ : NO
freeipmi (monitor enterprise server H/W) ___________________ : YES
nfacct (gather netfilter accounting) _______________________ : NO
perf (collect kernel performance events) ___________________ : YES
slabinfo (monitor kernel object caching) ___________________ : YES
Xen ________________________________________________________ : NO
Xen VBD Error Tracking _____________________________________ : NO
Logs Management ____________________________________________ : YES
Exporters:
AWS Kinesis ________________________________________________ : NO
GCP PubSub _________________________________________________ : NO
MongoDB ____________________________________________________ : YES
Prometheus (OpenMetrics) Exporter __________________________ : YES
Prometheus Remote Write ____________________________________ : YES
Graphite ___________________________________________________ : YES
Graphite HTTP / HTTPS ______________________________________ : YES
JSON _______________________________________________________ : YES
JSON HTTP / HTTPS __________________________________________ : YES
OpenTSDB ___________________________________________________ : YES
OpenTSDB HTTP / HTTPS ______________________________________ : YES
All Metrics API ____________________________________________ : YES
Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
Trace All Netdata Allocations (with charts) ________________ : NO
Developer Mode (more runtime checks, slower) _______________ : NO
Additional info
Alert configs:
template: core_temp
on: sensors.temperature
class: Errors
type: System
component: CPU
os: linux freebsd
hosts: *
lookup: average -10s unaligned foreach *Core*,*Package*
units: degrees celcius
every: 10s
warn: $this > (($status >= $WARNING) ? (63) : (72))
crit: $this > (($status == $CRITICAL) ? (72) : (81))
summary: CPU temperature
info: average CPU temperature over last 10 seconds
to: sysadmin
template: disk_temp
on: hddtemp.temperatures
class: Errors
type: System
component: Disk
os: linux freebsd
hosts: *
lookup: average -10s unaligned foreach *
units: degrees celcius
every: 10s
warn: $this > (($status >= $WARNING) ? (40) : (42))
crit: $this > (($status == $CRITICAL) ? (42) : (44))
summary: HDD temperature
info: average disk temperature over last 10 seconds
to: sysadmin
Hi, @kmlucy. The foreach option was deprecated. It was mentioned in the v1.45.0 deprecation notice.
Added "wontfix" because there is no problem with the health engine.
Collectors should be updated to create a chart per instance instead of a dimension per instance.
Added go.d/hddtemp and go.d/sensors: both create a chart per instance. With them, foreach is not needed; filtering/selection can be applied using chart labels.
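As a rough illustration of label-based selection, the core_temp alert from the report might be rewritten along these lines for a collector that creates one chart per sensor. This is only a sketch: the context name sensors.sensor_temperature and the label key feature are assumptions and do not come from this thread, so check the actual context and labels of the new charts before using it.

```
# Sketch only. The context in "on:" and the label key in "chart labels:"
# are assumptions; replace them with the values shown for your charts.
 template: core_temp
       on: sensors.sensor_temperature
    class: Errors
     type: System
component: CPU
# Selects charts by label instead of "foreach *Core*,*Package*".
chart labels: feature=*Core* *Package*
   lookup: average -10s unaligned
    units: degrees Celsius
    every: 10s
     warn: $this > (($status >= $WARNING) ? (63) : (72))
     crit: $this > (($status == $CRITICAL) ? (72) : (81))
  summary: CPU temperature
     info: average CPU temperature over last 10 seconds
       to: sysadmin
```

Since a template is applied to every chart that matches its context, and each chart here would hold a single sensor, the alert should be evaluated once per selected sensor without foreach.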
@ilyam8 Thank you for updating those so quickly. Is it possible to do the same thing for smartd? It also creates a dimension per drive instead of a chart.
Yes, we need to rewrite smartd as well. I am thinking of dropping the reading of smartd logs and instead just executing the binary and parsing its output.
Smartd was rewritten too - go.d/smartctl.
Thank you for fixing that so quickly. These updates will come in v1.46?
Yes.