netdata / netdata

The open-source observability platform everyone needs!

Home Page:https://www.netdata.cloud

[Bug]: Temperature alerts break in v1.45

kmlucy opened this issue

Bug description

Several of my temperature alerts break in v1.45. The temperature shown on the chart is normal, but the value the alert reports is an order of magnitude too high.

v1.44.3 works as expected; v1.45.3 has this issue.

Expected behavior

Alerts get correct values from charts

Steps to reproduce

  1. Create alerts as shown below
  2. Update to v1.45

Installation method

docker

System info

Linux orca 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
/etc/os-release:PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
/etc/os-release:NAME="Debian GNU/Linux"
/etc/os-release:VERSION_ID="12"
/etc/os-release:VERSION="12 (bookworm)"
/etc/os-release:VERSION_CODENAME=bookworm
/etc/os-release:ID=debian

Netdata build info

Packaging:
    Netdata Version ____________________________________________ : v1.45.3
    Installation Type __________________________________________ : oci
    Package Architecture _______________________________________ : x86_64
    Package Distro _____________________________________________ : unknown
    Configure Options __________________________________________ : dummy-configure-command
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /usr/share/netdata/web
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.1.0-20-amd64
    Operating System ___________________________________________ : Debian GNU/Linux
    Operating System ID ________________________________________ : debian
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : 12 (bookworm)
    Operating System Version ID ________________________________ : 12
    Detection __________________________________________________ : /host/etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 24
    CPU Frequency ______________________________________________ : 2267000000
    RAM Bytes __________________________________________________ : 16779579392
    Disk Capacity ______________________________________________ : 124130968092672
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : none
    Virtualization Detection ___________________________________ : none
Container:
    Container __________________________________________________ : docker
    Container Detection ________________________________________ : dockerenv
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : Debian GNU/Linux
    Container Operating System ID ______________________________ : debian
    Container Operating System ID Like _________________________ : unknown
    Container Operating System Version _________________________ : 12 (bookworm)
    Container Operating System Version ID ______________________ : 12
    Container Operating System Detection _______________________ : /etc/os-release
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine ___________________________________________________ : YES
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Brotli (generic-purpose lossless compression algorithm) ____ : NO
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libyaml (library for parsing and emitting YAML) ____________ : YES
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : NO
    ebpf (monitor system calls) ________________________________ : NO
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : NO
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : YES
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

Alert configs:

 template: core_temp
       on: sensors.temperature
    class: Errors
     type: System
component: CPU
       os: linux freebsd
    hosts: *
   lookup: average -10s unaligned foreach *Core*,*Package*
    units: degrees celcius
    every: 10s
     warn: $this > (($status >= $WARNING) ? (63) : (72))
     crit: $this > (($status == $CRITICAL) ? (72) : (81))
  summary: CPU temperature
     info: average CPU temperature over last 10 seconds
       to: sysadmin

 template: disk_temp
       on: hddtemp.temperatures
    class: Errors
     type: System
component: Disk
       os: linux freebsd
    hosts: *
   lookup: average -10s unaligned foreach *
    units: degrees celcius
    every: 10s
     warn: $this > (($status >= $WARNING) ? (40) : (42))
     crit: $this > (($status == $CRITICAL) ? (42) : (44))
  summary: HDD temperature
     info: average disk temperature over last 10 seconds
       to: sysadmin

Hi, @kmlucy. The foreach option was deprecated. It was mentioned in the v1.45.0 deprecation notice.

Added "wontfix" because there is no problem with the health engine.

The collectors should be updated to create a chart per instance instead of a dimension per instance.

Added go.d/hddtemp and go.d/sensors: both create a chart per instance. foreach is no longer needed; filtering/selection can be applied using chart labels.
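
For anyone landing here with the configs from the report, here is a rough sketch of how the core_temp template might look against a chart-per-instance collector. The context name sensors.sensor_temperature and the label key feature are assumptions and are not taken from this thread; check the chart contexts and labels your agent actually exposes before using it. The thresholds are copied unchanged from the original report.

```
# Sketch only: the context name and label key below are assumed, not confirmed.
 template: core_temp
       on: sensors.sensor_temperature
    class: Errors
     type: System
component: CPU
# Select instances by chart label instead of "foreach" (label key is hypothetical).
chart labels: feature=*Core* *Package*
   lookup: average -10s unaligned
    units: degrees Celsius
    every: 10s
     warn: $this > (($status >= $WARNING) ? (63) : (72))
     crit: $this > (($status == $CRITICAL) ? (72) : (81))
  summary: CPU temperature
     info: average CPU temperature over last 10 seconds
       to: sysadmin
```

Because each sensor gets its own chart, the template is instantiated once per matching chart and the lookup only sees that one sensor's value, so there is nothing left for foreach to split.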

@ilyam8 Thank you for updating those so quickly. Is it possible to do the same thing for smartd? It also creates a dimension per drive instead of a chart.

Yes, we need to rewrite smartd as well. I am thinking of dropping the smartd log parsing and instead just executing the binary and parsing its output.

Smartd was rewritten too: see go.d/smartctl.

Thank you for fixing that so quickly. Will these updates come in v1.46?