netdata / netdata

The open-source observability platform everyone needs!

Home page: https://www.netdata.cloud

[Bug]: 100% CPU Usage

Floyz opened this issue

Bug description

Netdata is using 100% of one CPU core, which is causing the CPU temperature to rise.
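A reading like "100% of one core" can be confirmed without a GUI. One way is to sample a process's accumulated CPU time from Linux `/proc/<pid>/stat` twice and divide by the elapsed wall-clock time. The sketch below is a hypothetical helper, not a Netdata tool. It assumes a Linux host and a known PID, for example from `pidof -s netdata`. Netdata's external plugins (for example `apps.plugin`) run as separate processes, so each PID may need to be checked on its own.

```python
import os
import sys
import time


def cpu_ticks(pid: int) -> int:
    """Return utime + stime for a process, in clock ticks, from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so only split the text that follows the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), so utime (field 14) and stime (field 15)
    # land at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int, seconds: float, clk_tck: int) -> float:
    """CPU used between two samples, as a percentage of ONE core.

    A multi-threaded process can exceed 100 (e.g. 200 means two busy cores).
    """
    return (ticks_after - ticks_before) / clk_tck / seconds * 100.0


def sample(pid: int, seconds: float = 5.0) -> float:
    """Sample a process twice, `seconds` apart, and return its CPU percentage."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # ticks per second, commonly 100 on Linux
    t0, before = time.monotonic(), cpu_ticks(pid)
    time.sleep(seconds)
    t1, after = time.monotonic(), cpu_ticks(pid)
    return cpu_percent(before, after, t1 - t0, clk_tck)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 cpu_sample.py "$(pidof -s netdata)"
    pid = int(sys.argv[1])
    print(f"PID {pid}: {sample(pid):.1f}% of one core")
```

A result close to 100 means roughly one fully busy core, which matches the symptom described here. Sampling over several seconds smooths out short bursts.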

Expected behavior

Netdata should not consume 100% of a CPU core.

Steps to reproduce

Start the Netdata agent.
(Screenshot 2024-04-08 133837)
(Screenshot 2024-04-08 131907)

Installation method

kickstart.sh

System info

5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux
/etc/os-release:PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
/etc/os-release:NAME="Debian GNU/Linux"
/etc/os-release:VERSION_ID="11"
/etc/os-release:VERSION="11 (bullseye)"
/etc/os-release:VERSION_CODENAME=bullseye
/etc/os-release:ID=debian

Netdata build info

Packaging:
    Netdata Version ____________________________________________ : v1.45.0-113-nightly
    Installation Type __________________________________________ : binpkg-deb
    Package Architecture _______________________________________ : x86_64
    Package Distro _____________________________________________ :
    Configure Options __________________________________________ : dummy-configure-command
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /var/lib/netdata/www
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 5.10.0-18-amd64
    Operating System ___________________________________________ : Debian GNU/Linux
    Operating System ID ________________________________________ : debian
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : 11 (bullseye)
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ :
    CPU Frequency ______________________________________________ : 2800000000
    RAM Bytes __________________________________________________ : 33279152128
    Disk Capacity ______________________________________________ : 3024608477184
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : none
    Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
    Container __________________________________________________ : none
    Container Detection ________________________________________ : systemd-detect-virt
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine (compression) _____________________________________ : YES (zstd lz4)
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Brotli (generic-purpose lossless compression algorithm) ____ : NO
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libyaml (library for parsing and emitting YAML) ____________ : YES
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : YES
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : YES
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : YES
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : YES
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

No response

Hi @Floyz, thanks for reporting. Can you please do the following:

  • Stop the Netdata agent (make sure it is stopped)
  • Run sudo /usr/sbin/netdata -W sqlite-analyze
  • Start the Netdata agent

Then check whether the issue reappears.
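The stop / `sqlite-analyze` / start sequence above can be scripted so the steps always run in order. This is a minimal sketch. It assumes the agent is managed by systemd under the unit name `netdata` (an assumption; adjust for other init systems) and that the script runs as root, as the `sudo` in the instructions implies. `maintenance` is a hypothetical helper. The `/usr/sbin/netdata -W sqlite-analyze` invocation is taken from this thread.

```python
import subprocess

NETDATA_BIN = "/usr/sbin/netdata"  # path used in this thread
SERVICE = "netdata"                # assumed systemd unit name


def maintenance(option: str = "sqlite-analyze", dry_run: bool = False) -> list[list[str]]:
    """Stop the agent, run `netdata -W <option>`, then start the agent again.

    Must run as root. With dry_run=True nothing is executed; the planned
    commands are only returned.
    """
    stop = ["systemctl", "stop", SERVICE]
    run_option = [NETDATA_BIN, "-W", option]
    start = ["systemctl", "start", SERVICE]
    if not dry_run:
        subprocess.run(stop, check=True)
        # `systemctl is-active --quiet` exits 0 only while the unit is active,
        # so a zero exit code here means the agent did not stop.
        if subprocess.run(["systemctl", "is-active", "--quiet", SERVICE]).returncode == 0:
            raise RuntimeError(f"{SERVICE} is still active; aborting")
        try:
            subprocess.run(run_option, check=True)
        finally:
            # Restart the agent even if the maintenance step failed.
            subprocess.run(start, check=True)
    return [stop, run_option, start]


if __name__ == "__main__":
    # Print the plan only; call maintenance(dry_run=False) as root to execute it.
    for cmd in maintenance("sqlite-analyze", dry_run=True):
        print(" ".join(cmd))
```

The helper refuses to run the maintenance step while the unit is still active, because the instructions require the agent to be fully stopped first. The same helper also covers the later `sqlite-alert-cleanup` suggestion: pass that option name instead.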

Hello!
I followed your steps and started the service, and the issue appeared again. Note that it seems to start about one minute after the service starts, not immediately.

Before opening this ticket, I also tried clearing the cache (rm -rf var/cache/netdata/*) and stopping/restarting the service, but it had no effect.

Hi @Floyz, when this occurs, would it be possible to stop the agent and send /var/cache/netdata/netdata-meta.db to stelios@netdata.cloud for investigation?
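Before sending the file, the reporter could check that the copy is readable and see which tables hold the most rows. The sketch below uses Python's standard `sqlite3` module and opens the file read-only through a SQLite URI, so the database is not modified. It makes no assumptions about Netdata's schema and simply lists whatever tables exist. The script name in the usage comment is hypothetical. Run it only while the agent is stopped, as requested above, or against a copy of the file.

```python
import sqlite3
import sys


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only via a URI so the file is never modified."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for name in names:
        # Identifiers cannot be bound as parameters, so quote them by hand.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def integrity_ok(conn: sqlite3.Connection) -> bool:
    """True if SQLite's built-in integrity check reports no problems."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 inspect_db.py /path/to/copy-of-netdata-meta.db
    conn = open_readonly(sys.argv[1])
    try:
        print("integrity:", "ok" if integrity_ok(conn) else "problems found")
        for name, n in sorted(table_row_counts(conn).items(), key=lambda kv: kv[1], reverse=True):
            print(f"{n:>12}  {name}")
    finally:
        conn.close()
```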

Hello! I just shared the file via WeTransfer to the email address you provided :)
Regards

We will start the investigation, thank you!

Hi @Floyz, can you update to the latest nightly (if you haven't already) and then do the following:

  • Stop the agent
  • Run sudo /usr/sbin/netdata -W sqlite-alert-cleanup
  • Restart the agent

Then check whether that resolves the issue.

Hi @Floyz, have you had a chance to test this?