rfjakob / earlyoom

earlyoom - Early OOM Daemon for Linux

fatal: could not find entry 'SwapFree:' in /proc/meminfo: Numerical result out of range

oxalica opened this issue:

earlyoom exited abnormally with: fatal: could not find entry 'SwapFree:' in /proc/meminfo: Numerical result out of range.
I don't know what /proc/meminfo contained at that moment.
When I noticed the failure, I checked the field and it looked fine, e.g. SwapFree: 10296040 kB.
After a restart, it has been running normally.

Maybe related: #189
cc: @nh2

Systemd log:

Jul 21 18:17:26 invar systemd[1]: Started Early OOM Daemon for Linux.
Jul 21 18:17:26 invar earlyoom[1012]: earlyoom 1.6.1
Jul 21 18:17:26 invar earlyoom[1012]: Notifying through D-Bus
Jul 21 18:17:26 invar earlyoom[1012]: mem total: 13923 MiB, swap total: 16383 MiB
Jul 21 18:17:26 invar earlyoom[1012]: sending SIGTERM when mem <=  5.00% and swap <= 10.00%,
Jul 21 18:17:26 invar earlyoom[1012]:         SIGKILL when mem <=  2.50% and swap <=  5.00%
Jul 27 19:13:10 invar earlyoom[1012]: mem avail:   380 of 13923 MiB ( 2.73%), swap free:  758 of 16383 MiB ( 4.63%)
Jul 27 19:13:10 invar earlyoom[1012]: low memory! at or below SIGTERM limits: mem  5.00%, swap 10.00%
Jul 27 19:13:10 invar earlyoom[1012]: sending SIGTERM to process 18581 uid 1000 "steamwebhelper": badness 302, VmRSS 10 MiB
Jul 27 19:13:10 invar earlyoom[1012]: process exited after 6.9 seconds
Jul 29 11:47:30 invar earlyoom[1012]: mem avail:   546 of 13923 MiB ( 3.93%), swap free:    0 of 16383 MiB ( 0.00%)
Jul 29 11:47:30 invar earlyoom[1012]: low memory! at or below SIGTERM limits: mem  5.00%, swap 10.00%
Jul 29 11:47:30 invar earlyoom[1012]: sending SIGTERM to process 7747 uid 0 ".qemu-system-x8": badness 88, VmRSS 0 MiB
Jul 29 11:47:30 invar earlyoom[1012]: get_entry: strtol() failed: Numerical result out of range
Jul 29 11:47:30 invar earlyoom[1012]: fatal: could not find entry 'SwapFree:' in /proc/meminfo: Numerical result out of range
Jul 29 11:47:31 invar systemd[1]: earlyoom.service: Main process exited, code=exited, status=104/n/a
Jul 29 11:47:31 invar systemd[1]: earlyoom.service: Failed with result 'exit-code'.
Jul 29 11:47:31 invar systemd[1]: earlyoom.service: Consumed 44.501s CPU time, no IP traffic.

Current /proc/meminfo, in case it helps (this is not the content at the time of the failure):

MemTotal:       14257416 kB
MemFree:         2054688 kB
MemAvailable:    3632196 kB
Buffers:          121448 kB
Cached:          1955364 kB
SwapCached:       208452 kB
Active:          6797764 kB
Inactive:        1958864 kB
Active(anon):    5906016 kB
Inactive(anon):  1252120 kB
Active(file):     891748 kB
Inactive(file):   706744 kB
Unevictable:        1524 kB
Mlocked:            1524 kB
SwapTotal:      16777212 kB
SwapFree:       10296040 kB
Dirty:             10264 kB
Writeback:             0 kB
AnonPages:       6657460 kB
Mapped:           786724 kB
Shmem:            480836 kB
KReclaimable:     306924 kB
Slab:            2974024 kB
SReclaimable:     306924 kB
SUnreclaim:      2667100 kB
KernelStack:       28544 kB
PageTables:        89092 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    23905920 kB
Committed_AS:   22172040 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       71516 kB
VmallocChunk:          0 kB
Percpu:            31744 kB
AnonHugePages:      2048 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    12326704 kB
DirectMap2M:     2277376 kB
DirectMap1G:     1048576 kB

I now print out the whole meminfo buffer when something like this happens ( 5db93d9 ).

Can you compile earlyoom from GitHub and see if you can get this to happen again? That would be most helpful for figuring out what is happening here.
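For readers following along, the "dump the buffer on failure" idea can be sketched roughly as below. This is a minimal illustration, not the code from 5db93d9: the function names `find_entry` and `find_entry_or_dump` and the exact message text are assumptions made for this example. It also shows why an overflowing number surfaces as "Numerical result out of range" (ERANGE) even though the 'SwapFree:' entry is present.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up `name` (e.g. "SwapFree:") in a copy of
 * /proc/meminfo held in `buf`.
 * Returns the parsed value, or -errno if the entry is missing or the
 * number does not fit into a long long. */
static long long find_entry(const char *name, const char *buf)
{
    const char *hit = strstr(buf, name);
    if (hit == NULL)
        return -ENODATA;

    errno = 0;
    long long val = strtoll(hit + strlen(name), NULL, 10);
    if (errno != 0)
        return -errno; /* e.g. -ERANGE for 18446744073709551608 */
    return val;
}

/* Hypothetical wrapper: on failure, write the whole buffer to `out`
 * so the offending line can be inspected later. */
static long long find_entry_or_dump(const char *name, const char *buf, FILE *out)
{
    long long val = find_entry(name, buf);
    if (val < 0) {
        fprintf(out, "could not parse '%s' (%s), dumping buffer for later diagnosis:\n%s\n",
                name, strerror((int)-val), buf);
    }
    return val;
}
```

Calling `find_entry_or_dump("SwapFree:", buf, stderr)` on a buffer read from /proc/meminfo would print the complete buffer whenever the lookup or the number conversion fails.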

@oxalica What's new? Is the problem solved?

@hakavlad I'm now running the master branch, which includes the buffer dumping from 5db93d9.
It hasn't happened again so far.

Maybe the issue should be closed.

Closing, since the error dumping is now in master.
If it happens again, I'll reopen this issue and provide the dump.

I had this happen too. SwapFree looked like this:

Feb 07 09:33:27 hostname earlyoom[3620563]: SwapFree:       18446744073709551612 kB

This occurred when earlyoom didn't kill processes fast enough and the kernel OOM killer was being invoked.

Huh. Did you get the line

fatal error, dumping buffer for later diagnosis

too?

Yes, here's the full output:

Feb 07 09:38:50 hostname earlyoom[2150637]: mem avail:  2854 of 192585 MiB ( 1.48%), swap free:    7 of 8191 MiB ( 0.09%)
Feb 07 09:38:50 hostname earlyoom[2150637]: low memory! at or below SIGKILL limits: mem  4.20%, swap 25.00%
Feb 07 09:38:50 hostname earlyoom[2150637]: get_entry: strtol() failed: Numerical result out of rangeget_entry_fatal: fatal error, dumping buffer for later diagnosis:
Feb 07 09:38:50 hostname earlyoom[2150637]: MemTotal:       197207468 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: MemFree:         3612764 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: MemAvailable:    2787396 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Buffers:            3756 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Cached:          1064988 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: SwapCached:      4046188 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Active:         11477804 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Inactive:       168328284 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Active(anon):   11034348 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Inactive(anon): 167988796 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Active(file):     443456 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Inactive(file):   339488 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Unevictable:     6293060 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Mlocked:         6293060 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: SwapTotal:       8388604 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: SwapFree:       18446744073709551608 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Dirty:              4128 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Writeback:          1004 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: AnonPages:      181027972 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Mapped:          1532288 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Shmem:            285732 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: KReclaimable:     183796 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Slab:             616068 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: SReclaimable:     183796 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: SUnreclaim:       432272 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: KernelStack:       29312 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: PageTables:       486564 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: NFS_Unstable:          0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Bounce:                0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: WritebackTmp:          0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: CommitLimit:    104895184 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Committed_AS:   484636580 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: VmallocTotal:   34359738367 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: VmallocUsed:     1704724 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: VmallocChunk:          0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Percpu:            34944 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: HardwareCorrupted:     0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: AnonHugePages:  94681088 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: ShmemHugePages:        0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: ShmemPmdMapped:        0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: FileHugePages:         0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: FilePmdMapped:         0 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: HugePages_Total:    2048
Feb 07 09:38:50 hostname earlyoom[2150637]: HugePages_Free:     2048
Feb 07 09:38:50 hostname earlyoom[2150637]: HugePages_Rsvd:        0
Feb 07 09:38:50 hostname earlyoom[2150637]: HugePages_Surp:        0
Feb 07 09:38:50 hostname earlyoom[2150637]: Hugepagesize:       2048 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: Hugetlb:         4194304 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: DirectMap4k:     1689208 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: DirectMap2M:    148977664 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: DirectMap1G:    52428800 kB
Feb 07 09:38:50 hostname earlyoom[2150637]: fatal: could not find entry 'SwapFree:' in /proc/meminfo: Numerical result out of range
Feb 07 09:38:50 hostname systemd[1]: earlyoom.service: Main process exited, code=exited, status=104/n/a
Feb 07 09:38:50 hostname systemd[1]: earlyoom.service: Failed with result 'exit-code'.
Feb 07 09:38:50 hostname systemd[1]: earlyoom.service: Service RestartSec=100ms expired, scheduling restart.
Feb 07 09:38:50 hostname systemd[1]: earlyoom.service: Scheduled restart job, restart counter is at 5.
Feb 07 09:38:50 hostname systemd[1]: Stopped Early OOM Daemon.
Feb 07 09:38:50 hostname systemd[1]: Started Early OOM Daemon.

Note: 18446744073709551612 is -4 when read as a signed 64-bit integer, and errno 4 is EINTR (Interrupted system call).
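The arithmetic behind these huge SwapFree values can be checked with a small sketch. The helper names `as_signed64` and `strtol_errno` are made up for this example. It shows that 18446744073709551612 is 2^64 - 4 (that is, -4 as a signed 64-bit value), that the value from the dump above, 18446744073709551608, is -8, and that `strtol()` rejects such a string with ERANGE. One possible reading, which is an assumption and not confirmed in this thread: /proc/meminfo reports kB, and with 4 KiB pages -4 kB and -8 kB would correspond to a page counter that underflowed to -1 and -2 pages.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse an unsigned decimal string and reinterpret the bits as a
 * signed 64-bit integer. The conversion of out-of-range values is
 * implementation-defined in older C standards, but wraps around
 * (two's complement) on mainstream compilers. */
static int64_t as_signed64(const char *s)
{
    uint64_t u = strtoull(s, NULL, 10);
    return (int64_t)u;
}

/* Return the errno that strtol() sets for the string (0 if it fits). */
static int strtol_errno(const char *s)
{
    errno = 0;
    (void)strtol(s, NULL, 10);
    return errno;
}
```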

Wow, looks like a kernel bug. What is your kernel version (uname -a)?

This is a Rocky Linux 8.8 machine with kernel 4.18.0-477.15.1.el8_8.x86_64.

@oxalica did you also run a RHEL8 kernel when this happened?

No. I was using NixOS and am still using it now. The issue has not reproduced for me in the last year: journalctl -u earlyoom.service --grep='fatal' returns no lines.

Sorry, but I don't know the exact kernel version I was running when I encountered this issue. My guess is that it was the latest LTS kernel at the time, which was about 5.4.

Worked around in earlyoom via edd3d94.
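As an illustration of what such a workaround could look like, here is a minimal sketch. It is not the actual change in edd3d94; the function name `sane_swap_free_kib` and the policy of treating an implausible value as "no free swap" are assumptions for this example. The idea is to parse the field as unsigned and discard any value larger than SwapTotal instead of exiting.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return SwapFree in KiB from a /proc/meminfo
 * buffer. A missing entry, or a value above swap_total_kib (such as an
 * underflowed 18446744073709551608), is reported as 0, i.e. the
 * conservative "no free swap" case. */
static unsigned long long sane_swap_free_kib(const char *buf,
                                             unsigned long long swap_total_kib)
{
    const char *name = "SwapFree:";
    const char *hit = strstr(buf, name);
    if (hit == NULL)
        return 0;

    /* strtoull() returns ULLONG_MAX on overflow, which the range
     * check below also rejects. */
    unsigned long long v = strtoull(hit + strlen(name), NULL, 10);
    if (v > swap_total_kib)
        return 0;
    return v;
}
```

Clamping to 0 errs on the side of treating the system as out of swap, which matches the situation in the logs above, where free swap was already close to zero when the bogus value appeared.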