rfjakob / earlyoom

earlyoom - Early OOM Daemon for Linux

earlyoom will terminate itself

jcaesar opened this issue

mem avail:    21 of   950 MiB ( 2.27%), swap free:   81 of 1023 MiB ( 7.95%)
low memory! at or below SIGTERM limits: mem 10.00%, swap 15.00%
sending SIGTERM to process 397 uid 61876 "earlyoom": badness 0, VmRSS 1 MiB

Maybe this should be prevented by default? (Apologies if it already is; I'm running earlyoom 1.6-1 from Arch Linux.)

Hmm, how does this happen? Is earlyoom really the process using the most memory on this machine?

Probably. There's not much else running that isn't on the ignore list.

Config:

EARLYOOM_ARGS="-r 60 -s 15,5 -m 10,5 -p -n --avoid '(^|/)(init|systemd.*|Xorg|sshd)$'"

More log:

Jun 12 06:38:31: mem avail:   607 of   950 MiB (63.93%), swap free: 1009 of 1023 MiB (98.61%)
Jun 12 06:39:16: mem avail:    19 of   950 MiB ( 2.10%), swap free:  153 of 1023 MiB (14.96%)
Jun 12 06:39:16: low memory! at or below SIGTERM limits: mem 10.00%, swap 15.00%
Jun 12 06:39:16: sending SIGTERM to process 696 uid 1001 "ipfs": badness 757, VmRSS 637 MiB
Jun 12 06:39:26: kill failed: Timer expired
Jun 12 06:39:26: mem avail:    24 of   950 MiB ( 2.58%), swap free:   89 of 1023 MiB ( 8.76%)
Jun 12 06:39:26: low memory! at or below SIGTERM limits: mem 10.00%, swap 15.00%
Jun 12 06:39:26: sending SIGTERM to process 696 uid 1001 "ipfs": badness 799, VmRSS 656 MiB
Jun 12 06:39:26: process exited after 0.1 seconds
Jun 12 06:39:26: mem avail:    23 of   950 MiB ( 2.47%), swap free:   87 of 1023 MiB ( 8.59%)
Jun 12 06:39:26: low memory! at or below SIGTERM limits: mem 10.00%, swap 15.00%
Jun 12 06:39:26: sending SIGTERM to process 324 uid 0 "rngd": badness 1, VmRSS 1 MiB
Jun 12 06:39:27: process exited after 0.5 seconds
Jun 12 06:39:27: mem avail:    23 of   950 MiB ( 2.45%), swap free:   83 of 1023 MiB ( 8.13%)
Jun 12 06:39:27: low memory! at or below SIGTERM limits: mem 10.00%, swap 15.00%
Jun 12 06:39:27: sending SIGTERM to process 396 uid 81 "dbus-daemon": badness 0, VmRSS 2 MiB
Jun 12 06:39:27: process exited after 0.2 seconds
Jun 12 06:39:27: mem avail:    21 of   950 MiB ( 2.27%), swap free:   81 of 1023 MiB ( 7.95%)
Jun 12 06:39:27: low memory! at or below SIGTERM limits: mem 10.00%, swap 15.00%
Jun 12 06:39:27: sending SIGTERM to process 397 uid 61876 "earlyoom": badness 0, VmRSS 1 MiB

Could you provide the output with the -d option, please?

I'm looking at the log, and something went wrong earlier: killing ipfs should have freed 656 MiB, but the amount of available memory and swap did not change!

It seems that the ipfs process has become invisible to earlyoom: earlyoom moves on to killing other processes even though ipfs apparently has not terminated yet.

Maybe ipfs became a zombie that has not yet freed its memory, but earlyoom already considers it dead.

I guess we should wait until we see available memory rise again.
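A rough sketch of that idea (not earlyoom's actual code; the helper names below are made up for illustration): after sending SIGTERM to a victim, keep re-reading /proc/meminfo and only move on to the next candidate once MemAvailable has climbed back above the SIGTERM limit, or a timeout expires.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Return MemAvailable as a percentage of MemTotal, or -1 on error. */
static double mem_avail_percent(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    long long total = -1, avail = -1;
    char line[256];
    while (fgets(line, sizeof(line), f)) {
        sscanf(line, "MemTotal: %lld kB", &total);
        sscanf(line, "MemAvailable: %lld kB", &avail);
    }
    fclose(f);
    if (total <= 0 || avail < 0)
        return -1;
    return 100.0 * (double)avail / (double)total;
}

/* After a kill, wait up to timeout_ms for available memory to climb
 * back above the SIGTERM limit. Returns true if it recovered, false
 * if we should escalate or pick the next victim. */
static bool wait_for_recovery(double sigterm_limit_percent, int timeout_ms)
{
    const int poll_ms = 100;
    for (int waited = 0; waited < timeout_ms; waited += poll_ms) {
        if (mem_avail_percent() > sigterm_limit_percent)
            return true;
        usleep(poll_ms * 1000);
    }
    return false;
}

int main(void)
{
    printf("MemAvailable: %.2f%%\n", mem_avail_percent());
    /* Demo: wait up to 10 seconds for MemAvailable to rise above 10%. */
    printf("recovered: %s\n", wait_for_recovery(10.0, 10 * 1000) ? "yes" : "no");
    return 0;
}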

Could you provide the output with the -d option, please?

It's unlikely that I can reproduce the problem. I've since worked around it by adding more swap and running gc/compaction once; now the usage doesn't go quite as high any more.
And even before that, this wasn't a "happens every time" thing.

I tried hard to reproduce this behavior on my PC, in a VM, and also on a Raspberry Pi 4, but I always get the "process exited after x seconds" message AFTER mem avail has returned to normal. Example with -d:

sending SIGTERM to process 6428 uid 1001 "membomb": badness 870, VmRSS 1635 MiB
mem avail:    31 of  1848 MiB ( 1.71%), swap free:   49 of  499 MiB (10.00%)
process state: D
mem avail:    31 of  1848 MiB ( 1.71%), swap free:   49 of  499 MiB ( 9.85%)
process state: D
mem avail:    31 of  1848 MiB ( 1.72%), swap free:   47 of  499 MiB ( 9.50%)
process state: D
mem avail:    31 of  1848 MiB ( 1.69%), swap free:   45 of  499 MiB ( 9.20%)
process state: D
mem avail:    31 of  1848 MiB ( 1.71%), swap free:   44 of  499 MiB ( 8.95%)
process state: R
mem avail:    31 of  1848 MiB ( 1.71%), swap free:   43 of  499 MiB ( 8.80%)
process state: D
mem avail:    31 of  1848 MiB ( 1.71%), swap free:   42 of  499 MiB ( 8.50%)
process state: D
mem avail:    31 of  1848 MiB ( 1.69%), swap free:   40 of  499 MiB ( 8.20%)
process state: D
mem avail:    31 of  1848 MiB ( 1.70%), swap free:   39 of  499 MiB ( 8.00%)
process state: D
mem avail:    31 of  1848 MiB ( 1.73%), swap free:   28 of  499 MiB ( 5.65%)
process state: D
mem avail:    28 of  1848 MiB ( 1.57%), swap free:   24 of  499 MiB ( 4.85%)
escalating to SIGKILL after 1.0 seconds
process state: D
mem avail:    32 of  1848 MiB ( 1.78%), swap free:   21 of  499 MiB ( 4.35%)
process state: D
mem avail:    30 of  1848 MiB ( 1.63%), swap free:   12 of  499 MiB ( 2.45%)
process state: D
mem avail:    30 of  1848 MiB ( 1.63%), swap free:   12 of  499 MiB ( 2.45%)
process state: D
mem avail:    30 of  1848 MiB ( 1.63%), swap free:   12 of  499 MiB ( 2.45%)
process state: D
mem avail:    30 of  1848 MiB ( 1.63%), swap free:   12 of  499 MiB ( 2.45%)
process state: D
mem avail:    32 of  1848 MiB ( 1.77%), swap free:   12 of  499 MiB ( 2.45%)
process state: D
mem avail:    32 of  1848 MiB ( 1.77%), swap free:   12 of  499 MiB ( 2.45%)
process state: D
mem avail:    32 of  1848 MiB ( 1.77%), swap free:   18 of  499 MiB ( 3.75%)
process state: R
mem avail:   208 of  1848 MiB (11.28%), swap free:  403 of  499 MiB (80.75%)
process state: R
mem avail:   984 of  1848 MiB (53.29%), swap free:  403 of  499 MiB (80.75%)
process state: R
mem avail:  1643 of  1848 MiB (88.94%), swap free:  403 of  499 MiB (80.80%)
process exited after 2.1 seconds

Do you want to give me the source for membomb? I could try to reproduce this on something closer to the original environment (an EC2 t3.micro running ami-0fcf6df96bd1453cc with a swap file on a 50 GB gp2 EBS volume).

I reported this because I thought it would probably be trivial to add a check against getpid. It seems you already have one, but that didn't help in this situation. Hmm.
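For what it's worth, a minimal, self-contained sketch of what such a blanket "never pick ourselves" check could look like while scanning /proc for a victim (illustrative only, not earlyoom's actual scan code):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t self = getpid();
    DIR *procdir = opendir("/proc");
    if (procdir == NULL) {
        perror("opendir /proc");
        return 1;
    }
    struct dirent *d;
    while ((d = readdir(procdir)) != NULL) {
        pid_t pid = (pid_t)atoi(d->d_name);
        if (pid <= 0)
            continue; /* not a numeric (process) directory entry */
        if (pid == self)
            continue; /* skip the OOM daemon itself */
        /* ... read badness / VmRSS and score the candidate here ... */
        printf("candidate: pid %d\n", (int)pid);
    }
    closedir(procdir);
    return 0;
}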

membomb is in contrib, but the easiest thing is to just run

tail /dev/zero
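(tail has to buffer its input while it looks for line breaks, and /dev/zero never produces one, so tail's memory use grows without bound.)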

Would be nice if you can reproduce this, and then I'll maybe have to learn how to use AWS :)

The getpid check we have is only for a specific case (hidepid), so it did not trigger here.

My reasoning for not preventing earlyoom from terminating itself is: what if there's a memory leak in earlyoom?

what if there's a memory leak in earlyoom?

Set in earlyoom.service, for example:

TasksMax=10
MemoryMax=50M
MemorySwapMax=0

#207
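If those limits are applied as a systemd drop-in rather than by editing the shipped unit file, they need to go under the [Service] section, e.g. in a file like /etc/systemd/system/earlyoom.service.d/limits.conf (the path and file name here are just an example), followed by a daemon-reload and a restart of the service:

[Service]
TasksMax=10
MemoryMax=50M
MemorySwapMax=0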

It doesn't seem to be reproducible with either membomb or tail /dev/zero. I even tried blocking the membomb process in its SIGTERM handler, either by raise(SIGSTOP) or by opening a file on an sshfs that had been frozen with pkill -STOP sftp-server. In that case earlyoom continuously tries to terminate/kill the wrong process, but since it doesn't succeed, it never switches to killing itself. I'm not sure what exactly go-ipfs does that keeps memory in use after the process has exited. :/

Is it possible that this happened because parse_meminfo ran and earlyoom detected an OOM situation, but the large process terminated on its own at exactly that moment, leaving kill_largest_process with no fodder? (I'm not sure how this could possibly happen multiple times on the same machine…)

Hm, yes, sounds plausible.

@rfjakob Do you have any ideas how to fix it?

One could check whether the memory situation is still dire after picking a process.

But still, what earlyoom does is simply not possible to do entirely race-free, right? (With the larger PID space on newer kernels, it's certainly less troubling, though.)
Btw, is the Linux kernel OOM killer race-free? Could this be done race-free if earlyoom were a kernel module? (Just curious; earlyoom actually being a kernel module is probably a bad idea.)

One could check whether the memory situation is still dire after picking a process.

Yes, that's what I was thinking as well. Because, if we look at the stats, kill_largest_process takes about 400x longer than parse_meminfo (which checks how free memory is doing). That 400x factor is roughly the number of processes you have running, so it will be higher with lots of processes, because earlyoom has to look at each of them.

$ make bench
go test -run=NONE -bench=.
goos: linux
goarch: amd64
pkg: github.com/rfjakob/earlyoom
Benchmark_parse_meminfo-4          	  199910	      6511 ns/op
Benchmark_kill_largest_process-4   	     534	   2468413 ns/op
Benchmark_get_oom_score-4          	  195270	      5567 ns/op
Benchmark_get_oom_score_adj-4      	  228506	      5071 ns/op
Benchmark_get_vm_rss_kib-4         	  211107	      5487 ns/op
Benchmark_get_comm-4               	  162548	      6702 ns/op
PASS
ok  	github.com/rfjakob/earlyoom	7.660s
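Since parse_meminfo is that cheap compared to a full process scan (microseconds vs. milliseconds), re-checking the limits right before sending the signal should cost next to nothing. A rough sketch of the idea, with made-up type and helper names (the real parse_meminfo and kill_largest_process have different signatures):

#include <signal.h>
#include <sys/types.h>

typedef struct {
    double mem_avail_percent; /* MemAvailable as % of MemTotal */
    double swap_free_percent; /* SwapFree as % of SwapTotal */
} mem_snapshot_t;

mem_snapshot_t snapshot_meminfo(void); /* cheap: one read of /proc/meminfo */
pid_t pick_victim(void);               /* expensive: walks every /proc entry */

void maybe_kill(double mem_limit_percent, double swap_limit_percent)
{
    /* Picking a victim takes milliseconds on a busy machine, so the
     * situation may already have resolved itself (e.g. the biggest
     * process exited on its own) by the time we get here. */
    pid_t victim = pick_victim();

    mem_snapshot_t m = snapshot_meminfo();
    if (m.mem_avail_percent > mem_limit_percent ||
        m.swap_free_percent > swap_limit_percent)
        return; /* no longer at or below both limits: do not kill */

    kill(victim, SIGTERM);
}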

I think the kernel OOM killer is race-free because it freezes all processes (or freezes all allocations?).