Kills multiple processes at once
haarp opened this issue
Pretty much every time earlyoom triggers, it tends to kill multiple processes within the same second.
Example (running with -p -m 5,2 -s 100,100 -N logger):
Feb 27 09:34:06 host root[7509]: dialog-warning earlyoom Low memory! Killing process 32194 vmware_vmx
Feb 27 09:34:06 host root[7511]: dialog-warning earlyoom Low memory! Killing process 26773 firefox
Either of these processes would have freed plenty of memory on its own. Wouldn't it be more sensible to minimize damage by killing one process first and waiting briefly to see whether the situation improves before moving on to the next one?
@haarp Earlyoom sleeps 1 sec after each corrective action (main.c:375):
// With swap enabled, the kernel seems to need more than 100ms to free the memory
// of the killed process. This means that earlyoom would immediately kill another
// process. Sleep a little extra to give the kernel time to free the memory.
// (Yes, this will sleep even if the kill has failed. Does no harm and keeps the
// code simple.)
if (m.SwapTotalMiB > 0) {
    usleep(cooldown_ms * 1000);
}
You can patch the source to increase the sleep duration, or try nohang
(its config file has configurable sleep durations).
Same problem:
Thanks. I've been running earlyoom, but when I overcommit memory a lot, it becomes too trigger-happy.
Maybe increasing the sleep time after corrective actions is a good idea.
killdelay.sh
#!/bin/bash
# Kill the given PID, then print MemAvailable every 0.1 s for 2 s.
set -eu
PID=$1
kill "$PID"
for i in $(seq 1 20) ; do
    grep MemAvailable /proc/meminfo
    sleep 0.1
done
Looks like it takes > 1s to return the 18GB to MemAvailable:
$ sudo ./killdelay.sh 10030
(interval 0.1s)
MemAvailable: 3466280 kB
MemAvailable: 3466280 kB
MemAvailable: 3466280 kB
MemAvailable: 3466280 kB
MemAvailable: 3466036 kB
MemAvailable: 3466036 kB
MemAvailable: 3466036 kB
MemAvailable: 3461248 kB
MemAvailable: 3461248 kB
MemAvailable: 3461248 kB
MemAvailable: 5322496 kB
MemAvailable: 8029712 kB
MemAvailable: 10724496 kB
MemAvailable: 13429832 kB
MemAvailable: 16145336 kB
MemAvailable: 18859064 kB
MemAvailable: 21544860 kB
MemAvailable: 22372484 kB
MemAvailable: 22372596 kB
MemAvailable: 22372632 kB
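To put a number on the run above, the samples can be reduced to the first and last MemAvailable values and their difference. This is a minimal sketch and not part of killdelay.sh; the function name mem_freed is made up for illustration, and it assumes the input consists of lines in /proc/meminfo format.

```shell
# Summarize "MemAvailable: N kB" samples read from stdin: first value,
# last value, and the difference. The function name is hypothetical.
mem_freed() {
    awk '/^MemAvailable:/ {
             if (!seen) { first = $2; seen = 1 }
             last = $2
         }
         END {
             if (!seen) exit 1    # no samples found
             freed = last - first
             printf "first=%d kB last=%d kB freed=%d kB (%d MiB)\n",
                    first, last, freed, freed / 1024
         }'
}
```

For the samples listed above this reports a difference of 18906352 kB, roughly 18 GiB. The first increase appears in the 11th sample, about one second after the kill.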
Looks like we can detect this situation. Again sampled at 0.1-second intervals: /proc/[pid]/statm returns all zeros while the kernel is cleaning up the process. When /proc/[pid] disappears, all the memory has been returned to the system.
1 0 0 MemAvailable: 3257276 kB VmRSS: 18875824 kB statm: 4719170 4718956 351 4 0 4718670 0
1 0 0 MemAvailable: 3257276 kB VmRSS: 18875824 kB statm: 4719170 4718956 351 4 0 4718670 0
1 0 0 MemAvailable: 3256400 kB VmRSS: 18875824 kB statm: 4719170 4718956 351 4 0 4718670 0
***sigkill***
0 0 0 MemAvailable: 3256852 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3256836 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3257136 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3263624 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3263832 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3264032 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3264032 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3264032 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 3264020 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 5299464 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 8073872 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 10852200 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 13666400 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 16456496 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 19241276 kB statm: 0 0 0 0 0 0 0
0 0 0 MemAvailable: 21993804 kB statm: 0 0 0 0 0 0 0
***/proc/pid disappears***
0 1 1 MemAvailable: 22172808 kB statm:
0 1 1 MemAvailable: 22172920 kB statm:
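The three phases visible in this trace (normal statm, all-zero statm, no /proc entry) can be told apart with a small check. This is only a sketch of the idea, not the script that produced the output above; the function name proc_state, the state labels, and the PROC_ROOT override (added so the check can be pointed at a test directory) are assumptions.

```shell
# Classify a PID by looking at /proc/<pid>/statm.
# PROC_ROOT is an assumption of this sketch; it defaults to the real /proc.
proc_state() {
    local dir="${PROC_ROOT:-/proc}/$1"
    local statm
    if [ ! -d "$dir" ]; then
        echo gone                   # /proc/<pid> has disappeared
        return 0
    fi
    # The read can fail if the process vanishes between the two checks.
    statm=$(cat "$dir/statm" 2>/dev/null) || { echo gone; return 0; }
    case "$statm" in
        "")      echo gone ;;       # nothing left to read
        *[1-9]*) echo running ;;    # at least one non-zero field
        *)       echo releasing ;;  # all zeros: still being torn down
    esac
}
```

Calling proc_state with the PID every 0.1 s after the kill would be expected to print "releasing" for the all-zero lines and "gone" once the entry is removed.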
Fixed in earlyoom v1.3-rc1. We now wait for the process to release all memory before checking to see if we should kill another one.
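As an illustration of the waiting step, here is a shell sketch that polls until /proc/[pid] disappears or a limit is reached. It is not the earlyoom implementation (which is in C); the function name wait_for_exit, the PROC_ROOT override, and the default of 100 polls (about 10 seconds) are assumptions chosen for this example.

```shell
# Wait until /proc/<pid> disappears, polling every 0.1 s.
# Usage: wait_for_exit PID [MAX_POLLS]   (default 100 polls, about 10 s)
# Returns 0 once the entry is gone, 1 on timeout.
# PROC_ROOT and the default limit are assumptions made for this sketch.
wait_for_exit() {
    local dir="${PROC_ROOT:-/proc}/$1"
    local polls="${2:-100}"
    local i=0
    while [ -d "$dir" ]; do
        [ "$i" -ge "$polls" ] && return 1   # gave up waiting
        sleep 0.1
        i=$((i + 1))
    done
    return 0
}
```

A caller could use it as: kill -9 "$pid"; wait_for_exit "$pid" || echo "timeout waiting for pid $pid". The timeout matters because a process may not exit promptly, in which case waiting forever would stall the monitor.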
Nice, thank you!
mem avail: 1863 of 13855 MiB (13 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 1676 of 13855 MiB (12 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 1523 of 13855 MiB (10 %), swap free: 0 of 0 MiB ( 0 %)
low memory! at or below SIGTERM limits: mem 10 %, swap 10 %
killing process 27366 "bash" with signal 15: badness 539, VmRSS 7464 MiB
timeout waiting for pid 27366 after signal 15
kill failed: Success. Sleeping 1 second
mem avail: 8 of 13855 MiB ( 0 %), swap free: 0 of 0 MiB ( 0 %)
low memory! at or below SIGKILL limits: mem 5 %, swap 5 %
killing process 27366 "bash" with signal 9: badness 678, VmRSS 9380 MiB
process 27366 exited after 0.3 seconds
mem avail: 9402 of 13855 MiB (67 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 9399 of 13855 MiB (67 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 9392 of 13855 MiB (67 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 9387 of 13855 MiB (67 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 8 of 13855 MiB ( 0 %), swap free: 0 of 0 MiB ( 0 %)
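The log above shows two threshold pairs: SIGTERM at mem 10 %, swap 10 % and SIGKILL at mem 5 %, swap 5 %. A sketch of that decision follows, assuming (as the "at or below ... limits" wording suggests) that both memory and swap must be at or below their limits. The function name pick_signal and the *_LIMIT variables are hypothetical, not earlyoom options.

```shell
# Decide which signal the thresholds call for.
# Usage: pick_signal MEM_AVAIL_PCT SWAP_FREE_PCT
# Limits default to the values in the log above (SIGTERM 10/10, SIGKILL 5/5)
# and can be overridden through the hypothetical *_LIMIT variables.
pick_signal() {
    local mem="$1" swap="$2"
    local term_mem="${TERM_MEM_LIMIT:-10}" term_swap="${TERM_SWAP_LIMIT:-10}"
    local kill_mem="${KILL_MEM_LIMIT:-5}" kill_swap="${KILL_SWAP_LIMIT:-5}"
    if [ "$mem" -le "$kill_mem" ] && [ "$swap" -le "$kill_swap" ]; then
        echo SIGKILL
    elif [ "$mem" -le "$term_mem" ] && [ "$swap" -le "$term_swap" ]; then
        echo SIGTERM
    else
        echo none
    fi
}
```

With the percentages from the log, pick_signal 13 0 prints none, pick_signal 10 0 prints SIGTERM, and pick_signal 0 0 prints SIGKILL.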
How do you like it?
kill failed: Success.
What does it mean?
mem avail: 1860 of 13855 MiB (13 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 1662 of 13855 MiB (11 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 1524 of 13855 MiB (10 %), swap free: 0 of 0 MiB ( 0 %)
low memory! at or below SIGTERM limits: mem 10 %, swap 10 %
killing process 27541 "bash" with signal 15: badness 550, VmRSS 7609 MiB
process 27541 exited after 1.9 seconds
mem avail: 4626 of 13855 MiB (33 %), swap free: 0 of 0 MiB ( 0 %)
mem avail: 1686 of 13855 MiB (12 %), swap free: 0 of 0 MiB ( 0 %)
Apr 28 01:23:30 PC kernel: Out of memory: Kill process 27541 (bash) score 572 or sacrifice child
Apr 28 01:23:30 PC kernel: Killed process 27541 (bash) total-vm:8118236kB, anon-rss:8098448kB, file-rss:3264kB, shmem-rss:0kB
process 27541 exited after 1.9 seconds
Because it was killed by the kernel OOM killer (OOMK), not earlyoom.
Nice. I see you are blocking SIGTERM. How did you get bash to 7609 MiB of memory?
for i in {1..55555555}; do true; done
You can change the number (55555555) to your liking.
Both the broken error message "kill failed: Success" and the problem that programs blocking SIGTERM can delay earlyoom for 10 seconds should be fixed now, thanks.