Memory not released under pressure
mtrower opened this issue
Hi, I wasn't sure whether to file this against ZFS or the SPL, but the SPL has very few issues filed, so here I am.
After some heavy activity on a pool (intensive file creation and listing) I'm seeing a wired memory consumption of 14.07GB, of which 10.5GB appears to be consumed by ZFS:
% sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 11457265664
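For anyone comparing this counter against Activity Monitor, the raw byte count is easier to read once converted. Below is a minimal Python sketch, assuming the `name: value` line format shown above; `parse_kstat_line` and `to_gib` are helper names made up for illustration and are not part of any ZFS tooling.

```python
def parse_kstat_line(line):
    """Split a 'name: value' sysctl line into (name, integer value)."""
    name, _, value = line.partition(":")
    return name.strip(), int(value.strip())


def to_gib(num_bytes):
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes)."""
    return num_bytes / 2**30


# The reading quoted above, pasted in as a string.
name, value = parse_kstat_line(
    "kstat.spl.misc.spl_misc.os_mem_alloc: 11457265664"
)
print(f"{name} = {to_gib(value):.2f} GiB")
```

For the reading above this works out to roughly 10.67 GiB (about 11.46 GB in decimal units).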
Which is all well and good, but even under pressure it wasn't dropping, so I tried to constrain the ARC:
kstat.zfs.darwin.tunable.zfs_arc_max: 0 -> 4294967296
kstat.zfs.darwin.tunable.zfs_arc_meta_limit: 0 -> 3221225472
kstat.zfs.darwin.tunable.zfs_arc_min: 0 -> 1610612736
kstat.zfs.darwin.tunable.zfs_arc_meta_min: 0 -> 1342177280
kstat.zfs.darwin.tunable.zfs_dirty_data_max: 1717986918 -> 536870912
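The tunables are all in bytes, which is hard to eyeball. Here is a small sketch that decodes the `old -> new` lines above into MiB; `describe_change` is a hypothetical helper, and the only assumption is the line format that sysctl printed here.

```python
def describe_change(line):
    """Parse 'tunable: old -> new' into (short name, old MiB, new MiB)."""
    name, _, change = line.partition(":")
    old, _, new = change.partition("->")
    mib = 2**20
    short = name.strip().rsplit(".", 1)[-1]  # drop the kstat.zfs... prefix
    return short, int(old) / mib, int(new) / mib


changes = [
    "kstat.zfs.darwin.tunable.zfs_arc_max: 0 -> 4294967296",
    "kstat.zfs.darwin.tunable.zfs_arc_meta_limit: 0 -> 3221225472",
    "kstat.zfs.darwin.tunable.zfs_arc_min: 0 -> 1610612736",
    "kstat.zfs.darwin.tunable.zfs_arc_meta_min: 0 -> 1342177280",
    "kstat.zfs.darwin.tunable.zfs_dirty_data_max: 1717986918 -> 536870912",
]

for line in changes:
    short, old, new = describe_change(line)
    # A previous value of 0 is printed as-is; its meaning is not decoded here.
    print(f"{short:22s} {old:8.1f} MiB -> {new:8.1f} MiB")
```

In other words: ARC max 4 GiB, meta limit 3 GiB, ARC min 1.5 GiB, meta min 1.25 GiB, and dirty data max 512 MiB.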
ARC now looks like this:
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
13:44:04 122M 23M 18.9 466K 0.7 22M 39.1 23M 18.9 2515M 4294M
13:44:05 0 0 0 0 0 0 0 0 0 2515M 4294M
13:44:06 0 0 0 0 0 0 0 0 0 2515M 4294M
but the SPL isn't dropping with it. No matter; let's apply some pressure and see if it releases.
sudo memory_pressure -l warn -s 8
App Memory consumption slowly climbs over the next 5-10 minutes. Pressure rises, and Wired Memory does not drop. Eventually, Compressed shoots through the roof (8GB or so), and we finally hit "warn", where I hold it for a while to observe.
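Spot checks like the ones in this report can miss short-lived changes, so a periodic sampler may be useful when reproducing this. The sketch below is one possible approach, not something used for the numbers in this issue: `read_sysctl`, `sample`, and `total_change` are hypothetical helpers, and the demo feeds in canned values from this report rather than live readings.

```python
import subprocess
import time

OID = "kstat.spl.misc.spl_misc.os_mem_alloc"


def read_sysctl(oid=OID):
    """Return the integer value of a sysctl OID using `sysctl -n`."""
    out = subprocess.run(
        ["sysctl", "-n", oid], capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


def sample(reader, count, interval=1.0, sleep=time.sleep):
    """Collect `count` readings from `reader`, `interval` seconds apart."""
    readings = []
    for i in range(count):
        readings.append(reader())
        if i < count - 1:
            sleep(interval)
    return readings


def total_change(readings):
    """Bytes gained (+) or released (-) between first and last reading."""
    return readings[-1] - readings[0]


# Demo with canned values from this report standing in for live readings.
canned = iter([11457265664, 11457265664, 10613424128])
values = sample(lambda: next(canned), count=3, interval=0, sleep=lambda s: None)
print(f"change over run: {total_change(values) / 2**20:+.1f} MiB")
```

On a machine with the kext loaded, passing `read_sysctl` as the reader (for example `sample(read_sysctl, count=120, interval=5.0)`) would log the real counter while `memory_pressure` runs.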
We can see that the ARC releases memory at a few points:
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
13:56:46 0 0 0 0 0 0 0 0 0 2535M 4294M
13:56:47 0 0 0 0 0 0 0 0 0 2535M 4294M
13:56:48 0 0 0 0 0 0 0 0 0 2534M 2534M
13:56:49 0 0 0 0 0 0 0 0 0 2534M 2534M
...
13:57:32 0 0 0 0 0 0 0 0 0 2512M 2512M
13:57:34 0 0 0 0 0 0 0 0 0 2507M 2507M
13:57:35 0 0 0 0 0 0 0 0 0 2487M 2487M
13:57:36 0 0 0 0 0 0 0 0 0 1390M 1610M
13:57:37 0 0 0 0 0 0 0 0 0 1390M 1610M
13:57:38 0 0 0 0 0 0 0 0 0 1353M 1610M
But the SPL remains iron-fisted.
kstat.spl.misc.spl_misc.os_mem_alloc: 11457265664
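To pick out the moments where the ARC actually moved, the arcstat output can be filtered down to rows where `size` or `tsize` changed. This is a rough sketch that assumes the column layout shown above (time first, `size` and `tsize` last); `parse_arcstat` and `changes` are illustrative names, and the sample is a subset of the rows quoted in this report.

```python
def parse_arcstat(lines):
    """Yield (time, size, tsize) from arcstat rows; skip headers and '...'."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0][0].isdigit():
            continue  # header, elision marker, or blank line
        yield fields[0], fields[-2], fields[-1]


def changes(rows):
    """Return the rows whose size or tsize differs from the previous row."""
    out, prev = [], None
    for stamp, size, tsize in rows:
        if prev is not None and (size, tsize) != prev:
            out.append((stamp, size, tsize))
        prev = (size, tsize)
    return out


sample_rows = """\
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
13:56:47 0 0 0 0 0 0 0 0 0 2535M 4294M
13:56:48 0 0 0 0 0 0 0 0 0 2534M 2534M
...
13:57:35 0 0 0 0 0 0 0 0 0 2487M 2487M
13:57:36 0 0 0 0 0 0 0 0 0 1390M 1610M
13:57:37 0 0 0 0 0 0 0 0 0 1390M 1610M
""".splitlines()

for stamp, size, tsize in changes(parse_arcstat(sample_rows)):
    print(stamp, size, tsize)
```

Running the same filter alongside the `os_mem_alloc` counter makes it easy to see that the ARC target and size shift while the SPL figure stays put.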
Alright; let's just export the pool entirely (no pools imported), and check the ARC again:
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
00:44:09 122M 23M 18.9 468K 0.7 22M 39.1 23M 18.9 44M 1610M
00:44:10 0 0 0 0 0 0 0 0 0 44M 1610M
And the SPL...
kstat.spl.misc.spl_misc.os_mem_alloc: 10613424128
So this time, the SPL dropped by the amount of ARC freed. What the heck is it doing with the rest?
Let's try putting the squeeze on again:
% sudo memory_pressure -l warn
Memory looks like this:
kstat.spl.misc.spl_misc.os_mem_alloc: 3674210304
We've released a lot, but we're still holding on to >3GB with no pools imported?
Let's repeat from the start (sort of; I'm not rebooting, nor relaxing the ARC constraints). Import pool; do some work:
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
01:02:16 58K 13K 23.7 215 0.7 13K 45.8 13K 23.7 4288M 4294M
01:02:17 59K 13K 22.6 214 0.7 13K 43.7 13K 22.6 4250M 4294M
kstat.spl.misc.spl_misc.os_mem_alloc: 6923747328
Seems reasonable. Apply pressure: takes forever again (>10m to hit WARN at a final pressure of 65%). For most of that time, memory consumption climbed very slowly with pressure holding around ~35%. It's as if memory_pressure is struggling to find pages to allocate, even though pressure is ostensibly low.
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
01:14:35 159M 30M 19.4 616K 0.7 30M 39.6 30M 19.4 4161M 4294M
01:14:36 0 0 0 0 0 0 0 0 0 4161M 4294M
...
01:24:59 0 0 0 0 0 0 0 0 0 4059M 4059M
01:25:00 0 0 0 0 0 0 0 0 0 3974M 4059M
01:25:01 0 0 0 0 0 0 0 0 0 3974M 4059M
01:25:02 0 0 0 0 0 0 0 0 0 3974M 4059M
01:25:03 0 0 0 0 0 0 0 0 0 2318M 1610M
01:25:04 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:05 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:06 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:07 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:08 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:09 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:10 0 0 0 0 0 0 0 0 0 1001M 1610M
01:25:11 0 0 0 0 0 0 0 0 0 882M 1610M
And SPL has barely budged:
kstat.spl.misc.spl_misc.os_mem_alloc: 6909067264
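To put "barely budged" in numbers, here is a quick back-of-the-envelope comparison for this second run. It uses only figures quoted in this report; the ARC values are taken from arcstat's `size` column at 01:02:17 and 01:25:11, and the variable names are just for illustration.

```python
# os_mem_alloc before and after applying pressure (bytes).
spl_before, spl_after = 6923747328, 6909067264

# arcstat "size" column before and after, in arcstat's own M units.
arc_before_m, arc_after_m = 4250, 882

spl_released_mib = (spl_before - spl_after) / 2**20
arc_released_m = arc_before_m - arc_after_m

print(f"SPL released: {spl_released_mib:.0f} MiB")
print(f"ARC released: about {arc_released_m}M")
```

So the ARC gave back over 3000M during this run while the SPL counter moved by 14 MiB.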
Finally, I reversed all of the tuning (thinking maybe the ARC minimums were causing SPL to hold memory):
kstat.zfs.darwin.tunable.zfs_arc_max: 4294967296 -> 0
kstat.zfs.darwin.tunable.zfs_arc_meta_limit: 3221225472 -> 0
kstat.zfs.darwin.tunable.zfs_arc_min: 1610612736 -> 0
kstat.zfs.darwin.tunable.zfs_arc_meta_min: 1342177280 -> 0
kstat.zfs.darwin.tunable.zfs_dirty_data_max: 536870912 -> 1717986918
Then I exported the pool and applied pressure, but the SPL is still holding >3GB:
Time read miss miss% dmis dm% pmis pm% mmis mm% size tsize
01:54:00 0 0 0 0 0 0 0 0 0 144K 1610M
sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 3529768960
System info
% sysctl zfs spl
zfs.kext_version: 1.9.4-0
spl.kext_version: 1.9.4-0
% sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: 18G7016
Next steps?