Expunges happen without hitting limits
boppy opened this issue
Facts first:
- APCu 5.1.23
- PHP-FPM 8.2.18 (inside a Docker container based on `php:8.2-fpm`)
- Memory type: mmap
I observe that the cache expunges far too often, without any hint as to why.
Does anyone have an idea what's going on and how I can mitigate it?
I implemented logging by calling the following script every 30s and writing its results to a log file:
```php
<?php
// Collect cache and shared-memory stats (limited=true skips the entry lists).
$ci = apcu_cache_info(true);
$si = apcu_sma_info(true);

$info = [
    'now'          => date('Y-m-d H:i:s'),
    'num_slots'    => $ci['num_slots'],
    'num_hits'     => $ci['num_hits'],
    'num_misses'   => $ci['num_misses'],
    'num_inserts'  => $ci['num_inserts'],
    'num_entries'  => $ci['num_entries'],
    'expunges'     => $ci['expunges'],
    'start_time'   => date('c', $ci['start_time']),
    'mem_size_MB'  => $ci['mem_size'] / 1024 / 1024,
    'avail_mem_MB' => $si['avail_mem'] / 1024 / 1024,
    'max_mem_MB'   => ($si['num_seg'] * $si['seg_size']) / 1024 / 1024,
];

// One tab-separated line per invocation, appended to the log file.
echo implode("\t", $info);
echo PHP_EOL;
```
I then analyze the output with the following awk one-liner, which prints the previous line, the changed line, and a blank line whenever column 8 (the expunge counter) changes:

```sh
awk '$8 != acht { print lali; print; print "" } { acht=$8; lali=$0 }' apcu*log
```
My output is:
Timestamp | Slots | Hits | Misses | Inserts | Entries | Expunges | Start Time | Mem Size (MB) | Avail Mem (MB) | Max Mem (MB) |
---|---|---|---|---|---|---|---|---|---|---|
2024-05-12 00:46:17 | 163841 | 6353095 | 1 | 3997021 | 115772 | 1 | 2024-05-11T16:58:57+02:00 | 995.15 | 1000.04 | 2000.00 |
2024-05-12 00:46:47 | 163841 | 4936 | 2 | 3738 | 429 | 2 | 2024-05-12T00:46:20+02:00 | 15.23 | 1983.51 | 2000.00 |
2024-05-12 07:40:01 | 163841 | 5641487 | 2 | 3533330 | 110189 | 2 | 2024-05-12T00:46:20+02:00 | 995.19 | 1000.17 | 2000.00 |
2024-05-12 07:40:31 | 163841 | 7425 | 1 | 6298 | 919 | 3 | 2024-05-12T07:40:06+02:00 | 21.45 | 1969.16 | 2000.00 |
2024-05-12 13:18:07 | 163841 | 5769070 | 1 | 3800976 | 109887 | 3 | 2024-05-12T07:40:06+02:00 | 995.13 | 1000.24 | 2000.00 |
2024-05-12 13:18:37 | 163841 | 9731 | 0 | 8950 | 1105 | 4 | 2024-05-12T13:18:12+02:00 | 25.84 | 1963.47 | 2000.00 |
2024-05-12 19:01:42 | 163841 | 5365891 | 0 | 3428149 | 112283 | 4 | 2024-05-12T13:18:12+02:00 | 994.81 | 1000.49 | 2000.00 |
2024-05-12 19:02:12 | 163841 | 1818 | 0 | 1948 | 405 | 5 | 2024-05-12T19:02:03+02:00 | 16.13 | 1982.61 | 2000.00 |
2024-05-13 00:32:43 | 163841 | 4912471 | 0 | 3072161 | 109755 | 5 | 2024-05-12T19:02:03+02:00 | 995.37 | 1000.00 | 2000.00 |
2024-05-13 00:33:13 | 163841 | 12136 | 1 | 10872 | 1528 | 6 | 2024-05-13T00:32:43+02:00 | 33.37 | 1955.48 | 2000.00 |
So basically the cache is expunged every time available memory drops to about 1000 MB, i.e. half the segment size.
I don't get why it's doing so. You can see that my entries hint is 160k (shown as 163,841 in col 2, "Slots"), while the cache "only" holds around 100-130k entries (col 6, "Entries"). At first I thought the problem was that I had only assigned 1000M to APCu, so I upped it to 2000M, but the expunges still happen just as often.
Any hints are highly appreciated!
Full Config
```ini
[APCu]
apc.enabled = 1
apc.enable_cli = 1
apc.shm_size = 2000M
apc.shm_segments = 1
apc.shm_strings_buffer = 64M
apc.gc_ttl = 30
apc.entries_hint = 160000
```
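For completeness, a quick way to confirm these values actually reach the FPM workers (the CLI may read a different php.ini) is to dump them from a request; a minimal sketch:

```php
<?php
// Sanity check: dump the effective APCu settings as this SAPI sees them.
// Run via FPM, not the CLI, since the two can load different ini files.
foreach (['apc.enabled', 'apc.shm_size', 'apc.shm_segments',
          'apc.entries_hint', 'apc.gc_ttl', 'apc.smart'] as $key) {
    echo $key, ' = ', var_export(ini_get($key), true), PHP_EOL;
}
```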
Additional findings:
- If the `smart` flag is not set (since it's not mentioned on the config page, I never set it to begin with; see #504), the code checks whether available memory is at least HALF the full cache size, and expunges if it is not (lines 755 to 768 in 1ba5a2d; sketched in PHP below).
- Since I'm not that deep into the code, I assume I'm missing something here; as it's switchable behavior, there is surely some rationale I'm just not getting ;)
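To make the condition concrete, here is my reading of those lines transcribed to PHP (illustration only; `would_expunge()` is not a real APCu API):

```php
<?php
// Illustration only: my reading of the expunge condition in the linked
// C code, transcribed to PHP. would_expunge() is not an APCu function.
function would_expunge(int $requestedSize, float $smart): bool
{
    $si        = apcu_sma_info(true);
    $available = $si['avail_mem'];
    $segTotal  = $si['num_seg'] * $si['seg_size'];

    // smart unset (0): the threshold is HALF the total segment size.
    // smart set:       the threshold is smart * the size being inserted.
    $threshold = $smart > 0 ? $smart * $requestedSize : $segTotal / 2;

    return $available <= $threshold;
}

// With a 2000M segment and smart unset, any insert expunges the cache
// once avail_mem drops to ~1000M, which matches the log above.
var_dump(would_expunge(4096, 0.0));
```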
Update I, ~2 days later:
After setting the `smart` flag to `1`, the expunges no longer happen, but after reaching approx. 75% usage the FPM processes stop responding. I assume APCu cannot find a big enough free block to store the needed data. I don't see any hint of that in the log files, just a "max_children reached" as requests pile up.
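What my 30s log could not show is fragmentation. A probe like the following would have surfaced the problem; the metric is my own, built on the free-block lists that `apcu_sma_info()` returns when called without the limited flag:

```php
<?php
// Fragmentation probe (my own metric): compare the largest free block
// to the total free memory. If the largest block is tiny, big inserts
// fail even though avail_mem looks healthy.
$si      = apcu_sma_info();   // full info, one block list per shm segment
$free    = 0;
$largest = 0;
foreach ($si['block_lists'] as $blocks) {
    foreach ($blocks as $block) {
        $free   += $block['size'];
        $largest = max($largest, $block['size']);
    }
}
printf(
    "free: %.1f MB, largest free block: %.1f MB, fragmentation: %.1f%%\n",
    $free / 1048576,
    $largest / 1048576,
    $free > 0 ? (1 - $largest / $free) * 100 : 0.0
);
```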
Update II, ~3 days later:
The problem boils down to the massive fragmentation my implementation caused. After resetting `smart` to `0` and upping the segment size to 3000M, I no longer run into expunges or crashes, because a chunk of free storage remains at the end of the segment. I'm currently at >54% usage without an expunge, at 98% fragmentation. After another update that runs tonight, the fragmentation should drop further, because I'm no longer inserting data multiple times per second.
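For reference, the change on my side amounts to not rewriting entries that haven't changed; roughly the sketch below (key name and payload are made up):

```php
<?php
// Sketch of the mitigation (key and payload are made up): skip the
// write when the cached value is unchanged, so hot entries are not
// freed and re-allocated on every request, which fragments the segment.
function store_if_changed(string $key, $value, int $ttl = 300): bool
{
    $old = apcu_fetch($key, $found);
    if ($found && $old === $value) {
        return true; // unchanged: no write, no new allocation
    }
    return apcu_store($key, $value, $ttl);
}

store_if_changed('stats:frontpage', ['hits' => 42], 300);
```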
Nevertheless, I think that at least the hard crashes are something to look after. The docs could also state how and why the expunge takes place.