openzfsonosx / zfs

OpenZFS on OS X

Home Page: https://openzfsonosx.org/

Kernel Panic - OpenZFS 2.1.0 on Apple Silicon

jawbroken opened this issue

M1 Ultra Mac Studio (macOS Monterey 12.3.1) running the OpenZFS 2.1.0 release. Apologies for the missing symbols in the backtrace; I'm not sure how to get them on the new hardware yet.

panic(cpu 4 caller 0xfffffe001d1da1b0): stack_alloc: kernel_memory_allocate(size: 0xc000, mask: 0x7fff, flags: 0x1134) failed with 3 @stack.c:184
Debugger message: panic
Memory ID: 0x6
OS release type: User
OS version: 21E258
Kernel version: Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000
Fileset Kernelcache UUID: 0631AF68D2B8D6FEA30E36D7895D4DB4
Kernel UUID: C342869F-FFB9-3CCE-A5A3-EA711C1E87F6
iBoot version: iBoot-7459.101.3
secure boot?: YES
Paniclog version: 13
KernelCache slide: 0x00000000158ac000
KernelCache base:  0xfffffe001c8b0000
Kernel slide:      0x000000001605c000
Kernel text base:  0xfffffe001d060000
Kernel text exec slide: 0x0000000016144000
Kernel text exec base:  0xfffffe001d148000
mach_absolute_time: 0x4c978bb4608
Epoch Time:        sec       usec
  Boot    : 0x6248a9e4 0x00097580
  Sleep   : 0x00000000 0x00000000
  Wake    : 0x00000000 0x00000000
  Calendar: 0x624c0289 0x00047351

Zone info:
  Foreign : 0xfffffe0024ba0000 - 0xfffffe0024bb0000
  Native  : 0xfffffe100069c000 - 0xfffffe300069c000
  Readonly: 0xfffffe14cd368000 - 0xfffffe1666d00000
  Metadata: 0xfffffe8bf8968000 - 0xfffffe8c048d8000
  Bitmaps : 0xfffffe8c048d8000 - 0xfffffe8c248d8000

CORE 0 PVH locks held: None
CORE 1 PVH locks held: None
CORE 2 PVH locks held: None
CORE 3 PVH locks held: None
CORE 4 PVH locks held: None
CORE 5 PVH locks held: None
CORE 6 PVH locks held: None
CORE 7 PVH locks held: None
CORE 8 PVH locks held: None
CORE 9 PVH locks held: None
CORE 10 PVH locks held: None
CORE 11 PVH locks held: None
CORE 12 PVH locks held: None
CORE 13 PVH locks held: None
CORE 14 PVH locks held: None
CORE 15 PVH locks held: None
CORE 16 PVH locks held: None
CORE 17 PVH locks held: None
CORE 18 PVH locks held: None
CORE 19 PVH locks held: None
CORE 0: PC=0xfffffe001d2dae10, LR=0xfffffe001d2dae0c, FP=0xfffffe6193b03e90
CORE 1: PC=0xfffffe001d1d021c, LR=0xfffffe001d1d021c, FP=0xfffffe817a8738a0
CORE 2: PC=0xfffffe001d25e210, LR=0xfffffe001d25e078, FP=0xfffffe60d4571d50
CORE 3: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe60f4ff3f00
CORE 4 is the one that panicked. Check the full backtrace for details.
CORE 5: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe619f873f00
CORE 6: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe6193a33f00
CORE 7: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe60d4b63f00
CORE 8: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe817b0a3f00
CORE 9: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe619e6c3f00
CORE 10: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe817a823f00
CORE 11: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe817a813f00
CORE 12: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe6065f0bf00
CORE 13: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe6191aebf00
CORE 14: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe817afa3f00
CORE 15: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe6193be3f00
CORE 16: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe817a863f00
CORE 17: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe817ae03f00
CORE 18: PC=0xfffffe001d1d7418, LR=0xfffffe001d1d7418, FP=0xfffffe619ece3f00
CORE 19: PC=0xfffffe001d1d741c, LR=0xfffffe001d1d7418, FP=0xfffffe817b0b3f00
Compressor Info: 0% of compressed pages limit (OK) and 0% of segments limit (OK) with 0 swapfiles and OK swap space
Panicked task 0xfffffe24cd21c678: 0 pages, 1359 threads: pid 0: kernel_task
Panicked thread: 0xfffffe1b339168c8, backtrace: 0xfffffe60d4c83770, tid: 105
        lr: 0xfffffe001d1a1560  fp: 0xfffffe60d4c837e0
        lr: 0xfffffe001d1a1228  fp: 0xfffffe60d4c83850
        lr: 0xfffffe001d2e5ecc  fp: 0xfffffe60d4c83870
        lr: 0xfffffe001d2d805c  fp: 0xfffffe60d4c838e0
        lr: 0xfffffe001d2d5a98  fp: 0xfffffe60d4c839a0
        lr: 0xfffffe001d14f7f8  fp: 0xfffffe60d4c839b0
        lr: 0xfffffe001d1a0eac  fp: 0xfffffe60d4c83d50
        lr: 0xfffffe001d1a0eac  fp: 0xfffffe60d4c83dc0
        lr: 0xfffffe001d9caacc  fp: 0xfffffe60d4c83de0
        lr: 0xfffffe001d1da1b0  fp: 0xfffffe60d4c83e60
        lr: 0xfffffe001d1f09d4  fp: 0xfffffe60d4c83e90
        lr: 0xfffffe001d1bf968  fp: 0xfffffe60d4c83f00
        lr: 0xfffffe001d1bf898  fp: 0xfffffe60d4c83f20
        lr: 0xfffffe001d158e78  fp: 0x0000000000000000

Edit: Corrected allocation size in kernel_memory_allocate.

I tried to symbolicate this manually for the last hour or so, but I can't get "image lookup -a" to print anything, and all of the online instructions I found are outdated.

It's certainly plausible that we are crashing through the kernel call stack guard page, and that this is more likely on arm than on x86_64. I don't run ZFS on an M1 yet (your efforts here make that more likely, and I wager I would be more likely than you to run into all sorts of interesting panics on ARM :) ), and I don't think lundman does either. I also run with a 32 kB stack on my most stressed-out box.

On arm, the xnu KERNEL_STACK_SIZE macro will be 1, 2, or 4 times 16k: 1 in the normal case, 2 when the kernel is DEBUG, and 4 when kernel address sanitization is on. There is also a helpful comment in osfmk/mach/arm/vm_param.h suggesting you double that when compiler optimization is off.

kernel_stack_pages defaults to KERNEL_STACK_SIZE divided by the page size, which is 16k on arm, so it is 1 in the normal case.
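
As a quick sanity check (nothing here is OpenZFS-specific; these are the standard macOS sysctls), you can confirm both numbers from a shell:

# Hardware page size (expect 16384 on Apple Silicon) and the effective kernel stack size.
sysctl hw.pagesize
sysctl kern.stack_size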

We can change this at boot time (nvram boot-args="keepsyms=1 kernel_stack_pages=2"), and see the result reflected in sysctl kern.stack_size (it would go from 16384 to 32768 in this case). Running with two stack pages instead of one is almost certainly fine on anything but a very-busy-with-many-many-threads M1 with the smallest possible amount of RAM.
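
For concreteness, a minimal sketch of that change (this overwrites any existing boot-args, and on Apple Silicon changing boot-args may require reduced boot security or recoveryOS, as discussed further down):

# Set the boot-args (include anything you already rely on), then reboot.
sudo nvram boot-args="keepsyms=1 kernel_stack_pages=2"
# After the reboot, verify the stack size went from 16384 to 32768:
sysctl kern.stack_size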

In the general case, on arm, threads already have an upper and a lower guard page, plus the usual single stack page. On x86_64 there are upper and lower guard pages as well, but the page size is only 4k, and there are 4 stack pages in the usual case.

Assuming your Ultra has more than the absolute minimum of RAM, bumping up the kernel_stack_pages is likely to stop these panics at almost unnoticeable cost.

You can also adjust the sysctl kstat.spl.misc.spl_misc.split_stack_below dynamically at run time. By default it is 8192, and that might not be the best choice on arm. You could make it bigger (say 12k), which pushes known deep stack descents onto separate threads more aggressively. Schematically, rather than going a() -> b() -> c() -> d() ... -> z(), we'd go a() -> b() -> c(), then spawn a new thread calling d() and wait for it to end; that thread goes d() -> e() -> f() ..., then spawns another thread calling g() and waits for it to end, and so on.
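
As a sketch, reading and raising that threshold looks like this (the kstat name is as given above; 12288 is just the 12k example, not a tested recommendation):

# Current threshold in bytes (default 8192):
sysctl kstat.spl.misc.spl_misc.split_stack_below
# Raise it to 12k; takes effect immediately, no reboot needed:
sudo sysctl -w kstat.spl.misc.spl_misc.split_stack_below=12288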

If either or both of these approaches make your panics go away, we can look at further reducing stack frame size on arm builds, and sprinkling stack switching into a few more places.

Thanks for your help. I have 128 GB of RAM, so I'll find some time to experiment with the boot-args and see if I can get that working. There are conflicting reports about whether SIP needs to be disabled, etc., and I didn't have much luck with keepsyms when I tried it, but I haven't spent a lot of time on it yet. Unfortunately it seems to take several days to reproduce and is fairly unpredictable, so it's going to be a long process to work out what is helping.

I noticed the allocation size (0xc000) was somehow missing from the panic above, so I fixed that. I don't really know what I'm talking about, but does the panic happening here suggest that it's failing when trying to allocate kernel memory for a new thread, rather than on a stack overflow?

The return value from kernel_memory_allocate indicates:

#define KERN_NO_SPACE			3
		/* The address range specified is already in use, or
		 * no address range of the size specified could be
		 * found.
		 */

Bah, you're right, I was too tired to look properly at the site of the panic.

Keep an eye on kstat.spl.misc.spl_misc.active_threads to see if it becomes outrageously large. "top -u -o th" is possibly also valuable (#th column) or the equivalent in Activity Monitor or the like.

A thousand and change kernel threads is not worrisome, nor is 3000-5000 total system threads. Much more than that would be something to look into, however.
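
For reference, the checks above boil down to the following (exact top column names vary a little between macOS releases):

# SPL's own count of active kernel threads:
sysctl kstat.spl.misc.spl_misc.active_threads
# Per-process thread counts; watch the #TH column for kernel_task:
top -u -o th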

When I'm more awake, we can see if /usr/bin/vmmap kernel and /usr/bin/zprint can tell us something about a run on the relevant memory zones.
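
If it comes to that, the zone inspection would look roughly like this (output formats differ across macOS versions, and some of the detail needs root):

# Kernel zone allocator statistics:
sudo /usr/bin/zprint
# Kernel VM regions:
sudo /usr/bin/vmmap kernel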

kstat.spl.misc.spl_misc.active_threads is 514 currently, and whenever I've looked at the number of threads for kernel_task in top/Activity Monitor it's been pretty stable at ~1,360, but I'll start logging the former to a file every minute or so and see if anything exciting happens over time.
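
One simple way to do that logging (a throwaway shell loop, not anything shipped with OpenZFS; the log path is just an example):

# Append a timestamped sample to ~/active_threads.log once a minute.
while true; do
  printf '%s %s\n' "$(date -u +%FT%TZ)" "$(sysctl -n kstat.spl.misc.spl_misc.active_threads)" >> ~/active_threads.log
  sleep 60
done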

I've been monitoring active_threads as above; it's almost always 514 and occasionally rises briefly to 525 under load. I was seeing some Spotlight-related crashes in my logs, so I disabled Spotlight for one of my two pools, where it had somehow re-enabled itself. There hasn't been a kernel panic since the one I posted above. I had similar issues on my previous Mac Mini fileserver running 1.9.x, so I had already disabled Spotlight where possible.

With Spotlight disabled (it was previously accidentally enabled for one of my two pools), I haven't had a crash in over a month. I suspect there's some issue with the interaction between OpenZFS and Spotlight (since I had similar problems previously with an Intel Mac Mini), but I'm closing this because I can't be sure where the issue lies and I'm no longer reproducing it. I never saw active_threads go above 527, and it was almost always stable at 514.

Setting the kernel_stack_pages flag in nvram indeed seems to do the trick! After adding the boot-args flag, I tried re-enabling Spotlight indexing while doing all the I/O-heavy operations at the same time (compiling in Xcode, DaisyDisk scanning disks, and spinning up multiple Docker services), and everything seems to work well without my system freezing up.

P.S. I'm running a Mac Studio M1 Max 32GB.

OK, so doubling kernel_stack_pages works? Then we are overflowing somewhere. Do we know which binary the stack at the top of this issue is from?