openzfsonosx / zfs

OpenZFS on OS X

Home Page: https://openzfsonosx.org/

vmem “death file”

rottegift opened this issue

Repeatable fun. Requires an “inside” virtual machine guest and an “outside” virtual machine host running VMware Fusion.

  1. “Outside”: boot from an APFS startup volume, install master built with --enable-debug, including installing the kexts. Reboot, make sure that the DEBUG kexts are loaded and running. Do things with zfs to build up a nice juicy ARC.

  2. “Inside”: build, install, and run master with --enable-debug. Do things “Inside” to build up a nice healthy ARC.

  3. Suspend VM. (Maybe also reboot “Outside”, although that seems paranoid.)

  4. In “Outside”, copy (rsync, dd, gdd, cp, cat, …) the suspended vmem file corresponding to “inside” to a zvol on “outside”. (I have so far only tested with volblocksize=128k compression=lz4 checksum=edonr, and I would not be entirely surprised if small volblocksizes don’t show this.) Probably the relevant thing is master --enable-debug, so scp is also likely to cause problems.

  5. Total hang. No panic. No lldb. No NMI. “Outside” needs a power cycle.
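
For anyone trying to reproduce, here is a minimal sketch of setting up the target zvol from step 4 with the properties tested so far. The function name, zvol name, and size are hypothetical placeholders, not values from this report; only the three `-o` properties come from the text above.

```shell
# Sketch only: make_test_zvol is a hypothetical helper, and the pool,
# zvol name, and size passed to it are assumptions, not from the report.
make_test_zvol() {
  pool=$1   # pool on "outside", e.g. ssdpool
  vol=$2    # hypothetical zvol name
  size=$3   # must be large enough for the ~8 GiB vmem image, e.g. 16G
  # Properties as tested in step 4: volblocksize=128k, lz4, edonr.
  zfs create -V "$size" \
    -o volblocksize=128k \
    -o compression=lz4 \
    -o checksum=edonr \
    "$pool/$vol"
}

# Example (not run here): make_test_zvol ssdpool deathvol 16G
```

The zvol then still needs a filesystem (HFS or APFS) before the vmem file can be copied into it. Do this only on a machine you are prepared to power cycle.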

Hypothesis: one of the guard patterns in the “inside” kernel space in the vmem memory image totally confuses the “outside” DEBUG kext(s). (Is it kmem? Is it elsewhere? Maybe the whole memory arena in which vmem plays is full of toxins.)

I have a couple of ~8 GiB (but fairly sparse) vmem images that can reproduce this just fine. The failure comes near the very end of copying, so we should be able to figure out what’s going on.

Inside the zvol can be HFS or APFS, doesn’t make a difference.

I’ll try copying to a dataset in a moment too.

Fatality!

cla:~ root# ls -l /Volumes/vm-mojave-860evo/macOS\ 10.14.vmwarevm/macOS\ 10.14-0dd2f64e.vmem
-rw-------  1 smd  staff  8589934592 27 Jun 15:28 /Volumes/vm-mojave-860evo/macOS 10.14.vmwarevm/macOS 10.14-0dd2f64e.vmem
cla:~ root# gdd status=progress obs=4k if=/Volumes/vm-mojave-860evo/macOS\ 10.14.vmwarevm/macOS\ 10.14-0dd2f64e.vmem of=/Volumes/ssdpool/zzdeath/macOS\ 10.14-0dd2f64e.vmem
5657595904 bytes (5.7 GB, 5.3 GiB) copied, 21 s, 269 MB/s
[HANG]

One more thing to try ’cause I’m bored: changing recordsize down from 128k to 4k, changing checksum to fletcher4, and changing compression down to “on”.

Ok, the dataset properties don’t matter. Sure kill. Gut says it’s abd debugging, or some weird corner case that leads to a hard lockup in the VERIFY macro. But I’m bored of all this rebooting and will come back to it now that I know how to reproduce it easily enough.

I can probably chop the vmem file down a lot; it’s really only the tail that produces the fatality, and the memory should compress pretty well anyway. Even better, though, if someone else (hi @lundman!) can reproduce and play too.
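
A rough sketch of chopping the file down to its tail, using plain `tail -c`. The function name and the file names in the example are placeholders, and it is an untested assumption that a truncated tail on its own still triggers the hang.

```shell
# Sketch only: tail_of_file is a hypothetical helper.
# Writes the last <mib> MiB of <src> to <dst>.
tail_of_file() {
  src=$1   # full vmem image
  dst=$2   # output file holding only the tail
  mib=$3   # how many MiB to keep from the end
  tail -c "$((mib * 1024 * 1024))" "$src" > "$dst"
}

# Example (not run here): tail_of_file full.vmem tail.vmem 512
```

The output is written densely, so any sparseness in the original tail is not preserved. Shrinking the kept size step by step should narrow down which part of the image is the toxic one.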