openzfsonosx / zfs

OpenZFS on OS X

Home Page: https://openzfsonosx.org/


macOS 10.15 Catalina support ✅

pirate opened this issue

It's still early, so I don't expect there to already be support for the new macOS Catalina beta, but surprisingly it worked! I figured I'd open a ticket to help track progress on any bugs. (Also to serve as a resource for people like me who Googled "zfs" "macOS" OR "osx" "catalina" OR "10.15" and got no real results.)

  • macOS 10.15 Beta (19A487)
  • openzfsonosx 1.9.0-1
  • installer used OpenZFS_on_OS_X_1.9.0.dmg/OpenZFS on OS X 1.9.0 Mojave.pkg

After downloading it from the homepage, I ran the Mojave installer on my system; it failed the first time with a yellow warning at the end of the last page in Install.app.
However, after immediately trying a second time, it seems to have succeeded and is working perfectly now.

➜  zpool --version
zfs-1.9.0-1
zfs-kmod-1.9.0-1
➜  sudo gdd if=/dev/zero of=~/Desktop/test.zpool bs=1M count=128
➜  sudo zpool create test ~/Desktop/test.zpool
➜  sudo zfs mount -a
➜  sudo zpool status
➜  echo "test" | sudo tee /Volumes/test/test.txt && sync && cat /Volumes/test/test.txt
# everything works as expected for raw file vdevs
➜  sudo zpool create -f -o ashift=12 \
            -O casesensitivity=insensitive \
            -O normalization=formD \
            -O compression=lz4 \
            -O utf8only=on \
            -O sync=disabled \
            test2 mirror disk6 disk7
# everything also works as expected for a two-way mirror of Samsung FIT 32GB USB key vdevs
# files read and write correctly, and the pool still works after disconnecting and reconnecting the USB keys

The one minor thing that could be fixed is to enable installing via Homebrew Cask (once more people confirm it's stable):

➜  brew cask install openzfs
==> Caveats
To install and/or use openzfs you may need to enable its kernel extension in:
  System Preferences → Security & Privacy → General
For more information refer to vendor documentation or this Apple Technical Note:
  https://developer.apple.com/library/content/technotes/tn2459/_index.html

==> Satisfying dependencies
Error: Cask openzfs depends on macOS release being one of [10.9, 10.10, 10.11, 10.12, 10.13, 10.14], but you are running release 10.15.

I attempted to check if Catalina worked last week, but found that VMware Fusion does not work with it yet. I've been waiting for a fix for Fusion :)

commented

The only issue I have found with Catalina and 1.9.1 rc1 is that ZFS pools no longer auto-mount on login. I have to run sudo zpool import xxx manually. I think it has to do with allowing access to removable volumes, but I don't know how to fix that!

commented

@dgsga Hmm, or the permissions to use LaunchDaemons for this kind of stuff - I don't know if the zpool-import-all script actually still gets run or not.
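
As a quick diagnostic while that gets sorted out, one can check whether an import daemon is registered at all, and fall back to a manual import (a minimal sketch; the org.openzfsonosx label is an assumption based on the daemons the installer ships, see the launchctl output further down this thread):

sudo launchctl list | grep -i openzfsonosx   # is zed / the import daemon registered?
sudo zpool import -a                         # manual fallback: import every pool found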

I encountered another bug on Catalina. When under high IO (I think), Catalina will crash (without showing the kernel panic screen) after half a second of loud fan noise.

I encountered this a lot in Beta 1 and therefore reverted to 10.14. I did not save the crash report, as I thought it was a Catalina problem that would be addressed by Apple. (I think it was a segfault, but I'm not sure I remember correctly.)

However, today when I tried out 10.15b5, it happened exactly once, which is significantly less often than before. Unfortunately this time I didn't get a crash report, but I will try my best to reproduce it and upload the report once I succeed.

@JMoVS I'm using 1.9.2 and all the pools are imported automatically (but the volume is an internal SSD rather than external).

commented

@michael-yuji I've had exactly the same problem with a kernel panic under high IO, such as Spotlight, Photos.app or Sync.app indexing. The same thing occasionally happened in Mojave, where spl.kext rather than zfs.kext was highlighted in the panic report.

One of my laptops is running 1.8.1 with Catalina and it panicked; this time, luckily, I got a crash report:

panic(cpu 2 caller 0xffffff800a065b5a): Kernel trap at 0xffffff7f8c2fd7e5, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x00002007210001e0, CR3: 0x000000000e3f1000, CR4: 0x00000000003626e0
RAX: 0x0000000000000010, RBX: 0xffffff9210bcdfb0, RCX: 0xffffff9210bcdfc0, RDX: 0xffffff81fabbbe18
RSP: 0xffffff81fabbbdc0, RBP: 0xffffff81fabbbdc0, RSI: 0xffffff9210bcdfb0, RDI: 0xffffff9208818fb0
R8:  0x0000200721000000, R9:  0x0000000000000002, R10: 0x0000000000000001, R11: 0xffffff9211046fc0
R12: 0xffffff91fbf391b8, R13: 0xffffff9210bcdfc0, R14: 0x0000000000000010, R15: 0xffffff9208818fb0
RFL: 0x0000000000010282, RIP: 0xffffff7f8c2fd7e5, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x00002007210001e0, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff81fabbb820 : 0xffffff8009f3cb9b 
0xffffff81fabbb870 : 0xffffff800a073d45 
0xffffff81fabbb8b0 : 0xffffff800a0657ab 
0xffffff81fabbb900 : 0xffffff8009ee3bb0 
0xffffff81fabbb920 : 0xffffff8009f3c287 
0xffffff81fabbba20 : 0xffffff8009f3c66b 
0xffffff81fabbba70 : 0xffffff800a6ccc69 
0xffffff81fabbbae0 : 0xffffff800a065b5a 
0xffffff81fabbbc60 : 0xffffff800a06585c 
0xffffff81fabbbcb0 : 0xffffff8009ee3bb0 
0xffffff81fabbbcd0 : 0xffffff7f8c2fd7e5 
0xffffff81fabbbdc0 : 0xffffff7f8c2f90d1 
0xffffff81fabbbe00 : 0xffffff7f8c2f93ec 
0xffffff81fabbbe30 : 0xffffff7f8c2fb9ad 
0xffffff81fabbbe70 : 0xffffff7f8c301554 
0xffffff81fabbbeb0 : 0xffffff7f8c2fc8b8 
0xffffff81fabbbee0 : 0xffffff7f8c300a5c 
0xffffff81fabbbf10 : 0xffffff7f8c305f16 
0xffffff81fabbbfa0 : 0xffffff8009ee313e 
      Kernel Extensions in backtrace:
         net.lundman.spl(1.8.1)[F931881B-FB27-3712-8C57-4DF33E9CCD48]@0xffffff7f8c2f8000->0xffffff7f8d4ecfff

BSD process name corresponding to current thread: kernel_task

Mac OS version:
19A536g

Kernel version:
Darwin Kernel Version 19.0.0: Fri Aug  9 21:59:46 PDT 2019; root:xnu-6153.0.139.161.2~2/RELEASE_X86_64
Kernel UUID: E2D7BDCF-3936-31FC-B884-D01BB1F44587
Kernel slide:     0x0000000009c00000
Kernel text base: 0xffffff8009e00000
__HIB  text base: 0xffffff8009d00000
System model name: MacBookPro13,3 (Mac-A5C67F76ED83108C)
System shutdown begun: NO
Panic diags file available: YES (0x0)

System uptime in nanoseconds: 5872071293263
last loaded kext at 3593327050318: >usb.cdc.acm	5.0.0 (addr 0xffffff7f8e9be000, size 32768)
last unloaded kext at 4310866791230: >!UMergeNub	900.4.2 (addr 0xffffff7f8e6cf000, size 12288)
loaded kexts:
com.intel.kext.intelhaxm	7.3.2
net.lundman.zfs	1.8.1
net.lundman.spl	1.8.1
@kext.AMDFramebuffer	3.0.0
@kext.AMDRadeonX4000	3.0.0
>AudioAUUC	1.70
@kext.AMDRadeonServiceManager	3.0.0
>!AGraphicsDevicePolicy	4.1.30
@fileutil	20.036.15
@filesystems.autofs	3.0
@AGDCPluginDisplayMetrics	4.1.30
>!AHV	1
|IOUserEthernet	1.0.1
|IO!BSerialManager	7.0.0d105
>!AUpstreamUserClient	3.6.8
>pmtelemetry	1
>AGPM	111.1.18
>X86PlatformShim	1.0.0
>!APlatformEnabler	2.7.0d0
>!A!ISKLGraphics	14.0.0
@Dont_Steal_Mac_OS_X	7.0.0
>AGDCBacklightControl	4.1.30
>!AHDA	283.13
@kext.AMD9500!C	3.0.0
>!AThunderboltIP	3.1.2
>!AHIDALSService	1
>eficheck	1
>!AMuxControl	4.1.30
>SMCMotionSensor	3.0.4d1
>!A!IPCHPMC	2.0.1
>!AGFXHDA	100.1.421
>!AEmbeddedOSSupportHost	1
>AirPort.BrcmNIC	1400.1.1
>!A!ISKLGraphicsFramebuffer	14.0.0
>!A!ISlowAdaptiveClocking	4.0.0
>!AMCCSControl	1.10
>!AVirtIO	1.0
@filesystems.hfs.kext	522.0.5
@!AFSCompression.!AFSCompressionTypeDataless	1.0.0d1
@BootCache	40
@!AFSCompression.!AFSCompressionTypeZlib	1.0.0
>!ATopCaseHIDEventDriver	153
@filesystems.apfs	1412.0.16
@private.KextAudit	1.0
>!ASmartBatteryManager	161.0.0
>!AACPIButtons	6.1
>!ARTC	2.0
>!ASMBIOS	2.1
>!AACPIEC	6.1
>!AAPIC	1.7
$!AImage4	1
@nke.applicationfirewall	302
$TMSafetyNet	8
@!ASystemPolicy	2.0.0
|EndpointSecurity	1
@kext.AMDRadeonX4100HWLibs	1.0
@kext.AMDRadeonX4000HWServices	3.0.0
@kext.triggers	1.0
|IOAVB!F	800.16
>!ASSE	1.0
>DspFuncLib	283.13
@kext.OSvKernDSPLib	529
@!AGPUWrangler	4.1.30
>!ABacklightExpert	1.1.0
>!AHDA!C	283.13
|IOHDA!F	283.13
@kext.AMDSupport	3.0.0
>!AGraphicsControl	4.1.30
|IOAudio!F	300.2
@vecLib.kext	1.2.0
|IONDRVSupport	558
|IO!BHost!CUARTTransport	7.0.0d105
|IO!BHost!CTransport	7.0.0d105
>!A!ILpssUARTv1	3.0.60
>!A!ILpssUARTCommon	3.0.60
>!AOnboardSerial	1.0
|IO80211!F	1200.12.2b1
>mDNSOffloadUserClient	1.0.1b8
>corecapture	1.0.4
@!AGraphicsDeviceControl	4.1.30
|IOAccelerator!F2	438.1.17
|IOSlowAdaptiveClocking!F	1.0.0
>!ASMBus!C	1.0.18d1
|IOGraphics!F	558
>X86PlatformPlugin	1.0.0
>IOPlatformPlugin!F	6.0.0d8
@plugin.IOgPTPPlugin	800.14
|IOEthernetAVB!C	1.1.0
|IOSkywalk!F	1
>usb.cdc.ncm	5.0.0
>usb.!UiBridge	1.0
>usb.cdc	5.0.0
>usb.networking	5.0.0
>usb.!UHostCompositeDevice	1.2
|IOSerial!F	11
|IOSurface	269.6
@filesystems.hfs.encodings.kext	1
>!AActuatorDriver	3400.32
>!AHIDKeyboard	209
>!AHS!BDriver	153
>IO!BHIDDriver	7.0.0d105
|IO!B!F	7.0.0d105
|IO!BPacketLogger	7.0.0d105
>!AMultitouchDriver	3400.32
>!AInputDeviceSupport	3400.25
>!AHSSPIHIDDriver	58
>!AThunderboltDPInAdapter	6.1.9
>!AThunderboltDPAdapter!F	6.1.9
>!AThunderboltPCIDownAdapter	2.5.2
>!AHSSPISupport	58
>!A!ILpssSpi!C	3.0.60
|IONVMe!F	2.1.0
>!AThunderboltNHI	5.5.8
>!AHPM	3.4.4
|IOThunderbolt!F	7.4.5
>!A!ILpssI2C!C	3.0.60
>!A!ILpssDmac	3.0.60
>!A!ILpssI2C	3.0.60
>!A!ILpssGspi	3.0.60
>usb.!UXHCIPCI	1.2
>usb.!UXHCI	1.2
>usb.!UHostPacketFilter	1.0
|IOUSB!F	900.4.2
>!AEFINVRAM	2.1
>!AEFIRuntime	2.1
|IOSMBus!F	1.1
|IOHID!F	2.0.0
$quarantine	4
$sandbox	300.0
@kext.!AMatch	1.0.0d1
>DiskImages	493.0.0
>!AFDEKeyStore	28.30
>!AEffaceable!S	1.0
>!AKeyStore	2
>!UTDM	489.0.2
|IOSCSIBlockCommandsDevice	422.0.1
>!ACredentialManager	1.0
>KernelRelayHost	1
>!ASEPManager	1.0.1
>IOSlaveProcessor	1
|IOTimeSync!F	800.14
|IONetworking!F	3.4
|IOUSBMass!SDriver	157.0.1
|IOSCSIArchitectureModel!F	422.0.1
|IO!S!F	2.1
|IOUSBHost!F	1.2
>!UHostMergeProperties	1.2
>usb.!UCommon	1.0
>!ABusPower!C	1.0
|CoreAnalytics!F	1
>!AMobileFileIntegrity	1.0.5
@kext.CoreTrust	1
|IOReport!F	47
>!AACPIPlatform	6.1
>!ASMC	3.1.9
>watchdog	1
|IOPCI!F	2.9
|IOACPI!F	1.4
@kec.pthread	1
@kec.Libm	1
@kec.corecrypto	1.0

commented

zfs and spl 1.8.1 are really quite old by now, can you try upgrading to 1.9.2 and let us know if it happens there as well?

Also, are you familiar with boot-args? keepsyms=1 would be helpful
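
For anyone unfamiliar with boot-args, setting that is a one-liner followed by a reboot; the same command shows up again later in this thread (on some setups it has to be run from Recovery mode):

sudo nvram boot-args="-v keepsyms=1"   # keep kernel symbols in panic reports; -v = verbose boot
nvram boot-args                        # verify, then reboot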


Sure. It was by accident: I booted from this laptop, used it for a while, and it crashed (which I am very happy about, because I finally got a crash report). I am going to upgrade it and use it until it panics again lol.

Heads up: Apple's being problematic and telling some of us to update to Catalina in response to certain bugs in Mojave (even on unsupported Macs (MacPro5,1 & 4,1), for some bizarre reason). I'm kinda baffled and have attempted to have conversations with Apple dev staff via Bug Report/Feedback Assistant, but not much luck. Essentially I've reported "blah is happening in 10.14.6" and their reply is "Please try beta X of 10.15 and let us know if the problem is resolved." I'm pretty disturbed and upset by this behavior by Apple, but I've heard of others hitting the same issue now too as I search the web.

I'm working on moving to Catalina here myself at the moment via the "unsupported methods" to see if my problems are indeed resolved as Apple has instructed, but it's a headache, and some issues such as the ZFS trouble have me worried.

This is still a problem with 1.9.2 and Catalina beta 7.

OK, so in Catalina it appears our zfs.fs is not being used; this means the devdisk mounts will fail, so you are better off having devdisk=off for now.

diskarbitrationd.log:

14:21:27   probed disk, id = /dev/disk3s1, with zfs, ongoing.
14:21:27   probed disk, id = /dev/disk3s1, with zfs, failure.
14:21:27   unable to probe /dev/disk3s1 (status code 0x0000002D).

When trussing we get

  124/0x2b8:  write_nocancel(0x3, "14:21:27   probed disk, id = /dev/disk3s1, with zfs, ongoing.\n\0", 0x3E)		 = 62 0
  124/0x2b8:  open_nocancel(".\0", 0x0, 0x1)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8740, 0x0)		 = 0 0
  124/0x2b8:  fcntl_nocancel(0x4, 0x32, 0x7FFEE2DF8950)		 = 0 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  stat64("/\0", 0x7FFEE2DF86B0, 0x0)		 = 0 0
  124/0x2b8:  stat64("/Library/Filesystems/zfs.fs\0", 0x7FFEE2DF8AC0, 0x0)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F18, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 104 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F38, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 256 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open("/Library/Filesystems/zfs.fs/Contents/Info.plist\0", 0x0, 0x1B6)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8380, 0x0)		 = 0 0
  124/0x2b8:  read(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>BuildMachineOSBuild</key>\n\t<string>18A391011</string>\n\t<key>CFBundleDevelopment", 0x10C5)		 = 4293 0
  124/0x2b8:  close(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF73F8, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/en.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 152 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/Base.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/English.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel(".\0", 0x0, 0x1)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8740, 0x0)		 = 0 0
  124/0x2b8:  fcntl_nocancel(0x4, 0x32, 0x7FFEE2DF8950)		 = 0 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  stat64("/\0", 0x7FFEE2DF86B0, 0x0)		 = 0 0
  124/0x2b8:  stat64("/Library/Filesystems/zfs.fs\0", 0x7FFEE2DF8AC0, 0x0)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F18, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 104 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F38, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 256 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open("/Library/Filesystems/zfs.fs/Contents/Info.plist\0", 0x0, 0x1B6)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8380, 0x0)		 = 0 0
  124/0x2b8:  read(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>BuildMachineOSBuild</key>\n\t<string>18A391011</string>\n\t<key>CFBundleDevelopment", 0x10C5)		 = 4293 0
  124/0x2b8:  close(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF73F8, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/en.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 152 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/Base.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/English.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  write_nocancel(0x3, "14:21:27   probed disk, id = /dev/disk3s1, with zfs, failure.\n\0", 0x3E)		 = 62 0
  124/0x2b8:  write_nocancel(0x3, "14:21:27 unable to probe /dev/disk3s1 (status code 0x0000002D).\n\0", 0x40)		 = 64 0

The sources for DAProbe.c:

   if ( status )
    {
        /*
         * We have found no probe match for this media object.
         */

        if ( context->filesystem )
        {
            CFStringRef kind;

            kind = DAFileSystemGetKind( context->filesystem );

            DALogDebug( "  probed disk, id = %@, with %@, failure.", context->disk, kind );

            if ( status != FSUR_UNRECOGNIZED )
            {
                DALogError( "unable to probe %@ (status code 0x%08X).", context->disk, status );
            }

Which seems to imply we aren't matching (although it picks zfs.fs OK, then rejects it?).

As 0x2D is 45, the error is ENOTSUP, which means we are probably running afoul of these tests:

https://github.com/appleopen/DiskArbitration/blob/master/diskarbitrationd/DAFileSystem.c#L645
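
A quick sanity check of that mapping from the shell (assuming the command line tools are installed so xcrun can locate the SDK headers):

printf '%d\n' 0x2D                                                   # 45
grep -w ENOTSUP "$(xcrun --show-sdk-path)/usr/include/sys/errno.h"   # #define ENOTSUP 45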

However, I have tried copying hfsutil's plist and the entire hfs.fs/ directory, to no avail.

OK, turns out we should have a /Library/Filesystems/zfs.fs/Contents/Resources/fsck_zfs. We do compile one in cmd/fsck_zfs which is more or less just /bin/true. With that in the bundle, everything appears to function as expected.
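
For reference, a no-op fsck along those lines is tiny. A sketch of the idea (not necessarily the exact contents of cmd/fsck_zfs):

#!/bin/sh
# ZFS verifies and repairs itself via checksums and scrubs, so a
# traditional fsck has nothing to do; exit 0 so that diskarbitrationd's
# probe of zfs.fs is satisfied.
exit 0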

commented

@lundman Does it hurt to put the fsck_zfs there for older versions as well? Could you push a commit to master to fix this?

Not at all, should be fixed for all versions yep

OpenZFSonOsX-Catalina-1.9.2.zip

I have done a test build using Xcode 11, and Catalina, which also has the zfs_util fixes for mounts. Please give feedback.

Halfway off-topic, but what concerns me (a lot) now is: how is a future for openzfsonosx (post-Catalina) possible, given the deprecation of kexts? How would a volume and filesystem layer even be thinkable in userland? Will it wait for Photoshop to finish rendering before committing the ZIL?
Is the tremendous and impeccable work done by the openzfs team and by @lundman destined to be trashed by this (sorry, can't find a better word) sort of "fascist" direction Apple is taking in regard to their OS and services..? Maybe more a topic for a forum than for a GitHub bug...
Best Regards,

Lorenzo

commented

I have just compiled the latest commit on Catalina DP8 using the Xcode 11 GM; all is working perfectly here. Thanks Jorgen for all your hard work.

Apple has made developing on macOS a little less friendly in recent times, that is true, and there probably will be a day in the future when we can no longer maintain support. But until that time!

Also, as far as anything Apple has said so far, there are specific categories of kernel extensions that Apple is transitioning to DriverKit (USB HID devices, serial devices, NICs), NetworkingDriverKit, and Endpoint Security extensions... and filesystems are not one of those categories. It seems unlikely to me that Apple will completely eliminate the ability to install kernel extensions on macOS.

I can just about guarantee that any panics that have the ZFS kext in them will be flagged, and they'll consider it more seriously. I wonder if there's a way to build an exception handling mechanism into ZFS that would catch a panic before it goes back to the kernel and send that data over here for processing?...

Also, if that's really a concern, maybe just don't send the panic reports to Apple if you're generating lots of them due to testing/adding new features/etc... I haven't had a ZFS panic in nearly forever running the stable releases with my couple of pools.

I installed Jorgen's test build, but unfortunately that did not solve the problem of frequent panics for me.

Panics now happen more often since installing 10.15.1 beta 3 (it had been pretty stable since 10.15.0 beta 5 or so), possibly related to Mail deciding to re-download all my hundreds of thousands of emails -- so I'm not sure whether the frequent reboots are related to more disk activity or to some additional changes in beta 3.

If you are having panics on Catalina, we'd need to have the stack pasted, with keepsyms=1 so we can take a look at it.

here's the stack I saved last time; I'll set keepsyms=1 for next time...

panic(cpu 2 caller 0xffffff801806acaa): Kernel trap at 0xffffff7f9c23027a, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000138, CR3: 0x000000002c527000, CR4: 0x00000000003626e0
RAX: 0x00000000000007a8, RBX: 0xffffff92c43cefb0, RCX: 0x0000000000000000, RDX: 0x0000000003000000
RSP: 0xffffff921891bde0, RBP: 0xffffff921891be10, RSI: 0xffffff922238d120, RDI: 0xffffff922238d190
R8:  0x0000000000000001, R9:  0x0000000000000002, R10: 0x0000000000000001, R11: 0x0000000000000000
R12: 0xffffff92c43ce7c8, R13: 0xffffff922238d190, R14: 0xffffff922238d118, R15: 0xffffff922238d000
RFL: 0x0000000000010202, RIP: 0xffffff7f9c23027a, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x0000000000000138, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff921891b840 : 0xffffff8017f41b6b 
0xffffff921891b890 : 0xffffff8018078e95 
0xffffff921891b8d0 : 0xffffff801806a8fe 
0xffffff921891b920 : 0xffffff8017ee8bb0 
0xffffff921891b940 : 0xffffff8017f41257 
0xffffff921891ba40 : 0xffffff8017f4163b 
0xffffff921891ba90 : 0xffffff80186d2879 
0xffffff921891bb00 : 0xffffff801806acaa 
0xffffff921891bc80 : 0xffffff801806a9a8 
0xffffff921891bcd0 : 0xffffff8017ee8bb0 
0xffffff921891bcf0 : 0xffffff7f9c23027a 
0xffffff921891be10 : 0xffffff7f9c22c1dc 
0xffffff921891be80 : 0xffffff7f9c23141b 
0xffffff921891bec0 : 0xffffff7f9c22c8e6 
0xffffff921891bef0 : 0xffffff7f9c230948 
0xffffff921891bf20 : 0xffffff7f9c235d56 
0xffffff921891bfa0 : 0xffffff8017ee813e 
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[FD34B77F-63E0-3672-9A30-63213502A433]@0xffffff7f9c228000->0xffffff7f9d41dfff

BSD process name corresponding to current thread: kernel_task
Boot args: chunklist-security-epoch=0 -chunklist-no-rev2-dev

Mac OS version:
19A558d

... and here is the most recent crash (on a different machine), with keepsyms=1:

panic(cpu 2 caller 0xffffff8007e6acaa): Kernel trap at 0xffffff7f8a105380, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000200721000138, CR3: 0x000000000c2b5000, CR4: 0x00000000003626e0
RAX: 0xffffff81f84f3cd8, RBX: 0xffffff81f84f3fb0, RCX: 0x0000200721000000, RDX: 0x0000000003000000
RSP: 0xffffff81f692bdd0, RBP: 0xffffff81f692be00, RSI: 0xffffff81f6951120, RDI: 0xffffff81f6951190
R8:  0x0000000000000051, R9:  0x00000000000001ed, R10: 0x0000000000000001, R11: 0x0000000000000000
R12: 0xffffff81f84f3cd8, R13: 0xffffff81f6951190, R14: 0xffffff81f6951118, R15: 0xffffff81f6951000
RFL: 0x0000000000010286, RIP: 0xffffff7f8a105380, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x0000200721000138, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff81f692b830 : 0xffffff8007d41b6b mach_kernel : _handle_debugger_trap + 0x47b
0xffffff81f692b880 : 0xffffff8007e78e95 mach_kernel : _kdp_i386_trap + 0x155
0xffffff81f692b8c0 : 0xffffff8007e6a8fe mach_kernel : _kernel_trap + 0x4ee
0xffffff81f692b910 : 0xffffff8007ce8bb0 mach_kernel : _return_from_trap + 0xe0
0xffffff81f692b930 : 0xffffff8007d41257 mach_kernel : _DebuggerTrapWithState + 0x17
0xffffff81f692ba30 : 0xffffff8007d4163b mach_kernel : _panic_trap_to_debugger + 0x21b
0xffffff81f692ba80 : 0xffffff80084d2879 mach_kernel : _panic + 0x61
0xffffff81f692baf0 : 0xffffff8007e6acaa mach_kernel : _sync_iss_to_iks + 0x2aa
0xffffff81f692bc70 : 0xffffff8007e6a9a8 mach_kernel : _kernel_trap + 0x598
0xffffff81f692bcc0 : 0xffffff8007ce8bb0 mach_kernel : _return_from_trap + 0xe0
0xffffff81f692bce0 : 0xffffff7f8a105380 net.lundman.spl : _kmem_findslab + 0x44
0xffffff81f692be00 : 0xffffff7f8a10119b net.lundman.spl : _kmem_error + 0x3b
0xffffff81f692be70 : 0xffffff7f8a106521 net.lundman.spl : _kmem_magazine_destroy + 0xce
0xffffff81f692beb0 : 0xffffff7f8a1018b6 net.lundman.spl : _kmem_depot_ws_reap + 0x6c
0xffffff81f692bee0 : 0xffffff7f8a105a2e net.lundman.spl : _kmem_cache_reap + 0x66
0xffffff81f692bf10 : 0xffffff7f8a10af6b net.lundman.spl : _taskq_thread + 0x1b9
0xffffff81f692bfa0 : 0xffffff8007ce813e mach_kernel : _call_continuation + 0x2e
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[EAA28CC7-9F6A-3C7B-BB90-691EBDC3A258]@0xffffff7f8a0fd000->0xffffff7f8b2f1fff

BSD process name corresponding to current thread: kernel_task
Boot args: -v keepsyms=1

Mac OS version:
19A558d

Kernel version:
Darwin Kernel Version 19.0.0: Sat Aug 31 18:49:12 PDT 2019; root:xnu-6153.11.15~8/RELEASE_X86_64
Kernel UUID: 7878452F-EDBA-3FDA-8430-29920E2E2C99
Kernel slide:     0x0000000007a00000
Kernel text base: 0xffffff8007c00000
__HIB  text base: 0xffffff8007b00000
System model name: MacBookPro13,3 (Mac-A5C67F76ED83108C)
System shutdown begun: NO
Panic diags file available: YES (0x0)

System uptime in nanoseconds: 3051218033384
last loaded kext at 128256200596: com.getdropbox.dropbox.kext	1.10.3 (addr 0xffffff7f8b2f2000, size 49152)
last unloaded kext at 441864637437: >!AXsanScheme	3 (addr 0xffffff7f897fa000, size 40960)
loaded kexts:
com.getdropbox.dropbox.kext	1.10.3
org.pqrs.driver.Karabiner.VirtualHIDDevice.v061000	6.10.0
net.lundman.zfs	1.9.2
net.lundman.spl	1.9.2
@kext.AMDFramebuffer	3.0.0
@kext.AMDRadeonX4000	3.0.0
@kext.AMDRadeonServiceManager	3.0.0
>AudioAUUC	1.70
>!AGraphicsDevicePolicy	4.1.46
@fileutil	20.036.15
@filesystems.autofs	3.0
@AGDCPluginDisplayMetrics	4.1.46
>!AHV	1
|IOUserEthernet	1.0.1
|IO!BSerialManager	7.0.0f4
>!AUpstreamUserClient	3.6.8
>AGPM	111.1.18
>!APlatformEnabler	2.7.0d0
>X86PlatformShim	1.0.0
>pmtelemetry	1
>!A!ISKLGraphics	14.0.0
@Dont_Steal_Mac_OS_X	7.0.0
>AGDCBacklightControl	4.1.46
>!AHDA	283.13
@kext.AMD9500!C	3.0.0
>!AHIDALSService	1
>!AThunderboltIP	3.1.3
>eficheck	1
>!AMuxControl	4.1.46
>SMCMotionSensor	3.0.4d1
>!AGFXHDA	100.1.421
>!A!IPCHPMC	2.0.1
>!AEmbeddedOSSupportHost	1
>AirPort.BrcmNIC	1400.1.1
>!A!ISKLGraphicsFramebuffer	14.0.0
>!A!ISlowAdaptiveClocking	4.0.0
>!AMCCSControl	1.12
>!AVirtIO	1.0
@filesystems.hfs.kext	522.0.9
@!AFSCompression.!AFSCompressionTypeDataless	1.0.0d1
@BootCache	40
@!AFSCompression.!AFSCompressionTypeZlib	1.0.0
>!ATopCaseHIDEventDriver	153
@filesystems.apfs	1412.11.4
@private.KextAudit	1.0
>!ASmartBatteryManager	161.0.0
>!AACPIButtons	6.1
>!ARTC	2.0
>!ASMBIOS	2.1
>!AACPIEC	6.1
>!AAPIC	1.7
$!AImage4	1
@nke.applicationfirewall	302
$TMSafetyNet	8
@!ASystemPolicy	2.0.0
|EndpointSecurity	1
@kext.AMDRadeonX4100HWLibs	1.0
@kext.AMDRadeonX4000HWServices	3.0.0
@kext.triggers	1.0
|IOAVB!F	800.17
>!ASSE	1.0
>DspFuncLib	283.13
@kext.OSvKernDSPLib	529
@!AGPUWrangler	4.1.46
>!ABacklightExpert	1.1.0
>!AHDA!C	283.13
|IOHDA!F	283.13
>X86PlatformPlugin	1.0.0
>!AGraphicsControl	4.1.46
|IOAudio!F	300.2
@vecLib.kext	1.2.0
|IONDRVSupport	558.3
>IOPlatformPlugin!F	6.0.0d8
|IO!BHost!CUARTTransport	7.0.0f4
|IO!BHost!CTransport	7.0.0f4
>!A!ILpssUARTv1	3.0.60
>!A!ILpssUARTCommon	3.0.60
>!AOnboardSerial	1.0
|IO80211!F	1200.12.2b1
>mDNSOffloadUserClient	1.0.1b8
>corecapture	1.0.4
@kext.AMDSupport	3.0.0
@!AGraphicsDeviceControl	4.1.46
|IOAccelerator!F2	438.1.25
|IOSlowAdaptiveClocking!F	1.0.0
>!ASMBus!C	1.0.18d1
|IOGraphics!F	558.3
@plugin.IOgPTPPlugin	800.14
|IOEthernetAVB!C	1.1.0
|IOSkywalk!F	1
>usb.cdc.ncm	5.0.0
>usb.!UiBridge	1.0
>usb.cdc	5.0.0
>usb.networking	5.0.0
>usb.!UHostCompositeDevice	1.2
|IOSerial!F	11
|IOSurface	269.6
@filesystems.hfs.encodings.kext	1
>!AActuatorDriver	3400.34
>!AHIDKeyboard	209
>!AHS!BDriver	153
>IO!BHIDDriver	7.0.0f4
|IO!B!F	7.0.0f4
|IO!BPacketLogger	7.0.0f4
>!AMultitouchDriver	3400.34
>!AInputDeviceSupport	3400.27
>!AHSSPIHIDDriver	58
>!AThunderboltDPInAdapter	6.2.2
>!AThunderboltDPAdapter!F	6.2.2
>!AThunderboltPCIDownAdapter	2.5.2
>!AHSSPISupport	58
>!A!ILpssSpi!C	3.0.60
|IONVMe!F	2.1.0
>!AThunderboltNHI	5.5.8
>!AHPM	3.4.4
|IOThunderbolt!F	7.4.5
>!A!ILpssI2C!C	3.0.60
>!A!ILpssDmac	3.0.60
>!A!ILpssI2C	3.0.60
>!A!ILpssGspi	3.0.60
>usb.!UXHCIPCI	1.2
>usb.!UXHCI	1.2
>usb.!UHostPacketFilter	1.0
|IOUSB!F	900.4.2
>!AEFINVRAM	2.1
>!AEFIRuntime	2.1
|IOSMBus!F	1.1
|IOHID!F	2.0.0
$quarantine	4
$sandbox	300.0
@kext.!AMatch	1.0.0d1
>DiskImages	493.0.0
>!AFDEKeyStore	28.30
>!AEffaceable!S	1.0
>!AKeyStore	2
>!UTDM	489.0.2
|IOSCSIBlockCommandsDevice	422.0.2
>!ACredentialManager	1.0
>KernelRelayHost	1
>!ASEPManager	1.0.1
>IOSlaveProcessor	1
|IOTimeSync!F	800.14
|IONetworking!F	3.4
|IOUSBMass!SDriver	157.11.1
|IOSCSIArchitectureModel!F	422.0.2
|IO!S!F	2.1
|IOUSBHost!F	1.2
>!UHostMergeProperties	1.2
>usb.!UCommon	1.0
>!ABusPower!C	1.0
|CoreAnalytics!F	1
>!AMobileFileIntegrity	1.0.5
@kext.CoreTrust	1
|IOReport!F	47
>!AACPIPlatform	6.1
>!ASMC	3.1.9
>watchdog	1
|IOPCI!F	2.9
|IOACPI!F	1.4
@kec.pthread	1
@kec.Libm	1
@kec.corecrypto	1.0

Also, as far as anything Apple has said so far, there are specific categories of kernel extensions that Apple is transitioning to DriverKit (USB HID devices, serial devices, NICs), NetworkingDriverKit, and Endpoint Security extensions... and filesystems are not one of those categories. It seems unlikely to me that Apple will completely eliminate the ability to install kernel extensions on macOS.

I'd love to be as optimistic. But what if Apple® simply doesn't care about filesystems other than those they support directly? They're tying more and more functionality (see the /Users APFS "Volume(s)") directly to their own filesystem. Even more, they actually want us to interact with filesystems at a more abstract, "guided" level.
Having "uncontrolled" filesystems just doesn't seem to fit into that logic. Moreover, "we"'re just too few to make a difference. And if you read the articles about the new "Security" measures taken in Catalina lately (and stop ignoring the trends started way before Mojave, first and foremost all the stuff around SIP and how little influence even advanced users have over it), it cannot go unnoticed that the whole Open Source community on the Mac is heavily affected. It's a political direction that's even superseding Microsoft® (!) on this matter. I grew up with the Mac, and with OSX as one of my main tools I made my living until now, and like probably many of us, I heavily contributed to the spread of macOS among family, friends, colleagues, and partners.

The Mac has lately been the platform for software development, be it for Mac apps or for anything else (except maybe .NET). The day they close down on all this - with a loud scream of pain - I'll have to have a new "home" up and running...

Best to All. And Yes, until then, I'll be keeping my reality distortion field clean and colorful, and install, test, and most of all: enjoy each and every new release of openzfsonosx...! :-)

0xffffff81f692bce0 : 0xffffff7f8a105380 net.lundman.spl : _kmem_findslab + 0x44
0xffffff81f692be00 : 0xffffff7f8a10119b net.lundman.spl : _kmem_error + 0x3b
0xffffff81f692be70 : 0xffffff7f8a106521 net.lundman.spl : _kmem_magazine_destroy + 0xce
0xffffff81f692beb0 : 0xffffff7f8a1018b6 net.lundman.spl : _kmem_depot_ws_reap + 0x6c
0xffffff81f692bee0 : 0xffffff7f8a105a2e net.lundman.spl : _kmem_cache_reap + 0x66
0xffffff81f692bf10 : 0xffffff7f8a10af6b net.lundman.spl : _taskq_thread + 0x1b9

Well, that's .. something. So it triggered a reap, and discovered a corrupt memory segment (kmem_error) - at this point it would be very interesting to read the output from kmem_error - but that would require connecting with lldb to the panicked machine from another machine.
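
For those following along, two-machine kernel debugging looks roughly like this once a matching KDK exists (a sketch following Apple's usual KDK instructions; debug=0x144 is the commonly suggested value, and the KDK path is a placeholder for whichever build matches):

# On the target Mac (takes effect after reboot): drop into the debugger
# on panic and answer KDP requests over the network.
sudo nvram boot-args="-v keepsyms=1 debug=0x144"

# On the second Mac, with the matching Kernel Debug Kit installed:
lldb /Library/Developer/KDKs/<matching KDK>/System/Library/Kernels/kernel
(lldb) kdp-remote 192.168.1.10    # IP address of the panicked target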

@lopezio you sound like my clone. I don't mean to keep hijacking this thread (yeah, I think we need another place to talk about this), but I do want to simply say this is what I believe (and clearly see) and have also been talking about around the cooler with folks. I've also heard some Apple engineers who used to work there say the same things, and hear the same from others who still do work there. Mobile, and app security for the sake of their own cash flow, is their baby now - not us devs and high-end users.

@lundman unfortunately the panic has become rather frequent with 10.15.1 beta 3, pretty consistently happening under load (e.g. I keep my mail library in a ZVOL, and having Apple Mail catch up on incoming emails seems to consistently cause the panic...).

It's also happening both on my MacBook Pro and my Mac mini, and the issue goes away when I boot back into macOS 10.14, with the same 1.9.2 release.

I'm wondering whether more people are seeing this?

Not many people run 10.15 yet? Do you have lldb on any of your machines? (It comes with Xcode.)

I do have lldb installed but would need instructions...

I have a working beta8 VM finally, so running zfstester for a bit to see if I can trigger any issues locally first.

Ok I appear to be able to trigger something, but there is no KDK for 19A558d so there is nothing I can do in lldb. Quite frustrating.

The stack in Logs is:

sed $'s/\\\\n/\\\n/g' Kernel_2019-09-19-161928_Mac.panic|less

0xffffff887c56bb00 : 0xffffff8008ee8bb0 mach_kernel : _return_from_trap + 0xe0
0xffffff887c56bb20 : 0xffffff7f8c989f48 net.lundman.zfs : _nvlist_free + 0x48
0xffffff887c56bc50 : 0xffffff7f8c89a1c0 net.lundman.zfs : _fm_nvlist_destroy + 0x20
0xffffff887c56bc80 : 0xffffff7f8c92ab85 net.lundman.zfs : _zfs_zevent_post_cb + 0x15
0xffffff887c56bca0 : 0xffffff7f8c8999e1 net.lundman.zfs : _zfs_zevent_drain + 0x71
0xffffff887c56bcd0 : 0xffffff7f8c899c20 net.lundman.zfs : _zfs_zevent_post + 0x1d0
0xffffff887c56bd40 : 0xffffff7f8c92c5c2 net.lundman.zfs : _zfs_ereport_zvol_post + 0x122
0xffffff887c56bd90 : 0xffffff7f8c984c91 net.lundman.zfs : _zvolRegisterDevice + 0x261

Hmm, even with zvol events taken out, zfstester died in a similar path:

0xffffff887dcd3650 : 0xffffff800a4e8bb0 mach_kernel : _return_from_trap + 0xe0
0xffffff887dcd3670 : 0xffffff7f8df91928 net.lundman.zfs : _nvlist_free + 0x48
0xffffff887dcd37a0 : 0xffffff7f8de9d120 net.lundman.zfs : _fm_nvlist_destroy + 0x20
0xffffff887dcd37d0 : 0xffffff7f8df305f5 net.lundman.zfs : _zfs_zevent_post_cb + 0x15
0xffffff887dcd37f0 : 0xffffff7f8de9c921 net.lundman.zfs : _zfs_zevent_drain + 0x71
0xffffff887dcd3820 : 0xffffff7f8de9cb60 net.lundman.zfs : _zfs_zevent_post + 0x1d0
0xffffff887dcd3890 : 0xffffff7f8df30685 net.lundman.zfs : _zfs_ereport_post + 0x65
0xffffff887dcd38c0 : 0xffffff7f8ded1de4 net.lundman.zfs : _spa_write_cachefile + 0x344

So it does appear that something aggravating is going on in that area.

If I nerf out the event sending code, zfstester runs to completion. It will be hard to figure out what is wrong in there without the KDK though, so I'm waiting for it. If those on Catalina are desperate we can roll out a build without events, but without zed doing its thing, some things are not going to be smooth.

If I take out events, it just happens less frequently. I also tried taking out ZOL assembler, and async-unlinked-drain. But it is pretty unclear what is going on. Interesting to note that it is always the value "0x0000200721000000" written over the memory - either over the nvlist, or over dsl_prop()'s list_t.

Currently going back to earlier versions to see if it is a recent bad commit.

Currently testing branch catalina which appears to be more stable. I can do a .pkg on Monday if it is still ok by then.
It has pool history logging disabled.

OK, that wasn't it - but this appears to be the cause:

37ef7e5

openzfsonosx/spl@e1134e3

The new Catalina-1.9.2.pkg is here:
OpenZFS_on_OS_X_1.9.2_Catalina.zip

If people can test it, so I can announce it properly...


Installed and boots fine, have not tested importing or exporting pools though.

(Is there a macOS test suite for this like the one in zfsonlinux?)

We have the same testing environment, but it takes many steps to set up and run, so it's more for developers at the moment. I ran it multiple times after producing the binary - which is how I know it fixes the biggest problem.

After installing 1.9.2 I can import the pool (after changing tank's mountpoint to /opt/tank). But the imported zfs filesystem is empty, even though it is mounted.

When I reboot the computer I get

zpool status
internal error: Unknown error: -22
[1]    2760 abort      zpool status

That would suggest you have two versions of the binaries, possibly /usr/local/sbin/zpool and /usr/local/bin/zpool. That can happen if you installed both from source and from the Installer, as they (annoyingly) use different defaults.

So we cannot upgrade to Catalina yet? internal error: Unknown error: -22
looks scary.

It just means the version of the command-line tools (zpool, zfs) does not match the version of the kernel extension, so they have no way to communicate. Check that the right tool is run, using the right library, and talking to the right kext version.

# sysctl spl.kext_version zfs.kext_version
spl.kext_version: 1.9.2-2-ge1134e33
zfs.kext_version: 1.9.2-4-g4889d276e0

# zpool version
zfs-1.9.2-4-g4889d276e0
zfs-kmod-1.9.2-4-g4889d276e0
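
The first two checks (right tool, right library) can be done along these lines:

command -v zpool                                               # which zpool is first in PATH
ls -l /usr/local/bin/zpool /usr/local/sbin/zpool 2>/dev/null   # any duplicates?
otool -L "$(command -v zpool)"                                 # which libzfs it links against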

There were multiple binaries installed, but I never installed from source. I used the uninstall script to delete the old installation. That worked fine for the binaries, but I can't uninstall the kernel extension.

It's loaded:

$ /tmp kextstat |grep lundman.zfs
   76    0 0xffffff7f838f1000 0x2db000   0x2db000   net.lundman.zfs (1.7.4) E794397E-F135-327E-8E93-586078F13B72 <75 26 8 6 5 3 1>
$ /tmp sysctl zfs.kext_version
zfs.kext_version: 1.7.4-1
$ /tmp kextfind -b net.lundman.zfs
$ /tmp # nothing

I looked in /System/Library/Extensions and /Library/Extensions; nothing in there. I don't know where to find the old kernel extension. I also did a whole find / | grep -i zfs as root and couldn't find the extension.

Where else could it be?

My suggestion to others would be to first uninstall it completely in Mojave; I forgot to do that.

commented

Wow, 1.7.4 is ancient. What you can try: first make sure to export all the pools, then try running kextunload to unload it, then run the installer again to refresh the kernel caches, or run:

sudo kextcache -invalidate /

See lundman's comment below ;-)

Yes, Catalina has changed how things work. If you had it in Extensions, it is now in the prelinked kernel it uses at boot. (/System/Library/PrelinkedKernels/)

You can just run kextcache -invalidate / to rebuild the prelinked kernel again so it no longer contains zfs (assuming it isn't in /L/E/).

Another option is to just unload it by hand with kextunload -n net.lundman.zfs (and of course spl - don't worry about the little dependency kext, it can stay). Then install 1.9.2, which at the end of the installer runs kextcache to rebuild the prelinked kernel.
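
Putting that together, the by-hand route is roughly this (assuming all pools are exported first; -b is used here as in the transcripts below):

sudo zpool export -a                 # nothing may hold the kexts open
sudo kextunload -b net.lundman.zfs
sudo kextunload -b net.lundman.spl
sudo kextcache -invalidate /         # rebuild the prelinked kernel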

@lundman Thank you for the updates!

Just upgraded to Catalina on my hackintosh.

The older zfs still works, but the disk doesn't show up in the Finder.

After installing 1.9.2 (https://openzfsonosx.org/w/images/4/44/OpenZFS_on_OS_X_1.9.2_Catalina.pkg) everything works fine!
I also ran some benchmarks using the Blackmagic disk benchmark tool; no issues found.

yifan-hackintosh-pro:hackintosh yifan$ zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Archive-2019  36.2T  7.81T  28.4T        -         -     0%    21%  1.00x  ONLINE  -
yifan-hackintosh-pro:hackintosh yifan$ zpool status
  pool: Archive-2019
 state: ONLINE
  scan: resilvered 480K in 0 days 00:00:01 with 0 errors on Fri Jun  7 23:01:26 2019
config:

	NAME                                            STATE     READ WRITE CKSUM
	Archive-2019                                    ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    media-7FA5AAC5-DDB2-8A4A-B08C-99E26D0D2004  ONLINE       0     0     0
	    media-39DD8B25-2004-5842-889A-4AF6F01FE565  ONLINE       0     0     0
	    media-FADB8476-8AF8-1E4B-9151-35F8C43CA019  ONLINE       0     0     0
	    media-1C6C5DD0-1AED-4142-8CD7-FF72B45B487F  ONLINE       0     0     0

errors: No known data errors

Too bad kextcache -invalidate / didn't help.
After installing 1.9.2 the versions seem fine.
The zpool gets imported, but the dirs are empty :-(.

zfs list -o name,mountpoint,mounted,used
NAME             MOUNTPOINT            MOUNTED   USED
tank             /opt/tank                 yes  60.1G
tank/docker      /opt/tank/docker          yes  17.2G
# the snapshot dir is there with one snapshot (which also has empty content)
ls -lha /opt/tank/docker/.zfs
dr-xr-xr-x  2 root      wheel     2B Oct  8 10:34 snapshot
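
When a dataset reports mounted=yes but looks empty, it is worth confirming that a zfs filesystem, and not just an empty directory underneath, actually sits at that path (generic checks using the mountpoints above):

mount | grep tank                    # filesystem type at each mountpoint
df -h /opt/tank/docker               # does the size look like the pool?
zfs get -r mounted,mountpoint tank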

After 1.9.2 installation

$ zpool version
zfs-1.9.2-4-g4889d276e0
zfs-kmod-1.9.2-4-g4889d276e0

$ sysctl spl.kext_version zfs.kext_version
spl.kext_version: 1.9.2-2-ge1134e33
zfs.kext_version: 1.9.2-4-g4889d276e0

but after reboot

$ sysctl spl.kext_version zfs.kext_version
spl.kext_version: 1.7.4-1
zfs.kext_version: 1.7.4-1

$ zpool version
zfs-1.9.2-4-g4889d276e0
zfs-kmod-1.7.4-1

thanks for the help

When I try to restore the zfs filesystem from a Linux host I get a segmentation fault:

ssh root@home zfs send -v "backup/tank/docker@snap1" | sudo zfs recv tank/docker
[1]    3082 exit 255            ssh root@home zfs send -v  |
       3083 segmentation fault  sudo zfs recv tank/docker

but this could also be because my installation is in a bad state

commented
  1. maybe try exporting the pool
  2. then kextunload
  3. then load the new kexts
  4. then kextcache -system-prelinked-kernel

I can't unload the kext (pool exported)

$ sudo kextunload  -b net.lundman.spl
(kernel) Can't remove kext net.lundman.spl; services failed to terminate - 0xdc008018.
Failed to unload net.lundman.spl - (libkern/kext) kext is in use or retained (cannot unload).

also Failed to unload net.lundman.kernel.dependencies.33 - (libkern/kext) kext is in use or retained (cannot unload).

For zfs.kext it executes without error, but it's still there:

$ sudo kextunload  -b net.lundman.zfs
$ kextstat|grep zfs
  183    0 0xffffff7f860eb000 0x3e2000   0x3e2000   net.lundman.zfs (1.9.2) 270F1DC9-2853-37E4-9116-5ABECBEE5BC9 <177 26 8 6 5 3 1>

Did the upgrade work for others?

commented

Ah, I know why: it's probably zed. If you use e.g. LaunchControl (or get into launchctl weeds...) to stop that, or just use zfsadm (https://gist.github.com/ilovezfs/7713854#file-zfsadm), you might have a better chance, and it's easier.

kernel dependencies cannot be unloaded, so that's normal.

zfsadm -u
and report back ;-)

If you are really fast to unload zfs, then you can unload spl - but yes, they are automatically loaded again. As JMoVS points out, it's likely to be zed.

commented

zfsadm does it for you easily and correctly ;-)

I removed some LaunchDaemons; thanks @JMoVS for the hint.

$ launchctl list|grep zfs
1070    0   org.openzfsonosx.zed
1064    0   org.openzfsonosx.zconfigd

$ sudo launchctl remove org.openzfsonosx.zed
$ sudo launchctl remove org.openzfsonosx.zconfigd

$ sudo kextunload -b net.lundman.zfs && sudo kextunload -b net.lundman.spl # worked; needed to execute it twice

But my imported pools are still empty (though mounted), and after reboot the old version shows up again. I also tried kextcache -system-prelinked-kernel.

Perhaps time for a fresh new install...
thanks to all for your great support!

We need to find the right procedure, though; I had this at one point in beta 6. I used:

# kextcache -c /Volumes/Macintosh\ HD/System/Library/PrelinkedKernels/prelinkedkernel \
      -K /Volumes/Macintosh\ HD/System/Library/Kernels/kernel \
      -l -- /Volumes/Macintosh\ HD/System/Library/Extensions
commented

I think the installer will trigger a rebuild of the caches - so maybe try unloading, then installing the new version which then should rebuild the caches

I installed the new version yesterday on the latest Catalina beta (11?), no panics as of yet.

@JMoVS done that: unloaded all extensions and then installed 1.9.2; after reboot it's back to 1.7.4. Strange.

I tested it on another installation without a previous openzfs install; there I didn't have any version problems. But I couldn't see my zpool data, only empty dirs, and zfs send from a Linux server ended with a segmentation fault.

I didn't do a pool upgrade, so as not to break compatibility with my Linux host (Ubuntu 18.04).
Is this mandatory?

$ ssh root@home zfs send tank/docker@autosnap-2019-10-07_15:42:55TZp0200 | sudo zfs recv -F tank/docker
zsh: exit 255            ssh root@home zfs send  | 
zsh: segmentation fault  sudo zfs recv -F tank/docker

should I open a separate issue in GitHub?

It works now for me, on a fresh install. Also zfs send, recv over ssh

I confirm the zfs receive crash, still on the Catalina GM beta with OpenZFS_on_OS_X_1.9.2_Catalina.zip installed (still no panic though, which is great!).

example:

sudo zfs send  'z@syncoid_lmini.local_2019-10-08:19:12:11' | sudo zfs receive -s -F  'h/z'
warning: cannot send 'z@syncoid_lmini.local_2019-10-08:19:12:11': Broken pipe
[1]    22828 exit 1              sudo zfs send 'z@syncoid_lmini.local_2019-10-08:19:12:11' |
       22829 segmentation fault  sudo zfs receive -s -F 'h/z'

Ah? there is a zfs recv problem?

Yes; I apologize for not noticing before. I was using it as part of a backup tool and didn't follow up when that stopped working from my upgraded Mac.

It's definitely crashing for me.

Confirmed, give me a sec here

# lldb /usr/local/bin/zfs
(lldb) settings set target.input-path ./dump
(lldb) run recv tank/lower2
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff6b0d7316 libdyld.dylib`stack_not_16_byte_aligned_error
libdyld.dylib`stack_not_16_byte_aligned_error:
->  0x7fff6b0d7316 <+0>: movdqa %xmm0, (%rsp)

Wot?

OK, it appears to be a clang regression in the Catalina beta, and I compiled it with beta 6. I should probably compile it on the release version.

I have uploaded a repack of the Catalina version, compiled with the released Catalina and Xcode 11.1:
https://openzfsonosx.org/wiki/Downloads#1.9.2

I can successfully zfs recv - please give it a try. Technically, the only thing different is the compiler arguments, which only affect userland.

this is great, everything seems to work, thank you!

@lundman I tried your repack for Catalina... but still, this is the situation I'm in:
#749

I have installed 1.9.3.1 on a Mac Pro with 1.5 TB of RAM and six of the 16 TB Seagate Exos drives (2 in a Pegasus J2i and 4 in a Pegasus R4i), all in one zpool, set up in a very basic way:
sudo zpool create -f -o ashift=12 tank disk2 disk3 disk4 disk5 disk6 disk7
I'm a longtime zfs user, but until now I only used zfs on Linux servers. I'm running Catalina 10.15.3. I am able to copy approximately 100 GB, or sometimes as much as 200 GB, at once, but on larger transfers the copy will freeze, requiring a reboot of the Mac. I'm trying to move data from the Apple-installed 8 TB SSD (APFS format) to the zfs pool.

I've spent the last week (very frustrated) on this, after countless freezes and reboots of the Mac Pro. It sounds like this is reflective of the Catalina issues other people are having. Is there a fix, please? Suggestions? I tried to post in the forum, but my posting got a lot of spam messages as replies, and then it was deleted. (I don't know why.) I'm desperate here, friends! All suggestions are welcome. As I'm writing, I tried to move 533 GB from the APFS SSD to the zpool, and it died after 82.69 GB of data was transferred. P.S. My data consists of (roughly) 10 GB per file, plain ASCII text, nothing strange here. Thanks for listening!

I continue to read about what is broken with Catalina. I think I am having the kernel panic under high I/O that a few people mentioned above. I started a thread on MacRumors about my difficulties. Is anyone able to chime in and say: yes, this is probably the difficulty for you too? I'm routinely trying to move large amounts of data (terabytes) from the Apple-installed SSD to the zfs pool on my drives, and the transfers consistently die after around 80 GB to 150 GB has been transferred. The transfers seem to die on both reads and writes, so I think that the high level of I/O is the trigger. Here's the thread I started on MacRumors:
https://forums.macrumors.com/threads/openzfs-woes-on-a-maxed-out-mac-pro-with-catalina-10-15-3.2222712/
All advice is welcome!

commented

How familiar are you with setting boot args? If you have boot-args "keepsyms=1", it will keep the symbols, allowing us to look into the kernel panic stack trace.

@JMoVS thanks for chiming in!
I am willing to try. I think that I should just do the following (logged into the Administrative account), correct?
sudo nvram boot-args="-v keepsyms=1"
and then reboot, yes?
If so, I will do it. Please let me know. I'm open to your guidance/suggestions here.

a very basic way: sudo zpool create -f -o ashift=12 tank disk2 disk3 disk4 disk5 disk6 disk7

long time zfs user

Please try making that a raidz2 or set of mirrors and try the same data copying. Let us know if it works, and if it doesn't work, let us know the results you get from "zpool status -v" after the copy fails or even if it ends normally.

(A pool with zero redundancy will quite rightly obey the setting of the failmode pool property if something goes wrong with the syncing of a txg, even if it's just a single device I/O error -- e.g. from a bus or controller timeout -- when there is no pool redundancy with which to autorepair/recover. We need pool redundancy here to catch possibilities like driver changes in Catalina meant to improve performance triggering a timing error in your hardware).

And finally, yes, keepsyms=1 is very helpful in the event there is a panic which leaves behind a panic report in /Library/Logs/DiagnosticReports/Kernel*
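
For reference, the failmode property mentioned above can be inspected and set per pool; it only controls behavior on catastrophic errors (wait is the default) and is no substitute for redundancy:

zpool get failmode tank
sudo zpool set failmode=continue tank   # accepted values: wait | continue | panic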

@rottegift I tried many, many times to copy data to the OpenZFS array in a raidz2 configuration. I used this configuration at the start:
sudo zpool create -f -o ashift=12 -O compression=lz4 -O casesensitivity=insensitive -O atime=off -O normalization=formD tank raidz2 disk2 disk3 disk4 disk5 disk6 disk7
I spent a full week this week, and I would try to copy (say) 200 GB or 300 GB at a time, from the SSD to the OpenZFS pool, and the copy would fail almost every time. It became depressing. I had to reset the machine many times (almost every single time) during this week. I probably rebooted the Mac Pro more than 2 dozen times in the last week, with this configuration.

Then my sysadmin suggested that I pare the configuration down to the basics, so that's why I mentioned this very simple configuration:
sudo zpool create -f -o ashift=12 tank disk2 disk3 disk4 disk5 disk6 disk7
but I'm still experiencing very regular failures when reading or writing data to/from the APFS (Apple-installed) SSD and the zpool.

I have a strong suspicion it is something to do with OpenZFS, because the drives themselves all work great. When I destroy the zpool and set all 6 drives up with APFS, I can move data back and forth at much, much faster speed, and I've never had a failure when everything was APFS (i.e., things work perfectly when I destroy the zpool and reconfigure the drives to APFS).

How do you "set all 6 drives up with APFS" into a single APFS container with a single volume inside?

Can you try copying with a single-drive pool; choose any of your drives at random? Again, it would be very helpful to have the output of zpool status -v, and ideally a transcript of everything you did at the cli (or attach a screenshot or whatever), both for the ZFS test and for the working APFS test. Our hands will be pretty tied in diagnosing otherwise.

I am sorry to be vague. When I wrote "set all 6 drives up with APFS", I emphasize that they were not (of course) in a single APFS container with a single volume inside. That's not the case. Instead, I destroyed the zfs pool that I had used throughout the last week (I had set up the zfs pool in several ways, to see if anything would work for high I/O). After destroying the zpool, I formatted each drive to APFS and did some extensive testing of reading/writing, and the drives all appear to work excellently in every way. I also combined all six drives into a RAID0 drive and did some extensive testing there too, and the drives behave great, even with high I/O.

As you kindly suggested, OK, I will set keepsyms=1, and I will setup the zpool again, with raidz2, and I will post reports about what I find. More news soon! Thanks for listening and helping! I appreciate you.

OK, I booted into Recovery mode, I ran:
nvram boot-args="-v keepsyms=1"
(no need for sudo in Recovery mode because we are running as root)
and then rebooted. I rebuilt the zpool this way:
sudo zpool create -f -o ashift=12 -O compression=lz4 -O casesensitivity=insensitive -O atime=off -O normalization=formD tank raidz2 disk1 disk2 disk4 disk5 disk6 disk7
and I tried to copy 522 GB from the APFS SSD to the zpool. It died near the end. Here are the only two files found in this directory from the last hour:
/Library/Logs/DiagnosticReports/
We have this one, created at the start of the data transfer:
DesktopServicesHelper_2020-02-08-122244_Hermione.cpu_resource.diag
and this one, created when the data transfer died:
DesktopServicesHelper_2020-02-08-122925_Hermione.wakeups_resource.diag
I will append ".txt" to the end of each, and will attach them here.

DesktopServicesHelper_2020-02-08-122244_Hermione.cpu_resource.diag.txt

DesktopServicesHelper_2020-02-08-122925_Hermione.wakeups_resource.diag.txt

Six things:

1/ Please supply the output of "sysctl zfs spl" and "sw_vers".

2/ What does "zpool status -v" say during the hang?

3/ During a hang, can you run "sudo spindump" and put the resulting file (it'll be in /tmp by default) somewhere online?

4/ What's the nature of the data that you're copying? Is it highly compressible? Is it random? Is it video? Or is it something like system files (including e.g. the kexts themselves)? How are you copying the data? Finder drag-and-drop? Some app?
Something on the command line? Please be as specific as you can, it will help enormously.

5/ Does the hang happen only with specific data, or a specific copying method? (In particular, does it hang in the same way if you do "dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k" ?)

6/ Can you try doing a more direct comparison with your APFS observations by building a single-disk pool and copying to that? This will help exclude issues wherein the timing of concurrent disk I/Os affect the enclosures in your setup, among other differences arising in your (successful) APFS copies.

I did the same experiment again, and it similarly died near the end of the data transfer. Here are the two logs that appeared in the directory /Library/Logs/DiagnosticReports/ (again, I added ".txt" to the end, so that GitHub would allow me to easily upload them):

DesktopServicesHelper_2020-02-08-124437_Hermione.cpu_resource.diag.txt

DesktopServicesHelper_2020-02-08-125227_Hermione.wakeups_resource.diag.txt

Seventh thing: does the machine hang entirely, forcing you to hit the power switch, or does the CLI/GUI still function enough for you to reboot using the Apple menu or a shutdown command?

@rottegift, many thanks! I will work on some replies! Just a moment, please.

  1. Please supply the output of "sysctl zfs spl" and "sw_vers".

hermione:~ mdw$ sysctl zfs spl
zfs.kext_version: 1.9.3-0
spl.kext_version: 1.9.3-0
hermione:~ mdw$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.3
BuildVersion: 19D76

Please note that, even though it says "1.9.3-0" for the version, I have installed 1.9.3.1, i.e., the latest version from the OpenZFS on OS X website. Just to be absolutely sure of this, I reinstalled 1.9.3.1 just now, and I verified that this information stays the same. Just FYI!
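(In case it's useful, kextstat also shows what is actually loaded; I believe the O3X kexts use the net.lundman bundle prefix, so kextstat | grep lundman should list the loaded spl/zfs kexts and their versions.)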

  2. I am running the experiment a third time for you, so that I can answer the questions as I go. The data is transferring right now. The zpool status never changes, whether I run it before, during, or after the hang. It always looks like this:

hermione:~ mdw$ zpool status -v
pool: tank
state: ONLINE
scan: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
tank                                            ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    media-05565E83-BFC8-8C4F-8D39-7F46A7302A32  ONLINE       0     0     0
    media-F06E9BFB-C5AD-7849-9913-D7D9D0C35B33  ONLINE       0     0     0
    media-1DC52120-F0E6-D349-BFEB-6A5028F675B5  ONLINE       0     0     0
    media-A2133D63-658E-6F4A-913F-03D641DD9C6B  ONLINE       0     0     0
    media-0B6B49F0-4BC8-2F49-A897-AEA701094E59  ONLINE       0     0     0
    media-2359F813-BDDA-5849-ABCF-CBD85130AC9B  ONLINE       0     0     0

errors: No known data errors

  4. What's the nature of the data that you're copying? Is it highly compressible? Is it random? Is it video? Or is it something like system files (including e.g. the kexts themselves)? How are you copying the data? Finder drag-and-drop? Some app? Something on the command line? Please be as specific as you can, it will help enormously.

The data consists of 56 plain text files, all ASCII characters, nothing strange at all, generated by a C++ program that I've been using for 14 years. There are 4 additional files: a bash script, a backup of the bash script, the nohup.out file, and the binary built from the C++ source. I've been copying with Finder drag-and-drop, although I'm pretty sure things also die if I copy the files from a bash shell; I think I was moving them that way a week ago, and I can try it that way again if you like. The data is highly compressible; I think I was getting something like 8.0x compression in the zpool when it contained only this data.
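(When I retry from the shell, I'll use something like the following, with the source path here being only illustrative:

time cp -R /Volumes/ssd/run-output /Volumes/tank/

or the rsync equivalent, time rsync -a /Volumes/ssd/run-output/ /Volumes/tank/run-output/.)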

  3. Coming back to your question 3: I haven't (yet) posted the spindump, because the data transfer was successful that time. A successful transfer is rare, perhaps 10% of the time! I'm going to remove the data and transfer it again.
    This time, the process died after copying about 89 GB out of 522 GB. I ran spindump once while the process was running (see spindump1.txt), again when I thought the process was totally dead (see spindump2.txt), but then the process moved just a little more, so I ran spindump a third time (see spindump3.txt). I hope that is clear. Just for good measure, while typing this message, with the transfer completely dead, I ran spindump a fourth time (see spindump4.txt).

spindump1.txt
spindump2.txt
spindump3.txt
spindump4.txt

  7. Seventh thing: does the machine hang entirely, forcing you to hit the power switch, or does the CLI/GUI still function enough for you to reboot using the Apple menu or a shutdown command?

This is always the case: I can't use the Finder to reboot, because it always says:
"The Finder can’t quit because some operations are still in progress."
"You can cancel or stop the operations, and then try again."

So I generally reboot the computer from another Mac (with remote login using ssh and the command: sudo shutdown -r now).

Alternatively, if I reboot by holding down the power button on the Mac, it usually takes two attempts. On the first, I usually get about 90% of the way through the startup splash screen and no further, so I have to hold down the power button again; the second boot goes very cleanly. So I elected to reboot via ssh from another machine, since that requires only one reboot. I'm going to go reboot right now.
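(For reference, the remote reboot is a one-liner, with the user and host names obviously specific to my setup: ssh -t mdw@hermione.local 'sudo shutdown -r now'. The -t allocates a tty so that sudo can prompt for the password.)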

Then I will come back to your questions 5 and 6. Thanks for your patience.

  5. Does the hang happen only with specific data, or a specific copying method? (In particular, does it hang in the same way if you do "dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k"?)

I did this as an administrator, in a bash shell, just FYI:
sudo dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k

It wrote exactly 82407849984 bytes and then died. The bash shell that I was using is hung now. I can only see the size of the data transfer by using another bash shell.
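(To watch the size from the second shell, a loop roughly like this works, since stat -f %z prints a file's size in bytes:

while sleep 5; do stat -f %z /Volumes/tank/randombits; done )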

OK, maybe I spoke too soon when I said that the bash shell was hung. It has now printed:

78591+0 records in
78590+0 records out
82407587840 bytes transferred in 406.307946 secs (202820517 bytes/sec)

Indeed, it looks like something got hung right around the time that I killed the process, because at time 13:33, the resulting file size was 82407849984 bytes, and then at 13:36, the file size became 82408636416 bytes.

I'm going to reboot and try this transfer of random bits one more time. I'll be back. A reboot probably is not required, but I am doing it to reset the experiment entirely. The reboot is usually only needed because, when a regular file transfer dies like this, the operating system still holds references to each file being transferred, as you know, so it won't even allow me to reboot nicely.

OK, it looks like this time it wrote 82680348672 bytes and then essentially died at 13:47, but I was more patient. I did nothing at all for 5 minutes, and the file size bumped up to 83066486784 bytes, and then slightly more, to 84115718144 bytes, with both of those jumps occurring at 13:52. Instead of killing the process, I'm going to run home and switch cars with my wife, and see whether the process stops on its own. I'll be patient; I'll be back in (say) 20 minutes. Things look hung at the moment, but I'll let the process keep running.

Here's the present spindump, before I go!
spindump5.txt

Still stuck at 84115718144 bytes. I'll be back in (say) 20 minutes.

Ok, this is all interesting.

I'll have some more thoughts in a little while, but in the mean time can you share the output of

$ sysctl -h kstat | egrep -i  'dirty|dbuf'

Also, mds_stores in spindump4.txt is an enormous factor, and it would be helpful if you would run:

$ sudo mdutil -i off /Volumes/tank
$ sudo touch /Volumes/tank/.metadata_never_index

prior to doing any further tests of writing into "tank" in the next few hours.
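(Both are easy to undo afterwards, with sudo mdutil -i on /Volumes/tank and sudo rm /Volumes/tank/.metadata_never_index.)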

mds_stores is part of Spotlight and is being very aggressive at chasing the already-written data in the spindump4.txt case, but it falls behind because it runs at low priority, and thus starts causing actual I/Os to the disk because the data it wants has aged out of the cache. It is also almost certainly holding mmap() references on the files you're write()-ing to indirectly via DesktopServicesHelper, which is an unhelpful complication. Finally, DesktopServicesHelper appears to be writing small chunks to multiple files, and the compressibility of the data combined with the slowdown is causing a lot of additional slow memory allocations.
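(You can verify Spotlight has actually backed off with mdutil -s /Volumes/tank, which should report something like "Indexing disabled".)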

It's possible that after a significant wait your hang would resolve itself, as happened earlier when you thought the transfer had died and it then moved again. That's not a workaround, though, and there's no point waiting more than, say, 30 minutes after the apparent hang. The wait may simply allow the draining of a sort of priority inversion that low-IO-priority mds_stores causes by mmap()ing files that high-priority DesktopServicesHelper is writing to.

Unfortunately this crossed with your report of the dd test results, and I was dealing with other things, so I haven't yet had a chance to absorb whatever happened during dd.

If you also hang during dd, especially if you hang with mdutil -i off and .metadata_never_index in place, please take a spindump during the apparent hang.
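If the GUI is too wedged to get a local terminal, the spindump can be taken over ssh from your other Mac, e.g.:

$ ssh -t mdw@hermione.local 'sudo spindump'

and then copy the newest /tmp/spindump*.txt off the machine with scp before rebooting.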