thenickdude / KVM-Opencore

OpenCore disk image for running macOS VMs on Proxmox/QEMU

Home Page: https://www.nicksherlock.com/2021/10/installing-macos-12-monterey-on-proxmox-7/

Stuck at two places during boot - only after GPU passthrough

buzuddha opened this issue · comments

Hi,

First off, let me say thank you for the impressive and amazing repo to make virtualizing a hack so easy. I'm new to VMs and switching over from bare metal and have referred to your guides many times.

I can smell victory, but am stuck at the GPU passthrough. I can get it working for windows. Likewise, the OS X VM without the GPU passthrough works great (besides the terrible performance). The same VM with GPU PT breaks during boot at:

AppleFileSystemDriver: using boot-uuid (and I've read through this without seeing an obvious explanation for my case)

or

[PCI Configuration Begin]

I haven't touched the config.plist so I'm not sure what I'm missing here. Any help would be amazing!

Thanks for your time!

The last error message is usually irrelevant so it's hard to diagnose anything based on that.

What GPU and how is your passthrough configured?

Thanks for getting back to me so fast!

I've gotten this message with both my WX7100 and my RX580. In both cases I've added the .rom file in

/var/lib/libvirt/vbios/wx7100.rom

The xml is here: http://ix.io/3WLL

In general I've followed the rising prism single gpu passthrough guide.

1 Enable virtualization options in UEFI
2 Add GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt" to /etc/default/grub and run sudo grub-mkconfig -o /boot/grub/grub.cfg
3 Verify IOMMU groups
4 Configure /etc/libvirt/libvirtd.conf and /etc/libvirt/qemu.conf
5 Configure machine - here I use your proxmox guide to set up the VM
6 Find and dump GPU rom
echo 1 | sudo tee /sys/devices/pci0000:00/path/to/gpu/rom
cat /sys/devices/pci0000:00/path/to/gpu/rom > wx7100.rom
echo 0 | sudo tee /sys/devices/pci0000:00/path/to/gpu/rom
7 Clone the rising prism repo, run the scripts and verify they're in place
8 Attach the PCI addresses for the GPU (video/audio) and paste the path to the rom under source
9 I haven't used the cpu args they say to in the RP guide because I've opted to use your config for AMD.
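
For step 3 above, a minimal sketch of one way to list IOMMU groups from sysfs. The function name list_iommu_groups and its optional root argument are made up for illustration; the argument exists only so the function can be pointed at a test directory instead of the real /sys.

```shell
#!/bin/sh
# List every PCI device by IOMMU group.
# $1 is the sysfs root to scan; it defaults to the real /sys and is
# only a parameter so the function can be run against a test tree.
list_iommu_groups() {
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue          # glob did not match: no groups found
        group="${dev%/devices/*}"          # strip "/devices/<addr>"
        group="${group##*/}"               # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Usage on a real host:
#   list_iommu_groups
```

If this prints nothing, the IOMMU is probably not enabled. For passthrough, the GPU and its audio function should ideally not share a group with unrelated devices.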

I have also tried the Arch wiki guide on PCI passthrough. Specifically, I found the PCI device IDs and added them to /etc/modprobe.d/vfio.conf as follows (swapping in my own IDs):
options vfio-pci ids=10de:13c2,10de:0fbb
Then I added the modules and hooks to /etc/mkinitcpio.conf:
MODULES=(... vfio_pci vfio vfio_iommu_type1 vfio_virqfd ...)
and regenerated the initramfs:
mkinitcpio -p linux

This results in the device being ignored by the host at boot, but if I launch the VM from the console, I end up with the same hang in OC. So I've taken those modules out so I can work in GNOME, which is what I ultimately hope to be able to switch from anyhow.

Sorry for the info dump, but just hoping to be thorough!!!

Single GPU passthrough is really unlikely to work with your RX580 because it suffers from AMD's Reset Bug. If the host initialises it then it cannot be properly reset for the guest to use, which causes macOS boot to hang just as it inits the GPU, about 60% in the progress bar.

I'm not sure on the Reset Bug status of the WX7100, it might be supported by vendor-reset (the RX 580 isn't):

https://github.com/gnif/vendor-reset

The machine model you've got set is really ancient and I've never tested OpenCore-KVM in combination with that. Try pc-q35-6.0 instead. Don't go 6.1 or higher because it requires additional fixes for passthrough.

I tried 6.0 -> no luck. Still stuck at AppleFileSystemDriver: using boot-uuid

Naive question: Is it possible that the WX7100 could work given that Windows is able to initialise the GPU? It does suffer from not being released back to Linux after a shutdown from Windows.

I guess I can try giving the dual GPU set up (tomorrow) a shot, but this then hamstrings me to x8 instead of x16. :^(

x8 will be just fine, the difference between x8 and x16 is mostly just a thing that affects synthetic benchmark scores.

The problem with AMD GPUs is that the GPU doesn't provide a hardware reset method, so AMD had to (re-)implement a reset method within every one of their platform drivers instead. The Windows driver seems to do a better job of this than macOS.

Looks like WX7100 is Polaris 10 just like the RX 580, so it might not work with vendor-reset either (but it's worth a try)

can't seem to make install here. I don't think I understand exactly how to implement dkms. Is there a best resource for this?

All I was able to accomplish by installing linux-headers was to allow the make function to proceed but then ended up with a bunch of failures to boot and missing kernel files during mkinitcpio. I had to chroot into my system from the live iso, remove the linux-headers package and regenerated the initramfs in order to boot again. Won't have more time to work on this tonight.

Rather than using make install you want to use dkms to manage building and installation:

dkms install .

dkms is quite popular so I'm sure you could find a guide dedicated to whatever distribution you're currently using.

Well, no luck with the vendor-reset package. I got stuck at sleep status 0xfffffff. I'll try a dual GPU setup.

Well dual GPU attempts unsuccessful at the moment. My WX7100 and RX580 have the same device IDs. I've attempted to prevent the drivers from binding the GPU using the arch guide workaround.
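
When two cards share the same vendor:device IDs, the ids= option cannot tell them apart. A minimal sketch of binding by PCI address instead, using the kernel's driver_override sysfs attribute. The function name override_to_vfio and the root argument are hypothetical, and 0000:0d:00.0 is the WX7100 address reported later in this thread. The override only applies the next time the device is probed, so on a real host it would need to run as root before amdgpu claims the card (for example from an early boot hook), and the card's audio function would need the same treatment.

```shell
#!/bin/sh
# Ask the kernel to bind one specific PCI address to vfio-pci, leaving
# an identical card at another address alone.
# $1: PCI address, e.g. 0000:0d:00.0
# $2: sysfs root, defaults to /sys (parameter exists only for testing)
override_to_vfio() {
    addr="$1"
    root="${2:-/sys}"
    target="$root/bus/pci/devices/$addr/driver_override"
    if [ ! -e "$target" ]; then
        echo "no such device: $addr" >&2
        return 1
    fi
    echo vfio-pci > "$target"
}

# Usage on a real host (as root, before the GPU driver loads):
#   override_to_vfio 0000:0d:00.0
```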

To test, I've plugged one monitor into the WX7100 and the other into the RX580, but I still get graphical output from both. Starting the VM causes gdm to restart, essentially logging me out of my GNOME session. I can reverse this behavior by removing the MODULES=(... vfio_pci vfio vfio_iommu_type1 vfio_virqfd ...) from /etc/mkinitcpio.conf and regenerating the initramfs. Arg!

Also everything seems choppy now with 2 GPUs _o_/

EDIT: Interesting: The windows VM also doesn't start, but here I get an error: http://ix.io/3WSp

hmm actually I think I'm not loading the vendor_reset:

sudo dmesg | grep vendor_reset
[ 1.277569] vendor_reset: loading out-of-tree module taints kernel.
[ 1.277590] vendor_reset: module verification failed: signature and/or required key missing - tainting kernel
[ 1.302928] vendor_reset_hook: installed

Those look like warnings rather than fatal errors. If vendor-reset is working then you'll see log messages in dmesg from it at VM launch time explaining what methods it's selecting to reset your GPU.

Hmmm I don't see much in the dmesg about vendor reset during the booting of the VM.

dmesg -wH gives some output and then an infinitely repeated warning
vfio-pci 0000:0d:00.0: BAR 0: can't reserve [mem 0xf800000000-0xf9ffffffff 64bit pref]

not sure if I should give up and move to the proxmox method :*(

If you don't see messages like this then vendor-reset isn't working:

vfio-pci 0000:03:00.0: AMD_POLARIS10: version 1.0
vfio-pci 0000:03:00.0: AMD_POLARIS10: performing pre-reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: performing reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: GPU pci config reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: performing post-reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: reset result = 0
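
A small sketch for checking a kernel log for that success line. The function name vendor_reset_ok is made up, and the match strings are copied from the sample messages above, so other vendor-reset versions may word them differently.

```shell
#!/bin/sh
# Succeeds if the log text on stdin contains a vendor-reset success line
# for the given PCI address. Fixed-string matching is used so the dots in
# the address are not treated as regex wildcards.
vendor_reset_ok() {
    addr="$1"
    grep -F "vfio-pci ${addr}: AMD_" | grep -Fq "reset result = 0"
}

# Usage on a real host, after starting the VM:
#   sudo dmesg | vendor_reset_ok 0000:03:00.0 && echo "vendor-reset ran"
```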

vfio-pci 0000:0d:00.0: BAR 0: can't reserve [mem 0xf800000000-0xf9ffffffff 64bit pref]

This means a host driver is still using this device, so it cannot be detached for passthrough. To find out which one, run "cat /proc/iomem" and find the driver that is reserving the memory range starting at 0xf800000000.
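
A rough sketch of that lookup: it prints the name of every /proc/iomem entry whose range starts at a given address. The function name iomem_claims is invented for illustration, and the address must be written as it appears in the file (lowercase hex, no 0x prefix). As far as I know, /proc/iomem shows zeroed addresses to unprivileged users, so it should be read as root.

```shell
#!/bin/sh
# Print the name of every iomem entry whose range starts at the given
# hex address. Reads /proc/iomem-formatted text on stdin.
iomem_claims() {
    awk -v start="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop nesting indentation
            sep = index(line, " : ")
            if (sep == 0) next                # not a "range : name" line
            range = substr(line, 1, sep - 1)  # e.g. "f800000000-f9ffffffff"
            name  = substr(line, sep + 3)     # e.g. "efifb"
            split(range, ends, "-")
            if (ends[1] == start) print name
        }'
}

# Usage on a real host:
#   sudo cat /proc/iomem | iomem_claims f800000000
```

Any name in the output that is not a PCI bus or the device's own address is a driver or firmware framebuffer still holding the range.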

BTW Proxmox doesn't improve the situation for single-GPU passthrough of reset-bugged AMD cards. For dual card setups it does help, since you can set one card as primary in your host UEFI settings and avoid initialising the second one completely.

cat /proc/iomem looks like efifb?

These are the two video cards:

840000000-fdffffffff : PCI Bus 0000:00
  f800000000-fa001fffff : PCI Bus 0000:0d <----WX7100
    f800000000-f9ffffffff : 0000:0d:00.0
      f800000000-f8002fffff : efifb
    fa00000000-fa001fffff : 0000:0d:00.0
  fb00000000-fc001fffff : PCI Bus 0000:0e <----RX580
    fb00000000-fbffffffff : 0000:0e:00.0
    fc00000000-fc001fffff : 0000:0e:00.0

So - went into /etc/default/grub and added video=efifb:off, regenerated grub and initramfs. Now I'm back to AppleFileSystemDriver: using boot-uuid. If I profile the GPUs I see that the WX7100 is using an amd kernel module...should that be vfio-pci?

0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100] (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b0d
	Flags: bus master, fast devsel, latency 0, IRQ 255, IOMMU group 32
	Memory at f800000000 (64-bit, prefetchable) [size=8G]
	Memory at fa00000000 (64-bit, prefetchable) [size=2M]
	I/O ports at f000 [disabled] [size=256]
	Memory at fce00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at fce40000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu

0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 0525
	Flags: bus master, fast devsel, latency 0, IRQ 247, IOMMU group 33
	Memory at fb00000000 (64-bit, prefetchable) [size=4G]
	Memory at fc00000000 (64-bit, prefetchable) [size=2M]
	I/O ports at e000 [size=256]
	Memory at fcd00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at fcd40000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
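
In output like this, the "Kernel driver in use" line is the one that matters for passthrough; "Kernel modules" only lists modules that could drive the device. A minimal sketch that pulls the bound driver for one slot out of lspci -k style text (the function name driver_in_use is hypothetical):

```shell
#!/bin/sh
# Print the "Kernel driver in use" value for one slot, reading
# lspci -k style text on stdin. Prints nothing if no driver is bound.
driver_in_use() {
    awk -v slot="$1" '
        /^[0-9a-f]/ { current = $1 }    # unindented line starts a new device
        current == slot && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print
        }'
}

# Usage on a real host:
#   lspci -k | driver_in_use 0d:00.0
```

For the listing above, this would report vfio-pci for 0d:00.0 and amdgpu for 0e:00.0.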

So this did solve the windows VM handoff at shutdown issue. Now when I shut down win10 VM on the WX7100, gdm starts back up on the RX580 so that's promising. Still stuck with OC, but it seems to happen at multiple different spots during boot.

Should I just buy a more recent AMD card? I think 6600's are ~350 nowadays.

holy crap. I found it in a reddit post that you had responded to. I had to disable above 4G decoding. This is weird because in all other circumstances with hackintosh, I've had to enable it.

Ok well I am thrilled and embarrassed. Thanks for your help!

Glad to hear it worked out in the end! It's curious that it passed through to Windows guests with that enabled.

Just popping back in to say that I've got single GPU passthrough working now for both win and mac! It was the above 4G decoding that was the issue.

Linux will hand off the GPU to both systems.

Windows can hand the GPU back to linux after shutdown now quite nicely, but still can't get mac to do it.

Does anyone know a fix for this? It doesn't work for me even with Above 4G Decoding disabled.