tpwrules / nixos-apple-silicon

Resources to install NixOS bare metal on Apple Silicon Macs

Cannot upgrade to new nixos channel

rowanG077 opened this issue · comments

I upgraded my Nix channel and switched to my new configuration. Now everything that requires graphics fails with "Illegal instruction (core dumped)". The crash originates in the Asahi Mesa driver:

> gdb glxgears
...
Reading symbols from glxgears...
(No debugging symbols found in glxgears)
(gdb) run
Starting program: /nix/store/aw4nbyd3z2k3rqygrpn4lxxqf6qy2jma-mesa-demos-9.0.0/bin/glxgears 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/jz866c0q32vx9pag7br7dxrqqxkd0m1x-glibc-2.38-27/lib/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000fffff66b08e0 in driCreateNewScreen2 () from /run/opengl-driver/lib/dri/asahi_dri.so
(gdb) bt
#0  0x0000fffff66b08e0 in driCreateNewScreen2 () from /run/opengl-driver/lib/dri/asahi_dri.so
#1  0x0000fffff7766a3c in dri3_create_screen () from /run/opengl-driver/lib/libGLX_mesa.so.0
#2  0x0000fffff7757060 in AllocAndFetchScreenConfigs () from /run/opengl-driver/lib/libGLX_mesa.so.0
#3  0x0000fffff7758118 in __glXInitialize () from /run/opengl-driver/lib/libGLX_mesa.so.0
#4  0x0000fffff7753760 in glXChooseVisual () from /run/opengl-driver/lib/libGLX_mesa.so.0
#5  0x0000000000402cf0 in make_window.constprop ()
#6  0x0000000000401df4 in main ()

The disassembly of that function is very weird. Is it a miscompile?

[nix-shell:/var/lib/systemd/coredump]$ gdb -batch -ex 'file /run/opengl-driver/lib/dri/apple_dri.so' -ex 'disassemble driCreateNewScreen2'
Dump of assembler code for function driCreateNewScreen2:
   0x00000000000708e0 <+0>:     udf     #0
   0x00000000000708e4 <+4>:     udf     #0
   0x00000000000708e8 <+8>:     udf     #0
   0x00000000000708ec <+12>:    udf     #0
   0x00000000000708f0 <+16>:    udf     #0
   ...
   0x0000000000070b3c <+604>:   udf     #0
   0x0000000000070b40 <+608>:   udf     #0
   0x0000000000070b44 <+612>:   udf     #0
   0x0000000000070b48 <+616>:   udf     #0
   0x0000000000070b4c <+620>:   udf     #0
   0x0000000000070b50 <+624>:   udf     #0
End of assembler dump.

I have no clue what causes this.

The encoding of udf #0 is 0x00000000. Did something go wrong with the file system? Can you verify the relevant store paths have the right hash? Try nix-store --verify --check-contents.
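To make the reasoning concrete: since udf #0 encodes as four zero bytes, a function that disassembles entirely to udf #0 is simply a zero-filled region of the file. Here is a minimal sketch of a check for that, assuming you already know the file offset and length of the region. The name count_nonzero_bytes is a hypothetical helper, not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# count_nonzero_bytes FILE OFFSET LENGTH
# Hypothetical helper: prints how many bytes in the range
# [OFFSET, OFFSET+LENGTH) of FILE are NOT 0x00.
# OFFSET and LENGTH are decimal byte counts.
count_nonzero_bytes() {
    file=$1
    offset=$2
    length=$3
    # dd extracts the byte range, tr deletes every NUL byte,
    # wc counts what is left, and the last tr strips padding
    # that some wc implementations add around the number.
    dd if="$file" bs=1 skip="$offset" count="$length" 2>/dev/null \
        | tr -d '\000' \
        | wc -c \
        | tr -d ' '
}
```

A result of 0 means the whole range consists of zero bytes. Note that the addresses gdb prints are virtual addresses; mapping them to file offsets (for example with readelf) is left out of this sketch.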

That doesn't seem to be the issue. After running for a while, it came back with:

> nix-store --verify --check-contents
reading the Nix store...
checking path existence...
checking link hashes...
checking store hashes.
> 

Yeah, that means the store is okay. It doesn't exclude the possibility that something went wrong during the build. What Nixpkgs hash are you on?

The nixos-unstable channel at commit 97b17f32362e. I'm not using flakes for my system config.

Perhaps we need to explicitly set withLibunwind to false?

NixOS/nixpkgs@73f6621

libunwind is already set to disabled in the override.

I just tested the most recent unstable channel: https://releases.nixos.org/nixos/unstable/nixos-24.05pre579329.e92b60158819

It has the same problem.

I'm also having mesa issues, specifically when in overlay mode:

#152

Given the bizarre disassembly above, I wonder if this is more evidence for my llvmPackages theory?

I'm using replace mode.

According to Janne Grunau in Matrix chat, Fedora Asahi now builds mesa+asahi with libunwind, whereas ALARM built mesa+asahi without it.

This is the build specification file for mesa in Fedora Asahi, which I admit I find confusing:

https://copr-dist-git.fedorainfracloud.org/cgit/@asahi/mesa/mesa.git/tree/mesa.spec?h=f39&id=197e90d54335ef8fe2c47aa9ead5e5d2a8a3b6d4

I'm bisecting using this shell:

let
  asahi = import ./apple-silicon-support/packages/overlay.nix;
  pkgs = import /home/rowan.goemans/Documents/engineering/nixpkgs {
    overlays = [ asahi ];
  };

in pkgs.mkShell {
  shellHook = ''
    ${pkgs.gdb}/bin/gdb -batch -ex 'file ${pkgs.mesa-asahi-edge.drivers}/lib/dri/apple_dri.so' -ex 'disassemble driCreateNewScreen2'
  '';
}

This will relatively quickly tell me when the function broke.
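If the bisect were to be automated, the disassembly printed by that shellHook could be classified with a small filter. This is only a sketch: all_udf is a hypothetical helper name, and it assumes gdb's usual "address <+offset>: mnemonic operands" line layout as seen in the dump earlier in this thread.

```shell
#!/bin/sh
# all_udf: reads gdb "disassemble" output on stdin.
# Exit status 0 if there is at least one instruction line and every
# instruction line is "udf #0"; exit status 1 otherwise.
all_udf() {
    awk '
        # Instruction lines look like: "   0x00000000000708e0 <+0>:  udf  #0"
        /^[[:space:]]*0x[0-9a-f]+ <\+[0-9]+>:/ {
            total++
            if ($3 == "udf" && $4 == "#0") zeros++
        }
        END { exit !(total > 0 && zeros == total) }
    '
}
```

A wrapper for git bisect run would need to invert this result, because git bisect run treats exit status 0 as "good" while a success from all_udf means the function is broken.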

I still think something machine-specific and probably ephemeral went wrong when building and the file ended up corrupt. I set my system to use the latest main commit of this repo (6e324ab) and the nixpkgs you mention (97b17f32362e). I cannot replicate this issue, nor the faulty disassembly, on my system.

I examined store path /nix/store/yj2pxkiwyf64xi50y6zy7zpmfa705wd6-mesa-24.0.0-drivers/ which came from derivation /nix/store/6qamlsrgfqzgl5k134bvvr6xic4lmzhw-mesa-24.0.0.drv. Unfortunately it seems the mesa build is not fully deterministic so it's a little hard to figure out exactly what went wrong.

You can fix this by remounting /nix/store read-write, deleting the store path and the main output (/nix/store/9qx6h9jhxczji2f94rdn4bjlzyw7mjb6-mesa-24.0.0/), and then running nix-build --repair on the derivation. It might be smart to back up both paths first so they can be examined later. Make sure you have the derivation beforehand. Once that's done, reboot and make sure it all works, then run nix-store --verify --check-contents for good measure.
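For reference, the sequence described above can be written out as a dry run that only prints the commands, so they can be reviewed before anything is deleted. This is a sketch under assumptions: print_repair_plan and the backup directory argument are hypothetical, and nothing here has been verified against the affected machine.

```shell
#!/bin/sh
# print_repair_plan BACKUP_DIR DRV OUTPUT...
# Hypothetical helper: prints (does NOT execute) the manual repair steps
# for a derivation DRV whose OUTPUT store paths are corrupt.
print_repair_plan() {
    backup_dir=$1
    drv=$2
    shift 2
    # The store is normally mounted read-only on NixOS.
    printf '%s\n' "mount -o remount,rw /nix/store"
    for out in "$@"; do
        # Keep a copy for later examination, then remove the bad output.
        printf '%s\n' "cp -a $out $backup_dir/"
        printf '%s\n' "rm -rf $out"
    done
    # Rebuild the now-missing outputs from the derivation.
    printf '%s\n' "nix-build --repair $drv"
    printf '%s\n' "# reboot and confirm graphics work, then:"
    printf '%s\n' "nix-store --verify --check-contents"
}
```

For the paths in this thread, the arguments would be a backup directory of your choice, then /nix/store/6qamlsrgfqzgl5k134bvvr6xic4lmzhw-mesa-24.0.0.drv, followed by the two output paths named above.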

Alternatively, and more safely, you could make some trivial change to the Mesa derivation in this repo (e.g. add a fooAttr = "bar"; attribute to the overrides) to force a rebuild.

I can confirm that nuking the Nix store paths and just rebuilding fixed it. I have been using Nix for ages and have never seen anything like this. Thanks for helping me debug this!

I had the same issue just now! Weston was crashing with illegal instructions.

I removed the Mesa folders from /nix/store/ (after making a backup, just in case) and then ran sudo nix-store --repair --verify --check-contents, which successfully rebuilt the Mesa driver from source. I could go back to a graphical system right after.

I noticed a lot of people are having GPU-related issues in this repo. I wonder how many could be affected by this issue?
It might be helpful to add this trick to the maintenance/repair instructions.

It is quite frankly bizarre and scary that this issue appeared again. Is it in the same file? Is a compressed version of those store paths small enough to attach to this issue? I can't promise to provide any input but it would be good to have.

I will have to examine the other issues more carefully but I don't think they are related.

It is not the same Mesa; it's the newer 24.0.1.

Attached below are the store paths:
broken-mesa.zip