tpwrules / nixos-apple-silicon

Resources to install NixOS bare metal on Apple Silicon Macs

Cannot upgrade to new nixos channel

rowanG077 opened this issue · comments

I upgraded my nix channel and switched to my new configuration. Now everything that requires graphics fails with "Illegal instruction (core dumped)". The source is in the Asahi Mesa driver:

> gdb glxgears
Reading symbols from glxgears...
(No debugging symbols found in glxgears)
(gdb) run
Starting program: /nix/store/aw4nbyd3z2k3rqygrpn4lxxqf6qy2jma-mesa-demos-9.0.0/bin/glxgears 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/jz866c0q32vx9pag7br7dxrqqxkd0m1x-glibc-2.38-27/lib/".

Program received signal SIGILL, Illegal instruction.
0x0000fffff66b08e0 in driCreateNewScreen2 () from /run/opengl-driver/lib/dri/
(gdb) bt
#0  0x0000fffff66b08e0 in driCreateNewScreen2 () from /run/opengl-driver/lib/dri/
#1  0x0000fffff7766a3c in dri3_create_screen () from /run/opengl-driver/lib/
#2  0x0000fffff7757060 in AllocAndFetchScreenConfigs () from /run/opengl-driver/lib/
#3  0x0000fffff7758118 in __glXInitialize () from /run/opengl-driver/lib/
#4  0x0000fffff7753760 in glXChooseVisual () from /run/opengl-driver/lib/
#5  0x0000000000402cf0 in make_window.constprop ()
#6  0x0000000000401df4 in main ()

The disassembly of that function is very weird. Is it a miscompile?

[nix-shell:/var/lib/systemd/coredump]$ gdb -batch -ex 'file /run/opengl-driver/lib/dri/' -ex 'disassemble driCreateNewScreen2'
Dump of assembler code for function driCreateNewScreen2:
   0x00000000000708e0 <+0>:     udf     #0
   0x00000000000708e4 <+4>:     udf     #0
   0x00000000000708e8 <+8>:     udf     #0
   0x00000000000708ec <+12>:    udf     #0
   0x00000000000708f0 <+16>:    udf     #0
   0x0000000000070b3c <+604>:   udf     #0
   0x0000000000070b40 <+608>:   udf     #0
   0x0000000000070b44 <+612>:   udf     #0
   0x0000000000070b48 <+616>:   udf     #0
   0x0000000000070b4c <+620>:   udf     #0
   0x0000000000070b50 <+624>:   udf     #0
End of assembler dump.

I have no clue what causes this.

The encoding of udf #0 is 0x00000000. Did something go wrong with the file system? Can you verify the relevant store paths have the right hash? Try nix-store --verify --check-contents.
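To make the point about the encoding concrete, here is a minimal sketch of how one might check whether a region of a driver file is zero-filled. The helper names `zero_words` and `longest_zero_run` are made up for this illustration, and the sample bytes are synthetic. Note that an address shown by gdb does not necessarily equal the file offset, so treat any offset you pass in as an assumption to verify.

```python
import re
import struct

UDF_0 = 0x00000000  # A64 encoding of `udf #0`, as stated in the comment above


def zero_words(data: bytes, offset: int, length: int) -> int:
    """Count 4-byte little-endian words equal to UDF_0 in data[offset:offset+length]."""
    chunk = data[offset:offset + length]
    usable = len(chunk) - len(chunk) % 4  # ignore a trailing partial word
    words = struct.unpack(f"<{usable // 4}I", chunk[:usable])
    return sum(1 for w in words if w == UDF_0)


def longest_zero_run(data: bytes) -> int:
    """Length in bytes of the longest run of consecutive zero bytes."""
    return max((len(m.group()) for m in re.finditer(rb"\x00+", data)), default=0)


# Synthetic example: two zeroed instruction slots followed by an A64 `nop`
# (0xd503201f, stored little-endian).
sample = b"\x00" * 8 + b"\x1f\x20\x03\xd5"
sample_zero_words = zero_words(sample, 0, len(sample))
sample_longest_run = longest_zero_run(sample)
```

On a real file you would read the bytes with `open(path, "rb").read()` first. Long zero runs also occur legitimately in ELF files (padding between segments, for example), so a hit is only a hint that something like the disassembly above may be going on, not proof of corruption.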

Doesn't seem to be an issue. After some thinking it came back with:

> nix-store --verify --check-contents
reading the Nix store...
checking path existence...
checking link hashes...
checking store hashes.

Yeah, that means the store is okay. Doesn’t exclude the possibility something went wrong building. What Nixpkgs hash are you on?
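Why a clean verify does not rule out a bad build can be shown with a toy model. This is only a sketch: the `register` and `check_contents` functions and the dictionary "database" are hypothetical stand-ins, not Nix's actual implementation. The assumption is simply that the hash is recorded from whatever the build produced, so contents that were already wrong at that moment will still match later.

```python
import hashlib


def register(db: dict, path: str, content: bytes) -> None:
    """Record the hash of a freshly built output (toy stand-in for store registration)."""
    db[path] = hashlib.sha256(content).hexdigest()


def check_contents(db: dict, path: str, content: bytes) -> bool:
    """Return True if the current content still matches the recorded hash."""
    return db[path] == hashlib.sha256(content).hexdigest()


good = b"\x1f\x20\x03\xd5" * 4   # synthetic "correct" output
corrupt = b"\x00" * 16           # synthetic zero-filled output

# Case 1: the output was already corrupt when it was registered.
db_bad_build = {}
register(db_bad_build, "mesa-drivers", corrupt)
built_corrupt_passes = check_contents(db_bad_build, "mesa-drivers", corrupt)

# Case 2: the output was registered correctly and corrupted afterwards.
db_good_build = {}
register(db_good_build, "mesa-drivers", good)
later_corruption_detected = not check_contents(db_good_build, "mesa-drivers", corrupt)
```

In this model only the second case is caught by a contents check, which matches the observation that the verify run came back clean while the driver was still broken.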

nixos-unstable channel at commit 97b17f32362e. I'm not using flakes for my system config.

Perhaps we need to explicitly set withLibunwind to false?


libunwind is already set to disabled in the override.

I just tested the most recent unstable channel:

With the same problem.

I'm also having mesa issues, specifically when in overlay mode:


Given the bizarre disassembly above, I wonder if this is more evidence for my llvmPackages theory?

I'm using replace mode.

According to Janne Grunau in Matrix chat, Fedora Asahi now builds mesa+asahi with libunwind, whereas ALARM built mesa+asahi without it.

This is the build specification file for mesa in Fedora Asahi, which I admit I find confusing:

I'm bisecting using this shell

let
  asahi = import ./apple-silicon-support/packages/overlay.nix;
  pkgs = import /home/rowan.goemans/Documents/engineering/nixpkgs {
    overlays = [ asahi ];
  };
in pkgs.mkShell {
  # Disassemble the suspect function from the freshly built driver on shell entry.
  shellHook = ''
    ${pkgs.gdb}/bin/gdb -batch -ex 'file ${pkgs.mesa-asahi-edge.drivers}/lib/dri/' -ex 'disassemble driCreateNewScreen2'
  '';
}

This will relatively quickly tell me when the function broke.

I still think something machine-specific and probably ephemeral went wrong when building and the file ended up corrupt. I set my system to use the latest main commit of this repo (6e324ab) and the nixpkgs you mention (97b17f32362e). I cannot replicate this issue, nor the faulty disassembly, on my system.

I examined store path /nix/store/yj2pxkiwyf64xi50y6zy7zpmfa705wd6-mesa-24.0.0-drivers/ which came from derivation /nix/store/6qamlsrgfqzgl5k134bvvr6xic4lmzhw-mesa-24.0.0.drv. Unfortunately it seems the mesa build is not fully deterministic so it's a little hard to figure out exactly what went wrong.

You can fix this by remounting /nix/store read-write and nuking the store path along with the main output /nix/store/9qx6h9jhxczji2f94rdn4bjlzyw7mjb6-mesa-24.0.0/ (it might also be smart to back them up for examination later), then running nix-build --repair on the derivation. Make sure you have the derivation beforehand. Once that's done, reboot and make sure it all works, then run nix-store --verify --check-contents for good measure.

Alternately, and more safely, you could make some trivial change to the Mesa derivation in this repo (e.g. add a fooAttr = "bar"; attribute to the overrides) to force a rebuild.

Can confirm that nuking the nix store paths and just rebuilding fixed it. I have been using nix for ages and never seen anything like this. Thanks for helping me debug this!

I had the same issue just now! weston was crashing with illegal instructions.

I removed the mesa folders from /nix/store/ (I made a backup first, just in case)
and then ran sudo nix-store --repair --verify --check-contents, which successfully rebuilt the mesa driver from source.
I could go back to a graphical system right after.

I noticed a lot of people are having issues related to GPU in the repo. I wonder how many could be affected by this issue?
Maybe it could be helpful for this trick to be in the maintenance/repair instructions.

It is quite frankly bizarre and scary that this issue appeared again. Is it in the same file? Is a compressed version of those store paths small enough to attach to this issue? I can't promise to provide any input but it would be good to have.

I will have to examine the other issues more carefully but I don't think they are related.

It is not the same mesa; it's the newer 24.0.1.

Attached below are the store paths: