rust-osdev / bootloader

An experimental pure-Rust x86 bootloader

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Debug why `Rsdp::search_for_on_bios` is optimized to `ud2`

phil-opp opened this issue · comments

See #420

We worked around the issue for now by preventing inlining of the map_physical_region function, but we should figure out the root cause of this.

This only occurs when using rsdp 2.0.0 and seems to be fixed by rsdp 2.0.1. I've tracked the fix down to this commit: rust-osdev/acpi@a0db984, but I'm not entirely sure why this fixes the bug.

At first glance, I thought this might be because the added iterator complexity slows down the inlining a bit: search_for_on_bios doesn't get fully inlined and its implementation is split into two parts, one setting up the addresses and one doing the work; perhaps the lack of inlining was preventing LLVM from seeing UB. To work around that, I removed the .step(16) call, which resulted in all functions being fully inlined. This didn't break the code though, it wasn't optimized down to ud2.

AFAICT rust-osdev/acpi@a0db984 doesn't fix anything that LLVM should care about. It prevents some out-of-bounds accesses, but for memory that's not managed by LLVM, so LLVM shouldn't be able to optimize out the accesses. Even if LLVM felt that optimizing out the later out-of-bounds accesses was legal, the rsdp could still be in the first address slot and LLVM should never allow that to be optimized out.

In conclusion, even though rust-osdev/acpi@a0db984 fixes this bug, I'm not yet convinced that this didn't just happen by accident.

We should retry to reproduce this with the latest nightlies. There was another LLVM update in rust-lang/rust#121395, which fixed some miscompilations.

I can still reproduce this on 1.78.0-nightly (tested with 090f030).

Shouldn't this raw pointer dereference in rdsp::search_for_on_bios be a volatile read, so the compiler doesn't optimise it out?

https://github.com/rust-osdev/acpi/blob/35f9fba0aed2b00cfd16d0560a991c705666f704/rsdp/src/lib.rs#L106

Can you elaborate on why the compiler would be allowed to optimize out the non-volatile read?

Can you elaborate on why the compiler would be allowed to optimize out the non-volatile read?

Yes, I'm not sure that constructing a slice using slice::from_raw_parts and then reading from a raw pointer constructed from the slice's address with a non-volatile read without side-effects guarantees that the optimiser will keep that read.

Optimising compilers, and particularly LLVM, try to reduce the number of unnecessary memory accesses to improve performance. They will often remove memory accesses that they consider to have no side-effects. Using read_volatile() guarantees that the optimiser will not remove the memory access, even if it considers that there is no side-effect to reading that memory.

I'd argue that side effects from the reads are irrelevant here. What matters are the values that are read. Unless LLVM can somehow statically determine the value at the addresses without doing the reads, I don't see why it would be allowed to optimize out the reads. AFAICT the memory accesses are necessary here because the values are not known until runtime.

True, but LLVM 18 appears to be (mis)optimising this function away, including the memory accesses. I was wondering if that would fix it.