tock / libtock-rs

Rust userland library for Tock

Fix relocation

Woyten opened this issue · comments

In order to make global variables and dynamic dispatch work, we need to compile binaries that use R_ARM_SBREL32 relocations.

As far as I understand we need to perform two steps:

  • Migrate the code of tock/userland/libtock/crt0.c to Rust
  • Pass compiler flags to the LLVM/LLD toolchain equivalent to:
    -msingle-pic-base
    -mpic-register=r9
    -mno-pic-data-is-text-relative
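
A rough sketch of the kind of address fix-up a Rust port of crt0 would have to perform at startup. It assumes (this is not stated above) that link-time addresses with the top bit set refer to flash and all other addresses are offsets into the process's RAM, which would be consistent with the 0x80000000 values reported further down. The names `fix_up`, `relocate_table` and `FLASH_MARKER`, and the addresses used with them, are hypothetical.

```rust
/// Assumed convention (not confirmed by this thread): a link-time address
/// with the top bit set points into flash, anything else into RAM.
const FLASH_MARKER: u32 = 0x8000_0000;

/// Translate one link-time address into a run-time address.
fn fix_up(link_addr: u32, flash_start: u32, ram_start: u32) -> u32 {
    if link_addr & FLASH_MARKER != 0 {
        // Flash-relative: strip the marker bit, then add the real flash base.
        (link_addr ^ FLASH_MARKER).wrapping_add(flash_start)
    } else {
        // RAM-relative: offset from the start of the process's memory.
        link_addr.wrapping_add(ram_start)
    }
}

/// Copy a table of link-time addresses (e.g. a GOT image stored in flash)
/// into RAM, relocating each entry on the way.
fn relocate_table(link_time: &[u32], run_time: &mut [u32], flash_start: u32, ram_start: u32) {
    for (dst, &src) in run_time.iter_mut().zip(link_time) {
        *dst = fix_up(src, flash_start, ram_start);
    }
}
```

Under this assumption, a value such as 0x80000737 is an unrelocated flash offset rather than a usable pointer, and only becomes valid after the fix-up has run.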

@alevy I played around with the relocation problem the whole weekend but I am completely lost now.

My findings:

Relocations are not emitted by default. They can be emitted via -C link-args=--emit-relocs.

I am not sure whether -C relocation-model=ropi-rwpi represents the correct relocation model due to the following problems I encountered:

  • vtables point to addresses above 0x80000000. Accessing them, obviously, crashes the program. I tested the vtable value using the following code:
    let my_int = 5usize;
    let vtable_location = mem::transmute::<_, (usize, usize)>(&my_int as &MyTrait).1; // 0x00020b64 (ACCESSIBLE)
    let first_vtable_entry = ptr::read_volatile(vtable_location as *const usize);     // 0x80000737 (NOT ACCESSIBLE)
  • static muts fail at link time. The following example results in an unrecognized reloc error:
    static mut STATIC_MUT: usize = 0;
    debug::print_as_hex(STATIC_MUT);
    STATIC_MUT = 1;
    debug::print_as_hex(STATIC_MUT);
  • There are relocations of different types for the trait objects but none of them is of type R_ARM_SBREL32. I queried the relocations using:
    readelf --relocs -W cortex-m4.elf|rg MyTrait
    The printed relocation types are R_ARM_THM_MOVW_PREL_NC, R_ARM_THM_MOVT_PREL and R_ARM_ABS32.
  • The data segment is empty.
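
For anyone who wants to repeat the vtable probe, here is a self-contained sketch that also builds on a host toolchain. `MyTrait` and its `usize` impl are stand-ins for the ones used above, and `raw_parts` is a hypothetical helper. It relies on the (not guaranteed) layout of a trait-object reference as a data pointer followed by a vtable pointer, so treat it as a debugging aid only.

```rust
use std::mem;

// Stand-in for the trait used in the experiment above.
trait MyTrait {
    fn value(&self) -> usize;
}

impl MyTrait for usize {
    fn value(&self) -> usize {
        *self
    }
}

/// Split a trait-object reference into (data pointer, vtable pointer).
/// ASSUMPTION: the fat pointer is laid out as two machine words in this
/// order. Rust does not guarantee this layout.
fn raw_parts(obj: &dyn MyTrait) -> (usize, usize) {
    unsafe { mem::transmute::<&dyn MyTrait, (usize, usize)>(obj) }
}
```

On the target, the second word is the vtable location from the experiment; reading the first entries behind it (as done with ptr::read_volatile above) shows whether the function pointers were relocated.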

If, on the other hand, I build the code using -C relocation-model=pic, I observe the following:

  • vtables still don't work. In fact, they crash a little earlier:
    let my_int = 5usize;
    let vtable_location = mem::transmute::<_, (usize, usize)>(&my_int as &MyTrait).1; // 0x8002002c (NOT ACCESSIBLE)
  • static muts can be linked but they point to garbage:
    static mut STATIC_MUT: usize = 99;
    let dereferenced = &STATIC_MUT as *const _ as usize; // 0x8002002c (NOT ACCESSIBLE)
  • Relocations are of type R_ARM_ABS32 and R_ARM_REL32. This seems closer to what we want but it's still not R_ARM_SBREL32.
  • There is a data section but I cannot tell whether the content makes sense.

In any case, no matter which relocation model I choose:

  • The GOT is empty. Do we expect elements in it? I guess not as we compile a static binary from scratch.
  • The value of r9 has no effect. I would expect that some relocatable references depend on r9 according to llvm-mirror/lld@29241e3.
  • The reldata part of the _start header is located at 0x80000000 which, again, is not accessible.

Do you think I am on the right track?

@torfmaster and I were finally able to prove that trait objects can work in Tock OS. See #56 for more details.

Unfortunately, I cannot recommend applying the strategy mentioned in the PR. It has too many drawbacks (like no real position independence) and relies on hacks or details that might cease to be valid in a newer version of rustc.

In order to get libtock-rs binaries running properly we need to fix some external tools. The following strategy should enable the remaining Rust features:

  • Fix trait objects
    We think that rustc contains a bug in the ropi relocation model leading to corrupt vtable lookups. Our compiled binaries try to find vtable functions at absolute addresses. This, however, conflicts with the idea of position independent code execution. The most probable reason for the bug is that an offset based on the program counter has been forgotten.

  • Fix static muts
    static muts can be compiled but not linked. According to llvm-mirror/lld@29241e3, LLD supports R9 based relocation. In practice, it refuses to process the emitted relocation types. This could be a problem with the LLD version used by rustc.

  • Improve string literal ergonomics
    If we want to print a string literal to the console, we need to manually copy it from flash to RAM first (e.g. by using String::from). Otherwise, the allow operation of the kernel will crash because we are not allowed to allow memory in flash. My proposed possible solutions to the problem:

    1. Elegant: Add a new tock syscall (e.g. allow_ro) with read-only access to the flash. There are other reasons why an allow_ro syscall is a good idea like better borrow checker support.
    2. Difficult: Relocate the string literals from flash to RAM during startup. The copy step is easy. The difficult part is to adapt rustc so that string literals are no longer accessed in flash but in RAM. I think that's what libtock-c is doing. It also requires that the linker problem mentioned above is solved.
    3. Poor: Ignore the problem and enforce owned Strings (needs allocation, we want to opt-out for it) and/or write! (slow).
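
As an illustration of the manual copy step, here is a minimal sketch that moves a literal into a caller-provided buffer (for example, one on the stack) instead of allocating a String. `copy_to_ram` is a hypothetical helper, not part of libtock-rs, and the subsequent allow call is deliberately left out.

```rust
/// Copy `text` into `buf` and return the filled prefix, or `None` if the
/// buffer is too small. The returned slice lives in RAM whenever `buf` does,
/// so it could then be shared with the kernel.
fn copy_to_ram<'a>(text: &str, buf: &'a mut [u8]) -> Option<&'a mut [u8]> {
    let bytes = text.as_bytes();
    // `get_mut` returns None instead of panicking when `buf` is too short.
    let dst = buf.get_mut(..bytes.len())?;
    dst.copy_from_slice(bytes);
    Some(dst)
}
```

This avoids the allocator, but the caller has to size the buffer up front, so it is closer to a workaround than to the allow_ro approach from option 1.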

@alevy What do you think about those problems? We would be happy if someone else could help fixing the rustc and LLD problems. The rustc problem might be interesting for @japaric and the embedded working group as well.

There are at least two problems getting in the way of rustc support for ROPI-RWPI:

  1. LLVM's ROPI-RWPI implementation does not move .rodata values that are relocated into .data, which prevents the relocations from being implemented on microcontrollers (.rodata is truly RO on flash).
  2. For some reason, inter-crate references that should be using ROPI relocation use RWPI relocation. This issue is rustc specific; I was unable to reproduce it using clang.

In the meantime, static linking of Rust apps appears to be possible (avoiding relocation entirely). I'm putting together a PR to implement static linking. Fortunately, static linking doesn't require any code changes that are incompatible with ROPI-RWPI, although it requires linker script changes.

Static linking works as of #64; making ROPI-RWPI relocation work correctly is a larger problem that'll take longer to solve.

Will this be achievable on platforms other than ARM? We may wish to execute embassy on more architectures.

From the perspective of libtock-rs, I think the hope is for this to be achieved on both ARM and RISC-V eventually. Unfortunately we are blocked on upstream support in LLVM for PIC in both cases. Thus, currently it does not work on either architecture -- though libtock-c does support relocatable apps on ARM only, thanks to using gcc rather than LLVM.

I think the latest status of RISC-V ROPI/RWPI can be followed here: riscv-non-isa/riscv-elf-psabi-doc#128

And the latest status for the issues with ARM thumb targets can be followed here: rust-lang/rust#54431

I'm also rather interested in working relocations. Having read the Rust-lang thread, LLVM exchange, and the rust-embedded IRC log, I noted two statements that stand out.

From the LLVM emails, regarding initializers: "I don't think such transformation belongs into clang." And from the IRC discussion: "for apps you could roll your own in-kernel dynamic linker".

Was it considered to ignore the ROPI/RWPI approach, and instead rely on relocations and fix them up using a linker? That step could even take place in tockloader, while flashing (assuming that applications once flashed aren't going to be moved again).

If there are still problems with relocations not being emitted enough, the actual step of linking object files could be moved to flash-time, with the relevant offsets (or linker files) calculated based on where the app is going to land.

If any of those approaches is not totally crazy, I'm willing to try implementing it - loadable applications are a must for me.
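
To make the flash-time idea concrete, here is a minimal sketch of what a loader-side fix-up could look like. It handles only plain absolute 32-bit words (the R_ARM_ABS32 kind); relocations that are encoded inside instructions, such as R_ARM_THM_MOVW_PREL_NC, would need dedicated patching logic. `apply_abs32` and its parameters are hypothetical and do not correspond to any existing tockloader interface.

```rust
/// Rebase every little-endian 32-bit word listed in `reloc_offsets` from
/// `link_base` to `load_base`. On failure, returns the first offset that
/// does not fit inside the image.
fn apply_abs32(
    image: &mut [u8],
    reloc_offsets: &[usize],
    link_base: u32,
    load_base: u32,
) -> Result<(), usize> {
    // Distance between where the app was linked and where it will live.
    let delta = load_base.wrapping_sub(link_base);
    for &off in reloc_offsets {
        let end = off.checked_add(4).ok_or(off)?;
        let word = image.get_mut(off..end).ok_or(off)?;
        let old = u32::from_le_bytes([word[0], word[1], word[2], word[3]]);
        word.copy_from_slice(&old.wrapping_add(delta).to_le_bytes());
    }
    Ok(())
}
```

The list of offsets would have to come from the relocation entries kept in the ELF file (see --emit-relocs earlier in the thread), and the result is only valid for the one address the image was patched for.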

lowRISC is working on an ePIC implementation for RISC-V and hopes to upstream it to LLVM: https://github.com/lowRISC/epic-c-example / https://github.com/lowRISC/llvm-project/commits/epic. Our current hope is that this work will at least make loadable applications possible for RISC-V, though support for ARM may take longer as I do not believe lowRISC currently plans to port this work to other architectures.

> Was it considered to ignore the ROPI/RWPI approach, and instead rely on relocations and fix them up using a linker? That step could even take place in tockloader, while flashing (assuming that applications once flashed aren't going to be moved again).
>
> If there are still problems with relocations not being emitted enough, the actual step of linking object files could be moved to flash-time, with the relevant offsets (or linker files) calculated based on where the app is going to land.

I'm pretty sure that your idea is workable. It is not a solution that works for every user of libtock-rs, which is why lowRISC is working on ePIC (but as Hudson mentioned, they're primarily focused on RISC-V).

One other solution that the Tock project has looked at (which libtock-c uses) is to compile each process multiple times for different locations, and have tockloader choose which TBF file to deploy on a system based on the addresses it is compiled for.

Doesn't libtock-c actually use position-independent binaries? That's what I gathered from the discussion about libtock-rs.

libtock-c does use truly position-independent binaries for ARM targets, but gcc does not support position-independent binaries for RISC-V.

Thanks. I just realized that static linking also makes the RAM address fixed, which is rather suboptimal when applications are meant to be able to be loaded in any order. Perhaps some form of PIC with rwdata section relocations at runtime could solve that - if such relocations are supported by the compiler.

> Thanks. I just realized that static linking also makes the RAM address fixed, which is rather suboptimal when applications are meant to be able to be loaded in any order.

Yes -- when I said "compile each process multiple times for different locations", each "location" is a combination of a flash address range and a RAM address range.
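
A small sketch of the selection step described here, assuming each compiled variant is identified by the flash and RAM start addresses it was linked for. `Variant` and `pick_variant` are hypothetical names and do not reflect how tockloader is actually implemented.

```rust
/// One statically linked build of an app, tied to a fixed location.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Variant {
    flash_start: u32,
    ram_start: u32,
}

/// Pick the build whose fixed addresses match the slot that is free on the
/// board, if any such build exists.
fn pick_variant(variants: &[Variant], free_flash: u32, free_ram: u32) -> Option<Variant> {
    variants
        .iter()
        .copied()
        .find(|v| v.flash_start == free_flash && v.ram_start == free_ram)
}
```

If no variant matches, the app cannot be installed in that slot, which is the main cost of this approach compared to real position independence.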

> Perhaps some form of PIC with rwdata section relocations at runtime could solve that - if such relocations are supported by the compiler.

I do not think that is possible with any relocation mode that LLVM supports, unfortunately.

@dcz-self this (rather old) blog post explains a little bit of the complexity with PIC: https://www.tockos.org/blog/2016/dynamic-loading/

libtock-c works because GCC supports the particular variants of PIC we need, while LLVM doesn't (actually there was a reasonably complete patch from somebody at ARM, I believe, back in the day but it wasn't accepted).

> Perhaps some form of PIC with rwdata section relocations at runtime could solve that - if such relocations are supported by the compiler.

Proposals are very welcome! The main constraints are: (1) code lives in flash, not RAM, and we probably don't want to be rewriting flash on every process reboot (because of write degradation and performance) and (2) the binary size should be reasonably small: all the extra information retained for dynamic loading in, e.g., Linux ELFs results in executables that are typically way too big for the target platforms. But neither of these means there isn't some sweet spot design that is possible.

With Rust merged into GCC 13, will that eventually make it possible to resolve this as the implementation matures?

> With Rust merged into GCC 13, will that eventually make it possible to resolve this as the implementation matures?

rustc_codegen_gcc is on track to be usable well before GCC's Rust frontend, so I don't think that changes anything. Either way, GCC only supports the necessary relocation mode on ARM, not RISC-V, so it's not a complete solution.

If ePIC ends up being RISC-V only, we may end up implementing relocation using rustc_codegen_gcc on ARM and ePIC on RISC-V.