rust-lang / rust

Empowering everyone to build reliable and efficient software.

Home Page:https://www.rust-lang.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Port `compiler-rt` intrinsics to Rust

japaric opened this issue · comments

Rationale

Porting compiler-rt to Rust is one of the remaining obstacles towards the intersection of our "on the fly compilation of std" and "rust everywhere" dreams.

For our goal of "on the fly compilation of std" (or any other set of "standard" crates), we want to minimize the number of C dependencies required to build std as these complicate the compilation process: users need a C (cross) compiler and we (rust-lang/rust) have to support (cross) compiling each of these C dependencies, which are wrapped in crates, to every target (even custom ones!) that downstream users may use -- this last part leads to complicated build.rs scripts that need to handle conditional compilation logic and deal with obscure gcc flags . On Linux, there are three C dependencies that we have to deal with: backtrace, jemalloc and compiler-rt. backtrace and jemalloc are optional on Linux and not available on some other platforms but compiler-rt is mandatory on most (all?) the targets we support.

This issue is about porting compiler-rt to Rust. Once ported, we can say goodbye to its complicated build.rs and make std easier to (cross) compile! An extra advantage is that, with this change, the compiler-rt intrinsics will receive the same optimizations as the std crate. This is not done today because it would make the build.rs even more complicated: flags like -march and -mcpu would have to be conditionally passed to gcc according to $TARGET.

The process

The goal is simple: We have to port each and every compiler-rt intrinsic along with their tests to the rustc-builtins crate.

The process could go two ways:

  1. Using a "wholesale" approach: We can develop the new rustc-builtins crate out of tree: porting intrinsics and unit tests over time. Once all the intrinsics and unit tests are ported we can replace the in-tree rustc-builtins crate with the out-of-tree one and have the buildbots (and probably crater) serve as an integration test.
  2. Using an incremental approach: We can rely on the fact that gcc_s provides the same intrinsics as compiler-rt and simply remove rustc-builtins's build.rs. This effectively means that rustc-builtins will no longer provide any intrinsic and that std programs will instead use gcc_s' intrinsics. Then we can start porting intrinsics + unit tests and adding them to rustc-builtins one by one. Each time a intrinsic is added, Rust programs will start using that intrinsic instead of the gcc_s' one.

The advantage of (2) is that we get an std that's easy to cross compile early on (because rustc-builtins is essentially empty!). Its disadvantage is that no_std programs which don't link to gcc_s and that depend on compiler-rt intrinsics will abruptly stop to compile because of linker errors (undefined reference to $intrinsic).

Prioritization

Some Rust targets use more or different intrinsics than others. If we take the incremental approach mentioned in the previous section, it makes sense to prioritize porting the intrinsics required by tier-1 platforms. This gist contains a "hack" to yields the list of intrinsics required to link rustc for a certain target plus lists of intrinsics generated with this "hack" for a few (right now, two) targets. The lists contained therein can be used to decide which intrinsics to prioritize.

Drawbacks

On each LLVM upgrade, we would have to carefully check if any compiler-rt intrinsics have been added (we already have to do this today) and the upgrade would be blocked on porting those new intrinsics to Rust first (this is the extra work that this change would create).

Unresolved questions

  • Pick an approach to implement this change.

compilerrt_abort

Some intrinsics can fail (e.g. absvdi2). compiler-rt handles the failures by "aborting" -- it calls a compilerrt_abort() function. Should this function be ported to Rust? Or should the intrinsics, instead, panic! on failure?


cc @alexcrichton @brson

I'm probably missing some information about the details of compiler-rt, but wouldn't a third way to go be to implement some intrinsic in rust and then remove the corresponding c file? Wouldn't this give us (some of) the advantage of (2) without the disadvantage?

@TimNN We can do that too but it doesn't give us the advantage of (2) because we would still rely on a external C (cross) compiler to compile all the intrinsics that haven't yet been ported to Rust. The advantage of (2) is that we can immediately get rid of the dependency on a external C (cross) compiler.

Ah, I see, I should probably have read that section more carefully; thanks for clearing that up.

This effectively means (..) that std programs will instead use gcc_s' intrinsics.

Note that (at least on Linux) this already happens today, will still happen until #35021 lands and probably has been the case for a long time. Evidence below:

#![feature(start)]

use std::ptr;

#[start]
fn start(_: isize, _: *const *const u8) -> isize {
    let x = unsafe {
        // to prevent the compiler from optimizing away the whole floating point operation
        ptr::read_volatile(0x0 as *const f64)
    };
    let y = 1.;
    let z = x + y;

    z as isize
}
$ rustc --target arm-unknown-linux-gnueabi -C linker=arm-linux-gnueabi-gcc -Z print-link-args foo.rs
"arm-linux-gnueabi-gcc" (..) "-l" "dl" "-l" "rt" "-l" "pthread" "-l" "gcc_s" "-l" "pthread" "-l" "c" "-l" "m" "-l" "rt" "-l" "util" "-l" "compiler-rt"

Note: -l gcc_s appears before -l compiler-rt

$ arm-linux-gnueabi-objdump -Cd foo
(..)
00000834 <foo::start::h949dda06cb9c83c0>:
(..)
 86c:   ebffff6b        bl      620 <__aeabi_dadd@plt>
 870:   ebffff5e        bl      5f0 <__aeabi_d2iz@plt>
(..)

Note: @plt dynamic loading)

$ arm-linux-gnueabi-nm foo
(..)
         U __aeabi_d2iz@@GCC_3.5
         U __aeabi_dadd@@GCC_3.5
         U __aeabi_unwind_cpp_pr0@@GCC_3.5
(..)

Note: U (undefined) and @GCC_3.5 (symbol versioning)

The intrinsics for floating point operations come from libgcc_s.so rather than from libcompiler-rt.a.

Using a "wholesale" approach: We can develop the new rustc-builtins crate out of tree

I've started with this approach in this repository. I've only ported a few intrinsics, mainly the ones that are needed for ARM Cortex-M targets to get basic stuff like ptr::copy_nonoverlapping working, but I'm already using it in one of my projects.

I think we should also include some other functions that are required by Rust but not provided by compiler-rt:

  • fmod and fmodf
  • memcmp, memcpy, memmove and memset

These should be defined as weak symbols so that an optimized implementation from libc can be used if available.

@Amanieu Sounds like a good idea to me. But, is there a mechanism to mark a symbol as weak from within Rust? I've done it before for executables but using a linker script; that won't work for rlibs because building those doesn't involve the linker.

On Windows, the PE/COFF specification describes weak externals which can be used for the purpose of defining symbols such as memcpy and whatnot so it can work without the CRT, but if you do link the CRT it'll use those instead.

“Weak externals” are a mechanism for object files that allows flexibility at link time. A module can contain an unresolved external symbol (sym1), but it can also include an auxiliary record that indicates that if sym1 is not present at link time, another external symbol (sym2) is used to resolve references instead.

If a definition of sym1 is linked, then an external reference to the symbol is resolved normally. If a definition of sym1 is not linked, then all references to the weak external for sym1 refer to sym2 instead. The external symbol, sym2, must always be linked; typically, it is defined in the module that contains the weak reference to sym1.

Weak externals are represented by a symbol table record with EXTERNAL storage class, UNDEF section number, and a value of zero. The weak-external symbol record is followed by an auxiliary record with the following format:

@retep998 Ok, that sounds like there might exists a LLVM thing to mark a symbol as "weak". I guess my actual question was: is there any Rust attribute to mark a function as weak?

@japaric I'd personally prefer the incremental approach of just moving intrinsics from C to Rust over time. That way we can ensure that everyone keeps building locally, and we could even add cargo features perhaps to disable all C intrinsics so if all the Rust intrinsics suffice for your target you can get by with pure Rust anyway.

is there any Rust attribute to mark a function as weak?

Perhaps #[linkage = "extern_weak"]? For AVR work, I just used some custom assembly...

@shepmaster

#[linkage = "weak"] does the trick (on Linux at least):

#[linkage = "weak"]
#[no_mangle]
pub extern fn memfoo() {}
$ nm target/debug/libfoo.rlib
0000000000000000 W memfoo
0000000000000000 N __rustc_debug_gdb_scripts_section__

@alexcrichton Sounds good to me. One thing I like about having rustc-builtins in its own repo is that I've set up testing infrastructure (Travis + AppVeyor) to test the intrinsics for several targets (using QEMU), which I think is not trivial to integrate with the buildbots. Also, fast test times. Perhaps we can keep this advantage if we turn rustc-builtins into a submodule.

C:\msys64\home\Peter\test> dumpbin /symbols libweak.rlib
Microsoft (R) COFF/PE Dumper Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file libweak.rlib

File Type: LIBRARY

COFF SYMBOL TABLE
000 00000000 SECT1  notype       Static       | .text
    Section length    0, #relocs    0, #linenums    0, checksum        0
002 00000000 SECT2  notype       Static       | .data
    Section length    0, #relocs    0, #linenums    0, checksum        0
004 00000000 SECT3  notype       Static       | .bss
    Section length    0, #relocs    0, #linenums    0, checksum        0
006 00000000 SECT4  notype       Static       | .text
    Section length    1, #relocs    0, #linenums    0, checksum  26D930A, selection    1 (pick no duplicates)
008 00000000 SECT4  notype ()    External     | memfoo
009 00000000 SECT5  notype       Static       | .xdata
    Section length    8, #relocs    0, #linenums    0, checksum CCAA009E, selection    5 (pick associative Section 0x4)
00B 00000000 SECT6  notype       Static       | .pdata
    Section length    C, #relocs    3, #linenums    0, checksum CCAA009E, selection    5 (pick associative Section 0x4)

String Table Size = 0x0 bytes
libweak.rlib : fatal error LNK1106: invalid file or disk full: cannot seek to 0x31A1

Doesn't look like it works on Windows. If it worked there would be a WeakExternal in there.

@japaric I'd be down for that! I agree that moving CI off buildbot is always best :)

@japaric Also #[linkage] is broken anyway at the moment #33992

Now that #35021 has landed, let's re-start the discussion here!

As I mentioned before, I started working on this in the rustc-builtins repo. And we @Amanieu
and @mattico have already ported a good chunk of intrinsics and have done an excellent job
figuring out what we don't need to port. If @Amanieu and @mattico are OK with it, I'd like to
hand over this repo to the rust-lang org and start working towards integrating that repository into
rust-lang/rust.

Tentative next steps:

  • Rename the repo to compiler-builtins to match the name that was finally decided in #35021.
  • Add a Cargo feature, probably called "c" that, when enabled, will compile the intrinsics, that
    have not been yet ported to Rust, from the compiler-rt C source code. This feature will be off by
    default and the std crate will depend on having this feature enabled.
  • Hand over the repo to the rust-lang org
  • Enable official bors on the repo and set up reviewers
  • Send a PR integrating that repo into rust-lang/rust (easier said than done)

Unresolved questions:

  • How should we include compiler-rt source code into the rustc-builtins repo? Some options:
    • Add rust-lang/llvm as a submodule to that repo. Keep that submodule and the llvm submodule in
      rust-lang/rust in sync.
    • No submodules. Just check-in a current snapshot of compiler-rt source in that repo.
    • Don't include the source in that repository. Instead assume that rustc-builtins will always be
      compiled as part of rust-lang/rust. Then rustc-builtins build.rs will have hardcoded the path to
      compiler-rt source, which lies "outside" of the rustc-builtins directory.
  • The rustc-builtins crate exposes memcpy, memmove, memset and memcmp as weak symbols (these are
    just rlibc re-exports) but only for ELF files (i.e. not on osx or windows). That would be an
    addition compared to what we provide today. The rationale for the addition was that these symbols
    also qualify as "intrinsics" and it's nice to just depend on a single crate, rustc-builtins,
    instead of having to reach out for rlibc. Also, these symbols are only ever used if not linking to
    libc, which pretty much means no_std executables. Should we keep this addition?
  • What will happen to the tests in that repo? It currently has a bunch of quickcheck test of the
    form: assert_eq!(__muldi3(a, b), a * b) that, right now, compare each intrinsic to its native
    operation, if the architecture has one, or to the compiler-rt (C) implementation of the same
    intrinsic. The catch is that once we merge rustc-builtins into rust-lang/rust, for the latter
    case, we'll end up comparing the intrinsic to itself! Because e.g. a * b will lower to
    __muldi3(a, b) and that __muldi3 will also come from rustc-builtins. We'll have to think about how to
    avoid making those test cases trivial.

It might be beneficial to prefer linking to libgcc over compiler-rt since libgcc tends to have better optimized implementations of some builtins (in particular soft float stuff).

@Amanieu You mean adding something like this:

#[link(name = "gcc_s")]
extern {
    fn __muldi3(u64, u64) -> u64;
}

to rustc-builtins instead of compiling the C implementation from compiler-rt? Seems fine given that, pre #35021, we have been using gcc_s' intrinsics instead of compiler-rt's due to how we have been passing the -l flags to the linker. IMO, these extern blocks should land behind an opt-in Cargo feature so rustc-builtins remains C-free by default.

Thanks for pushing this forward @japaric!

Tentative next steps:

Sounds good to me! We'd probably move the repo to rust-lang-nursery, but otherwise seems plausible to me.

How should we include compiler-rt source code into the rustc-builtins repo? Some options:

I'd recommend a submodule.

Should we keep this addition?

I think it's fine to include, but I'd prefer they were turned off for libstd builds

What will happen to the tests in that repo?

To me this seems like a pretty critical piece to fix. One possible solution though could be:

  • Run all tests against libgcc's implementations (as @Amanieu mentioned)
  • Control symbol names via #[export_name], don't have weird symbol names in Rust
  • Disable all #[export_name] in a test build so everything has a different symbol in testing mode.

@alexcrichton

re: testing against libgcc

Do you (or anyone else) know how to re-export a symbol from a specific C library as a Rust
function?

The issue I see is that there three libraries/crates that contain symbols (intrinsics) with the same
name: compiler_builtins, libgcc.a and libgcc_s.so and all those get passed to the linker.

For example, libgcc.a and libgcc_s.so do not contain the __muldi3 symbol (on x86_64) but this
compiles/works:

// gcc_builtins/src/lib.rs
#![no_std]

#[link(name = "gcc")]
extern {
    pub fn __muldi3(a: u64, b: u64) -> u64;
}
// foo/src/main.rs
extern crate gcc_builtins;

fn main() {
    let a = 2;
    let b = 3;
    let c = unsafe {
        gcc_builtins::__muldi3(a, b)
    };

    println!("{:?}", (a, b, c));
}
$ cargo rustc -- -Z print-link-args
"cc" "(..)/libcompiler_builtins-411f48d3.rlib" "-l" "gcc" "-l" "dl" "-l" "pthread" "-l" "gcc_s" "-l" "pthread" "-l" "c" "-l" "m" "-l" "rt" "-l" "util" 

which implies that __muldi3 comes from the compiler_builtins crate, but the intention was to
use the __muldi3 from libgcc.a -- IOW, the expected result was that this should have failed to
link because there is no __muldi3 in libgcc.a.


Ultimately, I'd like to test the ported builtins like this:

assert_eq!(::__multi3(a, b), gcc_builtins::__multi3(a, b))

@japaric That's not possible, the linker ensures that there exists only one instance of each symbol name. The only way to do something like this would be to dynamically load libgcc.so and get a specific symbol from it.

When you link to a symbol, as far as I know there's no way to tell the linker which library to get the symbol from. It just gets it from the first library it finds it in, depending upon linker order. The #[link] simply serves to pass some more stuff to the linker, with absolutely zero association with the extern block it is attached to (other than telling the reader that those symbols probably come from that library).

Theoretically if we integrated LLD as a library we could teach it which library to get a specific symbol from, but that's far off in the future.

One possible (though not ideal) solution is to create versions of the
libraries that have prefixed symbols using objcopy --prefix-symbols=xxx.
Not sure how this would work with windows-msvc.

On Fri, Sep 16, 2016, 1:05 PM Amanieu notifications@github.com wrote:

@japaric https://github.com/japaric That's not possible, the linker
ensures that there exists only one instance of each symbol name. The only
way to do something like this would be to dynamically load libgcc.so and
get a specific symbol from it.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#35437 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA0EppfPSnFI_fwmtJLWQ0T4uMjdgYCDks5qqtpegaJpZM4JeYmK
.

@retep998 Yeah, that's my understanding as well. I was wondering if there was some trick to effectively rename a (C) symbol using a Rust crate.

@Amanieu Thanks, I'll look into a dynamic loading solution.

@mattico An interesting idea. Will give a shot.

On windows-msvc, LIB /DEF /EXPORT:externalName=internalName can be used to do the same, but needs to have every renamed symbol passed as an argument.

@mattico I didn't try the "prefix the symbols" approach but I've implemented the dynamic loading solution in rust-lang/compiler-builtins#67.

but needs to have every renamed symbol passed as an argument.

This sounds like quite a bit of work. The other issue I see with this approach is that gcc_s doesn't expose the same symbols on all architectures. For example, muldi3 is missing on x86_64 but available on arm. This means we'll require conditional compilation to handle these differences between targets -- effectively, we'd be keeping lists of intrinsics available on each target. The dynamic loading approach elegantly solves the problem of differences among targets by simply returning None if the intrinsic is not available.

@japaric

effectively, we'd be keeping lists of intrinsics available on each target.

We'd only need to do this on MSVC where we don't have objcopy, and we could automate this with a script using nm, but I agree that dynamic loading is a better solution in this case.

I'm assuming that this dynamic loading (with dlopen or LoadLibrary) you're talking about is only going to be used for tests, and not for normal Rust binaries.

We'd only need to do this on MSVC where we don't have objcopy

MSVC doesn't have a libgcc to begin with.

is only going to be used for tests

Yes, this is only for tests.

MSVC doesn't have a libgcc to begin with.

Does it have any other library/dll that may contain these intrinsics?

In any case, it's more important to test different architectures than to test different OSes on the same architecture. Unless, it's an OS-specific intrinsic.

Alternatively, we could test the Rust implementation against gcc_s, if available, or against a compiler-rt that we, ourselves, compile, if there's no gcc_s.

Does it have any other library/dll that may contain these intrinsics?

msvcrt.lib provides intrinsics and they are available for static linking only. There is no runtime DLL equivalent. However, they do use a different naming convention than the LLVM/libgcc/compiler-rt scheme, so linking to them should be easy. For example, instead of __muldi3 it provides _allmul for multiplying two i64 together.

@japaric yeah as @Amanieu mentioned I don't think it's possible to do that. The testing for this crate is likely to basically just be a very flavorful binary. In essence though we should avoid linking compiler-builtins and instead link libgcc instead. Then all the symbols in the compiler-builtins crate don't have their real symbol name, but rather a #[cfg]'d variant.

I believe everything is essentially done on this issue upstream at https://github.com/rust-lang-nursery/compiler-builtins now. The README also contains a checklist for in-progress translations, so I'm going to close this.