rust-lang / rust

Empowering everyone to build reliable and efficient software.

Home Page:https://www.rust-lang.org

Regression on nightly since LLVM 8 upgrade: `thread` sanitizer doesn't compile anymore

PaulGrandperrin opened this issue · comments

Hi, the fuzzer I maintain is failing to build on the latest nightlies:

The interesting part of the error log seems to be:

note: /usr/bin/ld: __sancov_guards has both ordered [`__sancov_guards' in /home/travis/build/rust-fuzz/honggfuzz-rs/example/hfuzz_target/x86_64-unknown-linux-gnu/release/deps/example-934b6185f6a63e31.example.ajs5tmgw-cgu.0.rcgu.o] and unordered [`__sancov_guards' in /home/travis/build/rust-fuzz/honggfuzz-rs/example/hfuzz_target/x86_64-unknown-linux-gnu/release/deps/example-934b6185f6a63e31.example.ajs5tmgw-cgu.0.rcgu.o] sections
          /usr/bin/ld: final link failed: Bad value
          collect2: error: ld returned 1 exit status

You can find the full log here:
https://travis-ci.org/rust-fuzz/honggfuzz-rs/jobs/424079778

I bisected on my computer the exact rust version that fails and it seems to be related to the LLVM 8 upgrade.

This version works well:

# rustup default nightly-2018-09-01
# rustc -vV
rustc 1.30.0-nightly (aaa170beb 2018-08-31)
binary: rustc
commit-hash: aaa170bebe31d03e2eea14e8cb06dc2e8891216b
commit-date: 2018-08-31
host: x86_64-unknown-linux-gnu
release: 1.30.0-nightly
LLVM version: 7.0

This version doesn't:

# rustup default nightly-2018-09-02
# rustc -vV
rustc 1.30.0-nightly (28bcffead 2018-09-01)
binary: rustc
commit-hash: 28bcffead74d5e17c6cb1f7de432e37f93a6b50c
commit-date: 2018-09-01
host: x86_64-unknown-linux-gnu
release: 1.30.0-nightly
LLVM version: 8.0

I have the same issue.

I progressed a little bit on narrowing down the root cause of the issue.
It's only triggered when using the thread sanitizer.
How to reproduce:

cd /tmp
git clone https://github.com/rust-fuzz/honggfuzz-rs.git
cd honggfuzz-rs/example/
RUSTFLAGS="-Z sanitizer=thread" ./test.sh

If you use the address or leak sanitizer, or no sanitizer at all, there are no issues.

(The regression is nightly-to-nightly and recent, the label must've been an accident)

I am facing the same issue with address sanitizer on rustc 1.30.0-nightly (f2302daef 2018-09-12) when building with libfuzzer (cargo-fuzz)

I'd like to work on this.

Some initial findings:

That error is emitted by ld when (I think) a section links to both unordered and ordered sections. Ordered sections are defined by the presence of the SHF_LINK_ORDER ELF section header flag, which is described here
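To make the ordered/unordered distinction concrete, here is a minimal sketch of the kind of consistency check that seems to produce this error. It is not ld's actual implementation: the `InputSection` struct and `mixed_order_sections` function are hypothetical names used only for illustration. The one real-world value is `SHF_LINK_ORDER = 0x80`, taken from the ELF specification.

```rust
use std::collections::BTreeMap;

// ELF section header flag that marks a section as "ordered" (value from the ELF gABI).
const SHF_LINK_ORDER: u64 = 0x80;

/// Hypothetical, simplified view of one input section header.
#[derive(Debug, Clone)]
struct InputSection {
    name: String,
    sh_flags: u64,
}

/// Returns the names of sections that appear both with and without
/// SHF_LINK_ORDER among the inputs, the situation the error message describes.
fn mixed_order_sections(inputs: &[InputSection]) -> Vec<String> {
    // section name -> (saw an ordered input, saw an unordered input)
    let mut seen: BTreeMap<&str, (bool, bool)> = BTreeMap::new();
    for section in inputs {
        let entry = seen.entry(section.name.as_str()).or_insert((false, false));
        if section.sh_flags & SHF_LINK_ORDER != 0 {
            entry.0 = true;
        } else {
            entry.1 = true;
        }
    }
    seen.into_iter()
        .filter(|(_, (ordered, unordered))| *ordered && *unordered)
        .map(|(name, _)| name.to_string())
        .collect()
}

fn main() {
    // 0x3 is SHF_WRITE | SHF_ALLOC; 0x6 is SHF_ALLOC | SHF_EXECINSTR.
    let inputs = vec![
        InputSection { name: "__sancov_guards".into(), sh_flags: 0x3 | SHF_LINK_ORDER },
        InputSection { name: "__sancov_guards".into(), sh_flags: 0x3 },
        InputSection { name: ".text".into(), sh_flags: 0x6 },
    ];
    for name in mixed_order_sections(&inputs) {
        println!("{name} has both ordered and unordered sections");
    }
}
```

In this model a single `__sancov_guards` input that lost the flag is enough to trip the check, which matches the symptom in the log above.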

LLVM emits this flag in TargetLoweringObjectFileImpl.cpp here and here, in response to LLVMContext::MD_associated being set.

From what I can see, LLVMContext::MD_associated is unconditionally set by SanitizerCoverage when writing to the __sancov_guards section.

I'll need to investigate further to determine how this flag is getting left off.

I've determined that passing -C opt-level=0 causes the compilation to succeed, while passing -C opt-level=1 causes it to fail.

I suspect that this issue is caused by an interaction between LLVM's Dead Global Elimination pass (which doesn't run with opt-level=0) and the sanitizer. My guess is that LLVM ends up deleting an unused function referenced by MD_ASSOCIATED. This would leave FunctionGuardArray with a dangling reference to its function, causing getAssociatedSymbol to return null.

In this case, LLVM would no longer add the SHF_LINK_ORDER flag to the ELF section, resulting in a linker error due to the missing flag.

However, this is all still somewhat speculative. I'm going to try to come up with a minimal reproduction, which can hopefully be induced to fail/succeed by toggling the Dead Global Elimination pass.

TL;DR: As a temporary workaround, pass -C opt-level=0. This issue is caused by an LLVM bug, so it will need to be fixed upstream.

I've now determined that this is definitely an LLVM bug. I've created a minimal reproduction, which only uses Clang and other LLVM tools, here: https://github.com/Aaron1011/llvm_arg_elim

The issue occurs due to the behavior of LLVM's DeadArgumentEliminationPass (not Dead Global Elimination, as I had previously thought). When DeadArgumentEliminationPass removes arguments/return values from a function, it actually creates an entirely new function, and updates all references to the previous function. However, it fails to update any MD_associated metadata entries targeting the old function.

As I described in my previous comment, this results in LLVM leaving off the SHF_LINK_ORDER flag when generating the ELF section header. Since there are still other __sancov_guards sections with the flag present (from functions that DeadArgumentEliminationPass didn't modify), ld will error when it sees the mismatched flags.
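The mechanism described above can be sketched as a toy model. This is not LLVM code. `Module`, `Guard`, `eliminate_dead_args`, the `__sancov_gen_*` names, and the `.dae` suffix are all hypothetical, and function identity is modelled by name, whereas LLVM tracks it by value. The sketch only shows how a stale association ends up as a missing `SHF_LINK_ORDER` flag.

```rust
use std::collections::HashSet;

const SHF_LINK_ORDER: u64 = 0x80;

/// Toy guard global that names its associated function (standing in for MD_associated).
struct Guard {
    name: String,
    associated: String,
}

/// Toy stand-in for a module: live functions plus their guard globals.
struct Module {
    functions: HashSet<String>,
    guards: Vec<Guard>,
}

impl Module {
    /// Replaces `func` with a new function, as the pass is described to do.
    /// With `fix_metadata == false` the guards keep pointing at the old function.
    fn eliminate_dead_args(&mut self, func: &str, fix_metadata: bool) {
        if !self.functions.remove(func) {
            return;
        }
        // Hypothetical name; a distinct string models the new function's distinct identity.
        let replacement = format!("{func}.dae");
        if fix_metadata {
            for guard in self.guards.iter_mut().filter(|g| g.associated == func) {
                guard.associated = replacement.clone();
            }
        }
        self.functions.insert(replacement);
    }

    /// A guard's section gets SHF_LINK_ORDER only if its associated function resolves.
    fn guard_section_flags(&self) -> Vec<(String, u64)> {
        self.guards
            .iter()
            .map(|g| {
                let linked = self.functions.contains(&g.associated);
                (g.name.clone(), if linked { SHF_LINK_ORDER } else { 0 })
            })
            .collect()
    }
}

fn sample_module() -> Module {
    Module {
        functions: ["foo", "bar"].iter().map(|s| s.to_string()).collect(),
        guards: vec![
            Guard { name: "__sancov_gen_foo".into(), associated: "foo".into() },
            Guard { name: "__sancov_gen_bar".into(), associated: "bar".into() },
        ],
    }
}

fn main() {
    for fix_metadata in [false, true] {
        let mut module = sample_module();
        module.eliminate_dead_args("foo", fix_metadata);
        println!("fix_metadata = {fix_metadata}");
        for (name, flags) in module.guard_section_flags() {
            println!("  {name}: sh_flags = {flags:#x}");
        }
    }
}
```

Without the metadata update, the guard for `foo` loses its flag while the guard for `bar` keeps it. That is the mixed state ld rejects.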

I'll be filing a bug with LLVM once I'm given an account on their bugtracker. For now, you can work around this issue by passing -C opt-level=0 to rustc. This will disable running LLVM optimizations, including DeadArgumentEliminationPass. Unfortunately, there doesn't seem to be a way to disable that particular pass, other than by disabling all optimizations.

Using the gold linker with -Clink-arg=-fuse-ld=gold seems to avoid this problem entirely.
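For anyone scripting their fuzzing builds, the two workarounds mentioned so far can be captured in a small helper. This is only a sketch: `Workaround` and `rustflags` are hypothetical names, and the flag strings are copied from the comments in this thread rather than checked against any particular toolchain.

```rust
/// The two workarounds mentioned in this thread.
enum Workaround {
    /// Disable LLVM optimizations, so DeadArgumentEliminationPass never runs.
    NoOptimizations,
    /// Link with gold instead of the default BFD linker.
    GoldLinker,
}

/// Builds a RUSTFLAGS value for the given sanitizer and workaround.
fn rustflags(sanitizer: &str, workaround: &Workaround) -> String {
    let extra = match workaround {
        Workaround::NoOptimizations => "-C opt-level=0",
        Workaround::GoldLinker => "-Clink-arg=-fuse-ld=gold",
    };
    format!("-Z sanitizer={sanitizer} {extra}")
}

fn main() {
    for workaround in [Workaround::NoOptimizations, Workaround::GoldLinker] {
        println!("RUSTFLAGS=\"{}\"", rustflags("thread", &workaround));
    }
}
```

The resulting string would be exported as RUSTFLAGS before running the build, the same way the reproduction steps earlier in the thread pass -Z sanitizer=thread.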

When using the default (BFD) linker, the 'has both ordered and unordered' error appears to be triggered by two separate bugs:

  1. The LLVM DeadArgumentEliminationPass bug, which I'm still planning to upstream a fix for.
  2. The Dead Global Elimination interaction that I mentioned here (https://github.com/rust-lang/rust/issues/53945#issuecomment-425620542). I'm not sure if this is actually an LLVM bug - the existence of an MD_Associated global shouldn't prevent a function from being deleted, but there's no good way to delete the __sancov_gen global entirely. Since gold doesn't complain about SHF_LINK_ORDER being used inconsistently, I'm not sure if this is a real issue or not.

I've managed to come up with a full fix locally. I'll be submitting my changes to LLVM tomorrow, and will post the Phabricator link(s) here once I do so.

The cause of the issue:

  1. Several LLVM passes (ArgumentPromotion, DeadArgumentElimination, Inliner, GlobalDCE, GlobalOpt, Internalize, and possibly others) mishandle COMDATs and/or MD_Associated metadata - either through improper deletion, or failure to properly update.
  2. This mishandling can result in two kinds of malformed __sancov_guards sections:
    1. The associated function is stripped from the object, but the __sancov_gen_ symbol associated with it is still emitted in a __sancov_guards section. Since the associated function section does not exist in the object, the __sancov_guards section will have nothing to link to. This is due to LLVM failing to take COMDATs into account in several places when deleting dead code/objects.
    2. The associated function still exists in the object, but the __sancov_guards section is not linked to it. This is due to several LLVM passes accidentally removing the MD_Associated metadata from the __sancov_gen_ global object.

In both of these cases, the BFD linker will see proper, 'ordered' __sancov_guards sections (sh_link is set and the SHF_LINK_ORDER flag is set) in addition to an improper, unordered __sancov_guards section (which LLVM failed to link to its associated function).
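The two malformed cases can be told apart with a small classification sketch. Everything here is illustrative: `GuardSection`, `GuardState`, `classify`, and `bfd_would_reject` are hypothetical names, and the model assumes a guard section is characterised only by whether its associated function was emitted and whether its MD_Associated metadata survived.

```rust
#[derive(Debug, PartialEq)]
enum GuardState {
    /// Function emitted and metadata intact: a proper, ordered section.
    Ordered,
    /// Case 1: the associated function was stripped from the object.
    FunctionStripped,
    /// Case 2: the function exists, but the metadata link was dropped.
    LinkDropped,
}

/// Hypothetical description of one __sancov_guards input section.
struct GuardSection {
    /// Name of the function the guard was created for.
    owner: &'static str,
    /// Whether the MD_Associated metadata survived the optimization passes.
    metadata_intact: bool,
}

fn classify(guard: &GuardSection, emitted_functions: &[&str]) -> GuardState {
    let function_present = emitted_functions.contains(&guard.owner);
    match (function_present, guard.metadata_intact) {
        (false, _) => GuardState::FunctionStripped,
        (true, false) => GuardState::LinkDropped,
        (true, true) => GuardState::Ordered,
    }
}

/// In this model, BFD rejects a link that mixes ordered and unordered guard sections.
fn bfd_would_reject(states: &[GuardState]) -> bool {
    let any_ordered = states.iter().any(|s| *s == GuardState::Ordered);
    let any_unordered = states.iter().any(|s| *s != GuardState::Ordered);
    any_ordered && any_unordered
}

fn main() {
    let emitted = ["kept_fn", "rewritten_fn"];
    let guards = [
        GuardSection { owner: "kept_fn", metadata_intact: true },
        GuardSection { owner: "stripped_fn", metadata_intact: true },
        GuardSection { owner: "rewritten_fn", metadata_intact: false },
    ];
    let states: Vec<GuardState> = guards.iter().map(|g| classify(g, &emitted)).collect();
    println!("{states:?}");
    println!("BFD would reject: {}", bfd_would_reject(&states));
}
```

Either malformed state next to one proper section is enough for the mismatch, which is why fixing only one of the passes would not make the error go away.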

Nice! Might be worth pushing them to our llvm fork so we can pick them up more quickly?

I think it might be best to wait until they're (hopefully) all accepted by LLVM. Getting them into the rust LLVM fork is going to require cherry-picking some additional commits, and it's possible that the LLVM team might want some changes before my patches are merged.

Thanks for all your work on this @Aaron1011. Any news?

This might be related to the linker versions. I.e. something that fails to build on travis builds successfully on my local machine with a super recent toolchain.

That's some great detective work @Aaron1011! Did you ever hear anything more back about those patches?

I would suggest raising the urgency on this, because the ld.gold workaround doesn't work here anymore.

Relevant versions:

ld.gold --version
GNU gold (GNU Binutils for Ubuntu 2.34) 1.16

apt policy binutils
Installed: 2.34-5ubuntu1
(running the soon to be released Ubuntu 20.04)

rustc --version
rustc 1.42.0 (b8cedc004 2020-03-09)

Both honggfuzz-rs and bolero (all backends: libfuzzer, honggfuzz, afl) now get linker errors.

@rustbot label A-sanitizers