root-project / cling

The cling C++ interpreter

LLVM 13 setDefaultOptLevel segfault

jeaye opened this issue · comments

Overview

With the new LLVM 13 branches, any usage of cling::Interpreter::setDefaultOptLevel with a value greater than 0 results in a segfault.

Sample code

Modifying the cling-demo to look like the following reproduces the issue for me.

#include <array>
#include <cling/Interpreter/Interpreter.h>

// LLVMDIR is the macro the cling-demo build defines for the LLVM install prefix.
int main(int const argc, char const **argv)
{
  std::array<char const*, 2> cling_args{ argv[0], "-std=c++17" };
  cling::Interpreter jit(cling_args.size(), cling_args.data(), LLVMDIR);
  jit.setDefaultOptLevel(1); // works when 0, crashes when > 0

  jit.process("#include <iostream>");
  jit.process("struct foo { int a{}; };");
  jit.process("foo f;"); // crashes here
  jit.process("std::cout << f.a << std::endl;");
}

Backtrace from crash

#0  0x0000555556610700 in clang::CodeGen::CodeGenTBAA::getBaseTypeInfoHelper(clang::Type const*) ()
#1  0x000055555660f655 in clang::CodeGen::CodeGenTBAA::getTypeInfo(clang::QualType) ()
#2  0x000055555660fae6 in clang::CodeGen::CodeGenTBAA::getAccessInfo(clang::QualType) ()
#3  0x00005555565a1af4 in clang::CodeGen::CodeGenModule::getTBAAAccessInfo(clang::QualType) ()
#4  0x000055555678046e in clang::CodeGen::CodeGenFunction::EmitDeclRefLValue(clang::DeclRefExpr const*) ()
#5  0x0000555556775f1d in clang::CodeGen::CodeGenFunction::EmitLValue(clang::Expr const*) ()
#6  0x0000555556775962 in clang::CodeGen::CodeGenFunction::EmitIgnoredExpr(clang::Expr const*) ()
#7  0x000055555653a0f0 in clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) ()
#8  0x00005555565456f0 in clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) ()
#9  0x00005555565939c2 in clang::CodeGen::CodeGenFunction::EmitFunctionBody(clang::Stmt const*) ()
#10 0x0000555556594436 in clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) ()
#11 0x00005555565b0258 in clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) ()
#12 0x00005555565a9e22 in clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) ()
#13 0x00005555565ad598 in clang::CodeGen::CodeGenModule::EmitGlobal(clang::GlobalDecl) ()
#14 0x00005555565b37d5 in clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*) ()
#15 0x00005555565039bf in clang::CodeGeneratorImpl::HandleTopLevelDecl(clang::DeclGroupRef) ()
#16 0x00005555564dc5cc in cling::DeclCollector::HandleTopLevelDecl(clang::DeclGroupRef) ()
#17 0x0000555556463b4d in cling::IncrementalParser::ParseInternal(llvm::StringRef) ()
#18 0x0000555556464fa4 in cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) ()
#19 0x0000555556393316 in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) ()
#20 0x000055555639288b in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) ()
#21 0x000055555638d59f in main ()

System details

Arch Linux, Cling built manually.

llvm_url="http://root.cern.ch/git/llvm.git"
llvm_branch="cling-patches-rrelease_13"
clang_url="http://root.cern.ch/git/clang.git"
clang_branch="cling-patches-rrelease_13"
cling_url="http://root.cern.ch/git/cling.git"
cling_branch="master" # commit da247bd77a92f0793abe95e10b373dbca7a7e5f1

Hi! Following up on this since it's still an issue for me. Is there more info I can provide to help diagnose the problem?

@vgvassilev Any chance I could get someone to look at this? It's been preventing me from using LLVM 13 for almost 6 months.

@hahnjo do you have any clue what’s happening here? If that’s a non-issue for root then we might have forgotten some patch?

Yes indeed, ROOT-patches in the "old" https://github.com/vgvassilev/clang/ repository is very likely missing vgvassilev/clang@79c42f7. I folded this into cling-llvm13 in the monorepo here: https://github.com/root-project/llvm-project/releases/tag/cling-llvm13-20230111-02

My recommendation would obviously be to use the monorepo fork going forward 😉

Oh, I had no idea there's a monorepo now. Looks like it doesn't contain cling.

It's still relatively "new", we are still transitioning over. It only contains upstream https://github.com/llvm/llvm-project plus our patches to those components. Cling remains separate for now.

Would you mind sharing the correct repo and branch for llvm13 cling, too? Also, is there an up to date example of building these two with cmake?

The repo and branch are https://github.com/root-project/llvm-project/tree/cling-llvm13. You need to clone that and this repo, https://github.com/root-project/cling. Then you configure it with cmake -DLLVM_ENABLE_PROJECTS=clang -DLLVM_EXTERNAL_PROJECTS=cling -DLLVM_EXTERNAL_CLING_SOURCE_DIR=/path/to/cling, with the latter two parameters virtually stitching together the two repos. Please try and let me know how it goes.

Lastly, do you suspect that #484 has also been fixed? It's another blocker for me using cling + LLVM 13.

Hm, possible but doesn't ring a bell immediately. I will try to follow up on that issue...

I've given this a go. I've built cling/clang/llvm successfully, using the monorepo and cling Github repo (as opposed to the old CERN repos). I can run the compiled cling binary with no issue. Details are:

cling_url="https://github.com/root-project/cling.git"
cling_branch="master" # on commit acb2334131c19ef506767d6d9051b24755a8566b
llvm_url="https://github.com/root-project/llvm-project.git"
llvm_branch="cling-llvm13"

However, when I embed it into my existing cling application, which works with cling 9 (with and without optimizations) and cling 13 (only without optimizations), I get linker errors at run-time when JIT compiling. The linker errors are for tcmalloc/jemalloc, but I'm using neither. In fact, I'm using the Boehm GC (hard requirement), so I don't think I can use either.

IncrementalExecutor::executeFunction: symbol 'MallocExtension_Internal_GetNumericProperty' unresolved while linking [cling interface function]!
IncrementalExecutor::executeFunction: symbol 'nallocx' unresolved while linking [cling interface function]!
Symbol found in '/usr/lib/libtcmalloc_minimal.so.4.5.10'; did you mean to load it with '.L /usr/lib/libtcmalloc_minimal.so.4.5.10'?
IncrementalExecutor::executeFunction: symbol 'sdallocx' unresolved while linking [cling interface function]!
Symbol found in '/usr/lib/libjemalloc.so.2'; did you mean to load it with '.L /usr/lib/libjemalloc.so.2'?

Is there a new dependency here? I've tried adding tcmalloc and jemalloc to the link libraries (separately) and the JIT errors go away, but it hard crashes when trying to free. Backtrace included here (I've also tried with LD_PRELOAD):

#0  0x00000000006d9819 in GC_free ()
#1  0x00000000006cd736 in operator delete(void*) ()
#2  0x00007ffff43471ac in ?? () from /home/jeaye/projects/jank/build/libjankcling.so
#3  0x00007ffff4346e71 in clang::driver::Distro::Distro(llvm::vfs::FileSystem&, llvm::Triple const&) ()
   from /home/jeaye/projects/jank/build/libjankcling.so
#4  0x00007ffff42a2e93 in clang::driver::CudaInstallationDetector::CudaInstallationDetector(clang::driver::Driver const&, llvm::Triple const&, llvm::opt::ArgList const&) () from /home/jeaye/projects/jank/build/libjankcling.so
#5  0x00007ffff42f078c in ?? () from /home/jeaye/projects/jank/build/libjankcling.so
#6  0x00007ffff4304c05 in ?? () from /home/jeaye/projects/jank/build/libjankcling.so
#7  0x00007ffff41ec2f8 in clang::driver::Driver::getToolChain(llvm::opt::ArgList const&, llvm::Triple const&) const ()
   from /home/jeaye/projects/jank/build/libjankcling.so
#8  0x00007ffff41f2051 in clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) ()
   from /home/jeaye/projects/jank/build/libjankcling.so
#9  0x00007ffff3bc7032 in ?? () from /home/jeaye/projects/jank/build/libjankcling.so
#10 0x00007ffff3bc3858 in cling::CIFactory::createCI(llvm::StringRef, cling::InvocationOptions const&, char const*, std::unique_ptr<clang::ASTConsumer, std::default_delete<clang::ASTConsumer> >, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&) () from /home/jeaye/projects/jank/build/libjankcling.so
#11 0x00007ffff3c82979 in cling::IncrementalParser::IncrementalParser(cling::Interpreter*, char const*, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&) ()
   from /home/jeaye/projects/jank/build/libjankcling.so
#12 0x00007ffff3c8b533 in cling::Interpreter::Interpreter(int, char const* const*, char const*, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&, void*, bool, cling::Interpreter const*) ()
   from /home/jeaye/projects/jank/build/libjankcling.so
#13 0x000000000066cd6a in cling::Interpreter::Interpreter(int, char const* const*, char const*, std::vector<std::shared_ptr<clang::ModuleFileExtension>, std::allocator<std::shared_ptr<clang::ModuleFileExtension> > > const&, void*, bool) ()
#14 0x000000000066bdfc in std::__detail::_MakeUniq<cling::Interpreter>::__single_object std::make_unique<cling::Interpreter, unsigned long, char const* const*, char const*>(unsigned long&&, char const* const*&&, char const*&&) ()
#15 0x000000000066b1b2 in jank::jit::processor::processor() ()
#16 0x000000000058841a in main ()

In case it's useful, I configured LLVM like so:

  cmake -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_TARGETS_TO_BUILD="host;NVPTX" \
        -DLLVM_BUILD_LLVM_DYLIB=OFF \
        -DLLVM_ENABLE_RTTI=ON \
        -DLLVM_ENABLE_FFI=ON \
        -DLLVM_BUILD_DOCS=OFF \
        -DLLVM_ENABLE_SPHINX=OFF \
        -DLLVM_ENABLE_DOXYGEN=OFF \
        -DLLVM_ENABLE_LIBCXX=OFF \
        -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" \
        -DLLVM_EXTERNAL_PROJECTS=cling \
        -DLLVM_EXTERNAL_CLING_SOURCE_DIR="${srcdir}/cling" \
        -DFFI_INCLUDE_DIR="${ffi_include_dir}" \
        -DCLING_CXX_HEADERS=ON \
        "${srcdir}/llvm/llvm"

Looks like the tcmalloc/jemalloc dep issue came in as part of dependency hell. The clang binary from the monorepo version of cling causes vcpkg build failures with folly, so I had to upgrade vcpkg to get the latest folly. That builds, but the latest folly seems to cause the malloc linker issues in cling, even though it's been using jemalloc this entire time. I couldn't find a way to get it to work, so I've torn out folly for now. It's a shame, since its synchronization classes are the fastest around, but I was more interested in seeing if I could get monorepo cling going.

The good news is, after ripping out folly, I can use the monorepo cling AND it works with optimizations enabled. So this means I'm not seeing this issue or #484. However, not being able to use folly means I've a big performance setback and the whole reason this started was to get on llvm 13 with optimizations to improve my benchmarks compared to llvm 9.

To be clear, folly was not the reason I couldn't use optimizations on cling+llvm 13 before, with the old repos. It's only become an issue with the new monorepo cling.

If there's any advice for what's going on with the malloc link issues, I'm all ears.


UPDATE: Looks like the issue is that folly switched to weak symbols for jemalloc: facebook/folly@4c1964d

Yet, when I link it myself or LD_PRELOAD it, I hard crash with the same backtrace shown above. Still stuck, but I have a better idea of why now. The relevant code is in this header and its source file: https://github.com/facebook/folly/blob/93b2aea533e3daa924ba5cbf3d6fa0e385d02e73/folly/memory/detail/MallocImpl.h
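
For anyone following along, here is a minimal sketch of the general weak-symbol pattern being described. It is not folly's actual code: the helper names have_nallocx and good_alloc_size are hypothetical, and it assumes GCC or Clang on an ELF platform. A normal dynamic link leaves an undefined weak function with a null address, so the program can test for it at run time. The errors above suggest the JIT instead reports such references as unresolved, though that reading is an assumption and was not confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Weak declaration of a jemalloc entry point. If nothing in the process
// defines `nallocx`, the dynamic linker leaves its address null instead of
// failing with an undefined-symbol error. (GCC/Clang attribute, ELF targets.)
extern "C" std::size_t nallocx(std::size_t, int) __attribute__((__weak__));

// Hypothetical helper: true only when an allocator providing nallocx is loaded.
inline bool have_nallocx() { return nallocx != nullptr; }

// Hypothetical helper: ask the allocator how large an n-byte request would
// really be, or fall back to n itself when the weak symbol is absent.
inline std::size_t good_alloc_size(std::size_t n) {
  assert(n > 0 && "nallocx is undefined for a size of 0");
  return have_nallocx() ? nallocx(n, 0) : n;
}
```

Calling good_alloc_size(64) returns 64 when no jemalloc-compatible allocator is present, and whatever nallocx reports when one is linked or preloaded.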

After forking vcpkg to patch folly into no longer using weak symbols for jemalloc, all systems are green. Cling + LLVM 13 is working with all optimizations. Using the monorepo has solved a lot of problems. Thanks so much for getting back to me on this.

I think the one question I have remaining is whether any of you have tried using UBSan with Cling. I gave it a go and ran into a bunch of issues, so I figured I'd check before persisting through them any further.

You mean for "interpreted" code compiled via Cling? Not that I'm aware of... In the past, I made ASan and TSan work by dumping out all accumulated AST into a binary and then just running that in a separate process. Not sure if that would be an option for you...