iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

`malloc(): corrupted top size` in `iree-compile` and `iree-opt` when building with existing LLVM

makslevental opened this issue

What happened?

I am experimenting with using pre-baked distros of LLVM to build IREE (mlir-wheels) and I'm hitting the weirdest heap/malloc/something error:

(iree) mlevental@mlevental-CORSAIR-ONE-PRO-a200:/tmp/iree/build/tools$ ./iree-opt 
malloc(): corrupted top size
Aborted (core dumped)

and

(iree) mlevental@mlevental-CORSAIR-ONE-PRO-a200:/tmp/iree/build/tools$ ./iree-compile 
malloc(): corrupted top size
Aborted (core dumped)

and only those two tools are affected. This only happens if I build IREE in release mode. Running the release binaries under gdb, I get:

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737353580288) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737353580288) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737353580288) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737353580288, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fffeea42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fffeea287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffeea89676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fffeebdbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007fffeeaa0cfc in malloc_printerr (str=str@entry=0x7fffeebd97ec "malloc(): corrupted top size") at ./malloc/malloc.c:5664
#7  0x00007fffeeaa46f2 in _int_malloc (av=av@entry=0x7fffeec1ac80 <main_arena>, bytes=bytes@entry=8) at ./malloc/malloc.c:4373
#8  0x00007fffeeaa5262 in __GI___libc_malloc (bytes=8) at ./malloc/malloc.c:3321
#9  0x00007fffee6ae98c in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff087e026 in void std::vector<mlir::Pass::Statistic*, std::allocator<mlir::Pass::Statistic*> >::_M_realloc_insert<mlir::Pass::Statistic*>(__gnu_cxx::__normal_iterator<mlir::Pass::Statistic**, std::vector<mlir::Pass::Statistic*, std::allocator<mlir::Pass::Statistic*> > >, mlir::Pass::Statistic*&&) () from /tmp/iree/build/lib/libIREECompiler.so
#11 0x0000000000000000 in ?? ()

which gives me only Pass::Statistic to go on. Luckily, this utility is not used anywhere upstream and is used in only one place in IREE: compiler/src/iree/compiler/Dialect/Util/Transforms/FoldGlobals.cpp#L523-L526. Commenting out the relevant code there resolves the heap/malloc error. Reading around llvm/ADT/Statistic.h#L38 I find

#if !defined(NDEBUG) || LLVM_FORCE_ENABLE_STATS
#define LLVM_ENABLE_STATS 1
#else
#define LLVM_ENABLE_STATS 0
#endif

Which might explain why this doesn't happen when building in debug mode? But not really, because I think the heap error comes from here. So I'm going to try building my wheels with LLVM_FORCE_ENABLE_STATS=ON and see whether that resolves the issue, but even if it does, it won't explain it.

Before anyone recommends asan: I had to disable compiler-rt because I haven't yet figured out how to cross compile it (and the wheels are cross-compiled for aarch64/arm64) so no go there currently :(

Indeed, building the wheels with LLVM_FORCE_ENABLE_STATS=ON worked, but I don't know why.

Mismatched NDEBUG. LLVM does not have compatible class layouts between NDEBUG and non-NDEBUG builds.
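
For anyone landing here later, a minimal sketch of why a layout mismatch like this can end in heap corruption. The types below (`NoopCounter`, `TrackingCounter`, `PassLike`) are hypothetical stand-ins, not LLVM's or MLIR's actual definitions. The assumption is only that some member type is smaller when stats are compiled out, so code built with one setting computes different member offsets than code built with the other.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical counter as a translation unit built with stats disabled sees it.
struct NoopCounter {
  const char *name;
};

// Hypothetical counter as a translation unit built with stats enabled sees it.
struct TrackingCounter {
  const char *name;
  std::uint64_t value;
  bool initialized;
};

// Hypothetical class that embeds a counter, followed by another member.
// `statistics` stands in for something like a std::vector of pointers.
template <typename Counter>
struct PassLike {
  Counter counter;
  void *statistics;
};

using LibraryView = PassLike<NoopCounter>;       // e.g. the prebuilt library
using ConsumerView = PassLike<TrackingCounter>;  // e.g. the project linking it

constexpr std::size_t offsetInLibraryView() {
  return offsetof(LibraryView, statistics);
}

constexpr std::size_t offsetInConsumerView() {
  return offsetof(ConsumerView, statistics);
}

// The two views disagree about where `statistics` lives in the same object.
static_assert(offsetInLibraryView() != offsetInConsumerView(),
              "views agree on the member offset");
static_assert(sizeof(LibraryView) != sizeof(ConsumerView),
              "views agree on the object size");
```

If one side allocates and initializes the object using its view and the other side then touches `statistics` using its own view, the second side reads or writes memory that was never meant to hold that member. That would be consistent with the crash surfacing inside `std::vector::_M_realloc_insert` in the backtrace above, though the sketch does not reproduce the actual classes involved.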

So when I compile LLVM with LLVM_ENABLE_ASSERTIONS=ON I get NDEBUG un-defined? And thus in IREE I have to do IREE_ENABLE_ASSERTIONS=ON as well?

Yes, the -DNDEBUG compiler flags must match, and that CMake setting controls it.
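
One way to catch this kind of mismatch at build time instead of at runtime, sketched under assumptions: the prebuilt library records whether it was compiled with NDEBUG in a generated header, and the consumer compares that against its own setting. `DEMO_LIBRARY_BUILT_WITH_NDEBUG` and the function names are hypothetical and do not correspond to anything LLVM or IREE ships.

```cpp
// Hypothetical value that a prebuilt library could record in a generated
// config header. It defaults to 1 here, as if the library were a release build.
#ifndef DEMO_LIBRARY_BUILT_WITH_NDEBUG
#define DEMO_LIBRARY_BUILT_WITH_NDEBUG 1
#endif

// True when the translation unit including this code is compiled with -DNDEBUG.
constexpr bool consumerBuiltWithNdebug() {
#ifdef NDEBUG
  return true;
#else
  return false;
#endif
}

// True when the (hypothetical) library reported an NDEBUG build.
constexpr bool libraryBuiltWithNdebug() {
  return DEMO_LIBRARY_BUILT_WITH_NDEBUG != 0;
}

// The two settings are compatible only when they are identical.
constexpr bool ndebugSettingsMatch(bool library, bool consumer) {
  return library == consumer;
}

// In a real header the check would be enforced, so a mismatch fails the build:
//
//   static_assert(ndebugSettingsMatch(libraryBuiltWithNdebug(),
//                                     consumerBuiltWithNdebug()),
//                 "NDEBUG mismatch between library and consumer");
```

The enforcing `static_assert` is left commented out so the sketch compiles under either setting. Turning it on would make a mismatched build stop with a readable message rather than an abort inside malloc.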

Mystery solved. Good work team.