`malloc(): corrupted top size` in `iree-compile` and `iree-opt` when building with existing LLVM
makslevental opened this issue
What happened?
I am experimenting with using pre-baked distros of LLVM to build IREE (mlir-wheels) and I'm hitting the weirdest heap/malloc/something error:
(iree) mlevental@mlevental-CORSAIR-ONE-PRO-a200:/tmp/iree/build/tools$ ./iree-opt
malloc(): corrupted top size
Aborted (core dumped)
and
(iree) mlevental@mlevental-CORSAIR-ONE-PRO-a200:/tmp/iree/build/tools$ ./iree-compile
malloc(): corrupted top size
Aborted (core dumped)
It happens only in those two tools, and only when I build IREE in release mode. Running the release binaries under gdb, I get:
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737353580288) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737353580288) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737353580288) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737353580288, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007fffeea42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fffeea287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007fffeea89676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fffeebdbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6 0x00007fffeeaa0cfc in malloc_printerr (str=str@entry=0x7fffeebd97ec "malloc(): corrupted top size") at ./malloc/malloc.c:5664
#7 0x00007fffeeaa46f2 in _int_malloc (av=av@entry=0x7fffeec1ac80 <main_arena>, bytes=bytes@entry=8) at ./malloc/malloc.c:4373
#8 0x00007fffeeaa5262 in __GI___libc_malloc (bytes=8) at ./malloc/malloc.c:3321
#9 0x00007fffee6ae98c in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff087e026 in void std::vector<mlir::Pass::Statistic*, std::allocator<mlir::Pass::Statistic*> >::_M_realloc_insert<mlir::Pass::Statistic*>(__gnu_cxx::__normal_iterator<mlir::Pass::Statistic**, std::vector<mlir::Pass::Statistic*, std::allocator<mlir::Pass::Statistic*> > >, mlir::Pass::Statistic*&&) () from /tmp/iree/build/lib/libIREECompiler.so
#11 0x0000000000000000 in ?? ()
which leaves me only Pass::Statistic to go on. Luckily, this utility isn't used anywhere upstream and is used in only one place in IREE: compiler/src/iree/compiler/Dialect/Util/Transforms/FoldGlobals.cpp#L523-L526. Commenting out the relevant code resolves the heap/malloc error. Reading around llvm/ADT/Statistic.h#L38, I find
#if !defined(NDEBUG) || LLVM_FORCE_ENABLE_STATS
#define LLVM_ENABLE_STATS 1
#else
#define LLVM_ENABLE_STATS 0
#endif
Which might explain why it doesn't happen when I build in debug mode? But not entirely, because I think the heap error comes from here. So I'm going to try building my wheels with LLVM_FORCE_ENABLE_STATS=ON and see whether that resolves the issue, but even if it does, it won't explain it.
Before anyone recommends ASan: I had to disable compiler-rt because I haven't yet figured out how to cross-compile it (the wheels are cross-compiled for aarch64/arm64), so that's a no-go for now :(
Update: building the wheels with LLVM_FORCE_ENABLE_STATS=ON did fix it, but I don't know why.
Mismatched NDEBUG. LLVM does not have compatible class layouts between NDEBUG and non-NDEBUG builds.
So when I compile LLVM with LLVM_ENABLE_ASSERTIONS=ON, NDEBUG ends up undefined? And so in IREE I have to set IREE_ENABLE_ASSERTIONS=ON as well?
Yes, the -DNDEBUG compiler flags must match, and that CMake setting controls it.
Mystery solved. Good work team.