google / tcmalloc

Crashes when unwinding the stack from a signal handler interrupting deallocation

mbautin opened this issue

After we upgraded the YugabyteDB codebase from Gperftools tcmalloc to this version, we started seeing crashes like the following:

(lldb) target create "tests-util/debug-util-test" --core "core.92253"
Core file '/home/mbautin/code/yugabyte-db4/build/latest/core.92253' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'debug-util-test', stop reason = signal SIGSEGV
  * frame #0: 0x00007fd65a231acf libgcc_s.so.1`uw_frame_state_for + 1055
    frame #1: 0x00007fd65a233758 libgcc_s.so.1`_Unwind_Backtrace + 104
    frame #2: 0x00007fd65a577c56 libc.so.6`__backtrace + 102
    frame #3: 0x00007fd65c07c5a5 libyb_util.so`yb::StackTrace::Collect(this=0x00007fd653da4120, skip_frames=2) at debug-util.cc:433:17
    frame #4: 0x00007fd65c274385 libyb_util.so`yb::(anonymous namespace)::HandleStackTraceSignal(signum=12) at stack_trace.cc:183:15
    frame #5: 0x00007fd65a48ab20 libc.so.6`__restore_rt
    frame #6: 0x000055894739b5c8 debug-util-test`TcmallocSlab_Internal_PopBatch_trampoline
(lldb) bt
* thread #1, name = 'debug-util-test', stop reason = signal SIGSEGV
  * frame #0: 0x00007f8ce502aacf libgcc_s.so.1`uw_frame_state_for + 1055
    frame #1: 0x00007f8ce502c758 libgcc_s.so.1`_Unwind_Backtrace + 104
    frame #2: 0x00007f8ce5370c56 libc.so.6`__backtrace + 102
    frame #3: 0x00007f8ce6e755a5 libyb_util.so`yb::StackTrace::Collect(this=0x00007f8cdfb9f0e0, skip_frames=2) at debug-util.cc:433:17
    frame #4: 0x00007f8ce706d385 libyb_util.so`yb::(anonymous namespace)::HandleStackTraceSignal(signum=12) at stack_trace.cc:183:15
    frame #5: 0x00007f8ce5283b20 libc.so.6`__restore_rt
    frame #6: 0x000055f6fb79a40d debug-util-test`tcmalloc_internal_tls_fetch_pic + 77
    frame #7: 0x000055f6fb75639c debug-util-test`tcmalloc::tcmalloc_internal::cpu_cache_internal::CpuCache<tcmalloc::tcmalloc_internal::cpu_cache_internal::StaticForwarder>::Overflow(void*, unsigned long, int) + 252
    frame #8: 0x000055f6fb737c62 debug-util-test`operator delete(void*) + 1122
    frame #9: 0x000055f6fb6e712c debug-util-test`yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody(this=0x000024fd7eaa3650)::Entry::~Entry() at debug-util-test.cc:345:9
    frame #10: 0x000055f6fb6ebf75 debug-util-test`yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody(this=0x000024fd7fcab590)::$_0::operator()() const at debug-util-test.cc:385:11
    frame #11: 0x000055f6fb6ebd1f debug-util-test`void yb::TestThreadHolder::AddThreadFunctor<yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0>(this=0x000024fd7fcab588)::$_0 const&)::'lambda'()::operator()() const at test_thread_holder.h:62:7
    frame #12: 0x000055f6fb6ebcb5 debug-util-test`decltype(__f=0x000024fd7fcab588)::$_0>()()) std::__1::__invoke[abi:v160003]<void yb::TestThreadHolder::AddThreadFunctor<yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0>(yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0 const&)::'lambda'()>(yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0&&) at invoke.h:394:23
    frame #13: 0x000055f6fb6ebc8d debug-util-test`void std::__1::__thread_execute[abi:v160003]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void yb::TestThreadHolder::AddThreadFunctor<yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0>(yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0 const&)::'lambda'()>(__t=0x000024fd7fcab580, (null)=__tuple_indices<> @ 0x00007f8cdfba05a8)::$_0, void yb::TestThreadHolder::AddThreadFunctor<yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0>(yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0 const&)::'lambda'()>&, std::__1::__tuple_indices<>) at thread:282:5
    frame #14: 0x000055f6fb6ebab2 debug-util-test`void* std::__1::__thread_proxy[abi:v160003]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void yb::TestThreadHolder::AddThreadFunctor<yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0>(yb::DebugUtilTest_TestStackTraceSignalDuringAllocation_Test::TestBody()::$_0 const&)::'lambda'()>>(__vp=0x000024fd7fcab580) at thread:293:5
    frame #15: 0x00007f8ce56021cf libpthread.so.0`start_thread + 239
    frame #16: 0x00007f8ce526edd3 libc.so.6`__clone + 67

We have a stack trace dump facility that sends a signal to threads, causing them to capture their own stacks. The capture is done with the Linux backtrace function, which uses libunwind internally. We never had problems with this approach under Gperftools tcmalloc, but with this tcmalloc we get segmentation faults when the signal interrupts tcmalloc code in functions such as tcmalloc_internal_tls_fetch_pic or TcmallocSlab_Internal_PopBatch_trampoline. We have a unit test that reliably reproduces the crash: it creates a few threads that allocate objects and pass them to other threads for deallocation, while the main thread repeatedly tries to dump the stacks of those worker threads.
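For readers unfamiliar with this pattern, here is a minimal sketch of a signal-driven stack capture. It is not YugabyteDB's actual code. It assumes SIGUSR2 as the signal (signum=12 in the traces above corresponds to SIGUSR2 on x86_64 Linux) and captures only one thread at a time into a global buffer. The handler name HandleStackTraceSignal appears in the traces, but everything else (InstallStackTraceHandler, WarmUpUnwinder, CaptureStackOf, kMaxFrames) is hypothetical and chosen for illustration.

```cpp
#include <execinfo.h>
#include <pthread.h>
#include <signal.h>

#include <atomic>
#include <cassert>
#include <thread>

namespace {

constexpr int kMaxFrames = 64;           // hypothetical buffer size
void* g_frames[kMaxFrames];              // frames captured by the handler
std::atomic<int> g_num_frames{-1};       // -1 means "capture not finished"

// Runs on the target thread, on top of whatever code the signal interrupted.
// If the unwinder cannot describe the interrupted frame, it can crash here.
void HandleStackTraceSignal(int /*signum*/) {
  int n = backtrace(g_frames, kMaxFrames);
  g_num_frames.store(n, std::memory_order_release);
}

}  // namespace

// Installs the handler for SIGUSR2 (an assumption; any free signal works).
bool InstallStackTraceHandler() {
  struct sigaction sa = {};
  sa.sa_handler = HandleStackTraceSignal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGUSR2, &sa, nullptr) == 0;
}

// glibc loads the unwinder lazily on the first backtrace() call, which may
// allocate memory, so call it once outside of any signal handler.
void WarmUpUnwinder() {
  void* scratch[4];
  backtrace(scratch, 4);
}

// Asks `target` to capture its own stack and waits for the result.
// Returns the number of frames captured, or -1 if the signal was not sent.
int CaptureStackOf(pthread_t target) {
  g_num_frames.store(-1, std::memory_order_relaxed);
  if (pthread_kill(target, SIGUSR2) != 0) return -1;
  int n;
  while ((n = g_num_frames.load(std::memory_order_acquire)) < 0) {
    std::this_thread::yield();
  }
  assert(n <= kMaxFrames);
  return n;
}
```

A caller would run InstallStackTraceHandler() and WarmUpUnwinder() once at startup, then call CaptureStackOf(worker.native_handle()) for each thread of interest. The crash described here occurs inside the backtrace() call when the interrupted code is one of the tcmalloc functions named above.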

As far as I know, the libunwind backtrace facility is async-signal-safe and suitable for use in a signal handler. We are currently using LLVM 15's version of libunwind.

Has anyone else encountered this issue and is there a known workaround?

We are currently using this fork of tcmalloc: https://github.com/yugabyte/tcmalloc/tree/e116a66-yb (based on commit e116a66 with some build-related changes).