Segfault

jeromegn opened this issue 4 years ago

Jerome Gravel-Niquet commented 4 years ago

Not a very descriptive title, but I'm not sure what's causing the segfault.

(This might be a mimalloc issue.)

Here's a stack trace, using mimalloc 0.1.19:

Thread 1 (LWP 26715):
#0  mi_stat_update (stat=0x1b0, amount=1) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/stats.c:40
No locals.
#1  0x000055ae037394ad in _mi_malloc_generic (heap=0x55ae03b0d3a0 <_mi_heap_empty>, size=40) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/page.c:784
        page = <optimized out>
        req_size = <optimized out>
#2  0x000055ae03731553 in mi_heap_malloc (size=<optimized out>, heap=<optimized out>) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/alloc.c:89
        p = <optimized out>
        p = <optimized out>
        p = <optimized out>
#3  mi_malloc (size=<optimized out>) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/alloc.c:102
No locals.
#4  0x000055ae036f023f in alloc::alloc::alloc () at /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/alloc.rs:80
No locals.
#5  <alloc::alloc::Global as core::alloc::AllocRef>::alloc () at /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/alloc.rs:174
No locals.
#6  alloc::alloc::exchange_malloc () at /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/alloc.rs:268
No locals.
#7  std::sync::mutex::Mutex<T>::new () at src/libstd/sync/mutex.rs:168
No locals.
#8  std::thread::Thread::new () at src/libstd/thread/mod.rs:1145
No locals.
#9  0x000055ae03705cc0 in std::sys_common::thread_info::ThreadInfo::with::{{closure}} () at src/libstd/sys_common/thread_info.rs:23
No locals.
#10 std::thread::local::LocalKey<T>::try_with () at src/libstd/thread/local.rs:263
No locals.
#11 std::sys_common::thread_info::ThreadInfo::with () at src/libstd/sys_common/thread_info.rs:19
No locals.
#12 std::sys_common::thread_info::stack_guard () at src/libstd/sys_common/thread_info.rs:36
No locals.
#13 std::sys::unix::stack_overflow::imp::signal_handler () at src/libstd/sys/unix/stack_overflow.rs:98
No locals.
#14 0x00007f61ed29f890 in ?? ()
No symbol table info available.
#15 0x0000000000000007 in ?? ()
No symbol table info available.
#16 0x0000000000000000 in ?? ()
No symbol table info available.

I'm not sure when it's happening either.

I can probably find more information if you point me at what you'd need :)

Octavian · Answer 1 · Tue Jun 09 2020 19:35:23 GMT+0800 (China Standard Time)

@jeromegn This is most definitely a mimalloc issue. Could you provide a few more details, such as your OS, build environment, and what code is causing this (or does it happen randomly in a huge app)? Also, does it happen once, always, multiple random times, etc?

This would probably be better posted on the mimalloc issue tracker.

Jerome Gravel-Niquet · Answer 2 · Tue Jun 09 2020 19:43:50 GMT+0800 (China Standard Time)

This is on Ubuntu 18.04 (64-bit, kernel 5.0.0), built in release mode with debug symbols. It's built in a Docker container using the official Rust 1.44.0 Docker image.

This is a large app and it appears to happen randomly :/

I can post this in the mimalloc repo's issues. I was initially wondering if maybe the segfault was caused by a misuse of mimalloc in this crate :)

Jerome Gravel-Niquet · Answer 3 · Tue Jun 09 2020 19:52:39 GMT+0800 (China Standard Time)

The "pinned" commit of mimalloc in libmimalloc-sys is a bit old. I wonder if this has been fixed since? For instance, this segfault issue (which doesn't seem related) was fixed around Apr 6: microsoft/mimalloc#221

Octavian · Answer 4 · Tue Jun 09 2020 21:11:52 GMT+0800 (China Standard Time)

@jeromegn I just pulled the latest upstream mimalloc changes into master. Could you try testing with the latest version on the master branch?

Jerome Gravel-Niquet · Answer 5 · Tue Jun 09 2020 22:30:23 GMT+0800 (China Standard Time)

Thanks! I've rolled it out and will be monitoring for segfaults.

Jerome Gravel-Niquet · Answer 6 · Wed Jun 10 2020 05:20:29 GMT+0800 (China Standard Time)

Segfaults started happening again, even with the latest version. I'm updating the issue in Microsoft's mimalloc repo.

Vincent Rouillé · Answer 7 · Sat Jun 20 2020 00:27:04 GMT+0800 (China Standard Time)

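In the stack trace, std::sys::unix::stack_overflow::imp::signal_handler() suggests you are hitting a stack overflow before the mimalloc issue. Frames #7 to #13 show that Rust's SIGSEGV/SIGBUS handler was already running, and that it was allocating (stack_guard -> ThreadInfo::with -> Thread::new -> Mutex::new) when mimalloc crashed. This is how the handler is documented in libstd (src/libstd/sys/unix/stack_overflow.rs):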

    // Signal handler for the SIGSEGV and SIGBUS handlers. We've got guard pages
    // (unmapped pages) at the end of every thread's stack, so if a thread ends
    // up running into the guard page it'll trigger this handler. We want to
    // detect these cases and print out a helpful error saying that the stack
    // has overflowed. All other signals, however, should go back to what they
    // were originally supposed to do.
    //
    // This handler currently exists purely to print an informative message
    // whenever a thread overflows its stack. We then abort to exit and
    // indicate a crash, but to avoid a misleading SIGSEGV that might lead
    // users to believe that unsafe code has accessed an invalid pointer; the
    // SIGSEGV encountered when overflowing the stack is expected and
    // well-defined.
    //
    // If this is not a stack overflow, the handler un-registers itself and
    // then returns (to allow the original signal to be delivered again).
    // Returning from this kind of signal handler is technically not defined
    // to work when reading the POSIX spec strictly, but in practice it turns
    // out many large systems and all implementations allow returning from a
    // signal handler to work. For a more detailed explanation see the
    // comments on #26458.
    unsafe extern "C" fn signal_handler(
        // ... (rest of the handler omitted)
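The decision that comment describes can be sketched as a pure function, kept apart from the unsafe signal plumbing. This is a simplified model under stated assumptions, not libstd's actual code: FaultAction and classify_fault are hypothetical names, the guard page is assumed to be a half-open address range, and an unknown guard range is assumed to be treated as empty. In the real handler, looking up that range is frame #12 (stack_guard) in the trace above, which is the call that ended up allocating.

```rust
use std::ops::Range;

/// Hypothetical outcome type (not from libstd): what the handler does with a fault.
#[derive(Debug, PartialEq, Eq)]
enum FaultAction {
    /// The faulting address is inside the guard page: report "stack overflow" and abort.
    ReportStackOverflow,
    /// Any other fault: restore the default disposition and return, so the
    /// original signal is delivered again.
    RestoreDefaultAndReturn,
}

/// Hypothetical helper: classify a SIGSEGV/SIGBUS at `fault_addr`, given the
/// current thread's guard-page range (assumed half-open: `start..end`).
/// `None` models a thread whose guard range is unknown.
fn classify_fault(fault_addr: usize, guard: Option<Range<usize>>) -> FaultAction {
    // Assumption: an unknown guard range is treated as empty, so no address
    // is ever classified as a stack overflow in that case.
    let guard = guard.unwrap_or(0..0);
    if guard.contains(&fault_addr) {
        FaultAction::ReportStackOverflow
    } else {
        FaultAction::RestoreDefaultAndReturn
    }
}

fn main() {
    // Hypothetical guard page: one 4 KiB page starting at 0x7000_0000.
    let guard = Some(0x7000_0000..0x7000_1000);

    // A fault inside the guard page counts as a stack overflow.
    println!("{:?}", classify_fault(0x7000_0800, guard.clone())); // prints ReportStackOverflow

    // A near-null address such as 0x1b0 (the `stat` pointer in frame #0) does
    // not, so the handler would step aside and let the SIGSEGV be re-raised.
    println!("{:?}", classify_fault(0x1b0, guard)); // prints RestoreDefaultAndReturn
}
```

Keeping the classification pure makes it easy to test. The real handler also has to avoid anything that is not async-signal-safe, such as malloc, which is why an allocation inside it (frames #6 to #13) is worth a closer look.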

Jerome Gravel-Niquet · Answer 8 · Sat Jun 20 2020 19:43:19 GMT+0800 (China Standard Time)

Thanks @Speedy37! If I understand correctly, that's consistent with the resolution of microsoft/mimalloc#257?

Vincent Rouillé · Answer 9 · Sat Jun 20 2020 21:13:28 GMT+0800 (China Standard Time)

It probably is. It would be great to verify whether the OS limitation on mimalloc's secure-mode guard pages is what triggers that signal_handler.

Jerome Gravel-Niquet · Answer 10 · Wed Aug 19 2020 19:40:54 GMT+0800 (China Standard Time)

FWIW, I'm still using the dev branch for this fix. I think the fix has finally been merged upstream? If so, could mimalloc be updated on this repo's main branch?