purpleprotocol / mimalloc_rust

A Rust wrapper over Microsoft's MiMalloc memory allocator
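The wrapper plugs into Rust through the #[global_allocator] attribute; the crate's README registers it as `static GLOBAL: MiMalloc = MiMalloc;`. The sketch below shows the same hook without depending on the crate. It uses a hypothetical CountingAlloc that forwards to std's System allocator instead of mimalloc and counts requests. Once a type is registered this way, every Box, Vec or Mutex::new allocation goes through it. That is how the Mutex::new call in frame #7 of the trace below ends up in mi_malloc.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper (not part of the mimalloc crate): forwards every
// request to the system allocator and counts allocations.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registering a static here routes all heap allocations of the program
// through it; the mimalloc crate is plugged in with the same attribute.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    let boxed = Box::new(40u64); // one heap allocation via CountingAlloc
    let after = allocation_count();
    println!("value = {}, allocations during Box::new: {}", boxed, after - before);
}
```

Swapping CountingAlloc for the crate's MiMalloc type is all it takes to send these calls into mimalloc instead.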

Segfault

jeromegn opened this issue

Not a very descriptive title, but I'm not sure what's causing the segfault.

(This might be a mimalloc issue)

Here's a stack trace, using mimalloc 0.1.19:

Thread 1 (LWP 26715):
#0  mi_stat_update (stat=0x1b0, amount=1) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/stats.c:40
No locals.
#1  0x000055ae037394ad in _mi_malloc_generic (heap=0x55ae03b0d3a0 <_mi_heap_empty>, size=40) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/page.c:784
        page = <optimized out>
        req_size = <optimized out>
#2  0x000055ae03731553 in mi_heap_malloc (size=<optimized out>, heap=<optimized out>) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/alloc.c:89
        p = <optimized out>
        p = <optimized out>
        p = <optimized out>
#3  mi_malloc (size=<optimized out>) at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libmimalloc-sys-0.1.15/c_src/mimalloc/src/alloc.c:102
No locals.
#4  0x000055ae036f023f in alloc::alloc::alloc () at /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/alloc.rs:80
No locals.
#5  <alloc::alloc::Global as core::alloc::AllocRef>::alloc () at /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/alloc.rs:174
No locals.
#6  alloc::alloc::exchange_malloc () at /rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/alloc.rs:268
No locals.
#7  std::sync::mutex::Mutex<T>::new () at src/libstd/sync/mutex.rs:168
No locals.
#8  std::thread::Thread::new () at src/libstd/thread/mod.rs:1145
No locals.
#9  0x000055ae03705cc0 in std::sys_common::thread_info::ThreadInfo::with::{{closure}} () at src/libstd/sys_common/thread_info.rs:23
No locals.
#10 std::thread::local::LocalKey<T>::try_with () at src/libstd/thread/local.rs:263
No locals.
#11 std::sys_common::thread_info::ThreadInfo::with () at src/libstd/sys_common/thread_info.rs:19
No locals.
#12 std::sys_common::thread_info::stack_guard () at src/libstd/sys_common/thread_info.rs:36
No locals.
#13 std::sys::unix::stack_overflow::imp::signal_handler () at src/libstd/sys/unix/stack_overflow.rs:98
No locals.
#14 0x00007f61ed29f890 in ?? ()
No symbol table info available.
#15 0x0000000000000007 in ?? ()
No symbol table info available.
#16 0x0000000000000000 in ?? ()
No symbol table info available.

I'm not sure when it's happening either.

I can probably find more information if you point me at what you'd need :)

@jeromegn This is most definitely a mimalloc issue. Could you provide a few more details, such as your OS, build environment, and what code is causing this (or does it happen randomly in a huge app)? Also, does it happen once, always, multiple random times, etc.?

This is probably better posted on the mimalloc issue tracker.

This is on Ubuntu 18.04, kernel 5.0.0 64-bit, release mode with debug symbols. Built in a docker container using the rust 1.44.0 official docker image.

This is a large app and it appears to happen randomly :/

I can post this in the mimalloc repo's issues. I was initially wondering if maybe the segfault was caused by a misuse of mimalloc in this crate :)

The "pinned" commit of mimalloc in libmimalloc-sys is a bit old. I wonder whether this has been fixed since. For instance, this segfault issue (which doesn't seem related) was fixed around Apr 6: microsoft/mimalloc#221

@jeromegn Just pulled the latest changes from the mimalloc upstream on master. Could you try testing with the latest version in the master branch?

Thanks! I've rolled it out and will be monitoring for segfaults.

Segfaults started happening again, even with the latest version. I'm updating the issue in Microsoft's repo.

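In the stack trace, the std::sys::unix::stack_overflow::imp::signal_handler() frame suggests you are hitting a stack overflow before the mimalloc issue: the SIGSEGV reaches Rust's stack-overflow handler first, and the crash in mimalloc happens inside that handler when it calls Thread::new and allocates (frames #3 to #8). Here is the relevant comment from the Rust standard library: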

    // Signal handler for the SIGSEGV and SIGBUS handlers. We've got guard pages
    // (unmapped pages) at the end of every thread's stack, so if a thread ends
    // up running into the guard page it'll trigger this handler. We want to
    // detect these cases and print out a helpful error saying that the stack
    // has overflowed. All other signals, however, should go back to what they
    // were originally supposed to do.
    //
    // This handler currently exists purely to print an informative message
    // whenever a thread overflows its stack. We then abort to exit and
    // indicate a crash, but to avoid a misleading SIGSEGV that might lead
    // users to believe that unsafe code has accessed an invalid pointer; the
    // SIGSEGV encountered when overflowing the stack is expected and
    // well-defined.
    //
    // If this is not a stack overflow, the handler un-registers itself and
    // then returns (to allow the original signal to be delivered again).
    // Returning from this kind of signal handler is technically not defined
    // to work when reading the POSIX spec strictly, but in practice it turns
    // out many large systems and all implementations allow returning from a
    // signal handler to work. For a more detailed explanation see the
    // comments on #26458.
    unsafe extern "C" fn signal_handler(
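The decision that comment describes can be sketched as a small pure function. This is a hypothetical simplification, not the std implementation. The classify_fault name, the FaultAction enum and the addresses are made up. The real handler reads the faulting address from siginfo_t and the guard range from the thread-local thread info (frame #12, stack_guard).

```rust
use std::ops::Range;

/// Hypothetical outcome of the handler's check on a SIGSEGV/SIGBUS.
#[derive(Debug, PartialEq)]
enum FaultAction {
    /// Fault address lies in this thread's guard page: print the
    /// "stack overflow" message and abort.
    ReportStackOverflow,
    /// Anything else: restore the default disposition and return, so the
    /// signal is delivered again and crashes the process as usual.
    RestoreDefaultAndReturn,
}

/// Hypothetical helper: `guard` is the half-open address range of the
/// unmapped page(s) at the end of the current thread's stack.
fn classify_fault(fault_addr: usize, guard: &Range<usize>) -> FaultAction {
    if guard.contains(&fault_addr) {
        FaultAction::ReportStackOverflow
    } else {
        FaultAction::RestoreDefaultAndReturn
    }
}

fn main() {
    // One 4 KiB guard page at a made-up address.
    let guard = 0x7000_0000..0x7000_1000;
    println!("{:?}", classify_fault(0x7000_0010, &guard)); // inside the guard page
    println!("{:?}", classify_fault(0x1000_0000, &guard)); // unrelated bad access
}
```

In the trace above, looking up that guard range goes through ThreadInfo::with (frames #9 to #12). In this Rust version that calls Thread::new, which allocates a Mutex (frames #7 and #8). That allocation is the call that crashes inside mi_malloc, so the allocator is entered from within the signal handler.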

Thanks @Speedy37. If I understand correctly, that's consistent with the resolution of microsoft/mimalloc#257?

It probably is. It would be great to verify whether the OS limitation around mimalloc's secure-mode guard pages is what triggers that signal_handler.

FWIW, I'm still using the dev branch for this fix. I think the fix has finally been merged upstream; could mimalloc be updated in this repo's main branch?