plasma-umass / Mesh

A memory allocator that automatically reduces the memory footprint of C/C++ applications.

Startup deadlock in mesh::freeSlowpath(void*)

filodej opened this issue · comments

Hi,
first of all, I would like to thank you for a great piece of work and a very promising idea.

I am trying to build and experiment with the mesh allocator, but unfortunately I am experiencing a deadlock on startup.

For example, when I run this simple command:
LD_PRELOAD=~/experiments/mesh/libmesh.so uname
... it ends in a deadlock with the following backtrace:

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
#1  0x00007ff641ea43ce in __cxxabiv1::__cxa_guard_acquire (g=0x7ff641f35318 <guard variable for mesh::runtime()::runtimePtr>)
    at /opt/conda/conda-bld/compilers_linux-64_1542882313995/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/guard.cc:306
#2  0x00007ff641ea33b0 in mesh::runtime() () at src/runtime.h:109
#3  0x00007ff641e9b1e1 in mesh::freeSlowpath(void*) [clone .lto_priv.66] (ptr=0x0) at src/libmesh.cc:69
#4  0x00007ff6419d203b in _IO_vfprintf_internal (s=<optimized out>, format=<optimized out>, ap=<optimized out>) at vfprintf.c:2065
#5  0x00007ff641a8ddf0 in ___vsnprintf_chk (s=0x7fff893e3d40 "/dev/shm/alloc-mesh-2746021.0d", maxlen=<optimized out>, flags=1,
    slen=<optimized out>, format=0x7ff641efd690 "%s/alloc-mesh-%d.%zud", args=0x7fff893e3bf0) at vsnprintf_chk.c:65
#6  0x00007ff641a8dd2a in ___snprintf_chk (s=<optimized out>, s@entry=0x7fff893e3d40 "/dev/shm/alloc-mesh-2746021.0d",
    maxlen=<optimized out>, maxlen@entry=127, flags=<optimized out>, flags@entry=1, slen=<optimized out>, slen@entry=128,
    format=<optimized out>, format@entry=0x7ff641efd690 "%s/alloc-mesh-%d.%zud") at snprintf_chk.c:36
#7  0x00007ff641ea3f24 in snprintf ()
    at ~/.conda/envs/local/x86_64-conda_cos6-linux-gnu/sysroot/usr/include/bits/stdio2.h:66
#8  mesh::MeshableArena::openSpanDir (this=<optimized out>, pid=2746021) at src/meshable_arena.cc:81
#9  mesh::MeshableArena::openShmSpanFile(unsigned long) [clone .constprop.37] (
    this=this@entry=0x7ff641f38340 <mesh::runtime()::buf>, sz=68719476736) at src/meshable_arena.cc:481
#10 0x00007ff641ea1ef6 in mesh::MeshableArena::openSpanFile () at src/meshable_arena.cc:539
#11 mesh::MeshableArena::__base_ctor (this=0x7ff641f38340 <mesh::runtime()::buf>) at src/meshable_arena.cc:49
#12 mesh::GlobalHeap::__base_ctor (this=0x7ff641f38340 <mesh::runtime()::buf>) at src/global_heap.h:88
#13 mesh::Runtime::Runtime() [clone .constprop.33] (this=0x7ff641f38340 <mesh::runtime()::buf>) at src/runtime.cc:86
#14 0x00007ff641ea33b9 in mesh::runtime() () at src/runtime.h:109
#15 0x00007ff641ea3b94 in CreateThreadLocalHeap () at src/thread_local_heap.cc:18
#16 mesh::ThreadLocalHeap::GetHeap() () at src/thread_local_heap.cc:31
#17 0x00007ff641e9cca6 in mesh::callocSlowpath(unsigned long, unsigned long) [clone .lto_priv.74] (count=1, size=32) at src/libmesh.cc:80
#18 0x00007ff641366310 in _dlerror_run (operate=0x7ff6413660b0 <dlsym_doit>, args=0x7fff893e3f90) at dlerror.c:142
#19 0x00007ff64136607a in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:71
#20 0x00007ff641e99b3a in mesh::real::init() () at src/real.cc:41
#21 0x00007ff641e9577b in libmesh_init() [clone .lto_priv.50] () at src/libmesh.cc:14
#22 0x00007ff641e95ace in global constructors keyed to 65535_0_thread_local_heap.o.12673 ()
   from ~/experiments/mesh/libmesh.so
#23 0x00007ff641d309cf in call_init (env=0x7fff893e4128, argv=0x7fff893e4118, argc=1, l=<optimized out>) at dl-init.c:85
#24 _dl_init (main_map=0x7ff641f44190, argc=1, argv=0x7fff893e4118, env=0x7fff893e4128) at dl-init.c:134
#25 0x00007ff641d22b6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#26 0x0000000000000001 in ?? ()
#27 0x00007fff893e5870 in ?? ()
#28 0x0000000000000000 in ?? ()

The problem seems to be that during the mesh::runtime() call, the Runtime::Runtime() constructor (more specifically, its MeshableArena::MeshableArena() base constructor) calls MeshableArena::openSpanDir(int pid), which formats a tmpDir string via snprintf. In my glibc, that function performs an internal allocation and deallocation via malloc/free:

args_value = args_malloced = malloc (nargs * bytes_per_arg);
...
all_done:
  if (specs_malloced)
    free (specs);
  free (args_malloced);

That in turn calls mesh::runtime() once again, which deadlocks while acquiring the guard (__cxxabiv1::__cxa_guard_acquire) that synchronizes the static initialization of this line:

static Runtime *runtimePtr = new (buf) Runtime{}; 

It is entirely possible that I am doing something wrong, but this looks to me like a circular initialization problem.
Could someone please look at this issue and point me in the right direction?
Thanks in advance.
Petr
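
For readers following along, the cycle described above can be reproduced in miniature. The sketch below is not Mesh's code and not necessarily the fix that was applied; the `sketch` namespace, the `g_initializing` flag, and the counters are hypothetical names introduced only for illustration. It shows two guards a free path could use so that it never re-enters a function-local static that is still being constructed: an early return for a null pointer (the backtrace shows `ptr=0x0`), and a per-thread "initialization in progress" flag. A real allocator would also need to make sure that reading the flag cannot itself allocate.

```cpp
#include <cassert>
#include <new>

namespace sketch {

struct Runtime;
Runtime *runtime();

// Hypothetical bookkeeping, only here so the behavior can be observed.
thread_local bool g_initializing = false;  // true while Runtime is being built
int g_reentrantFrees = 0;                  // frees skipped during construction
int g_forwardedFrees = 0;                  // frees that reached the runtime

void freeSlowpath(void *ptr) {
  if (ptr == nullptr) return;  // free(NULL) is a no-op; no runtime needed
  if (g_initializing) {        // called from inside Runtime's constructor:
    ++g_reentrantFrees;        // drop the request instead of re-entering
    return;                    // the static initializer below
  }
  runtime();                   // safe: construction has finished
  ++g_forwardedFrees;
}

struct Runtime {
  Runtime() {
    // Stand-in for a libc call that allocates and frees internally.
    freeSlowpath(nullptr);
    int dummy = 0;
    freeSlowpath(&dummy);
  }
};

Runtime *runtime() {
  alignas(Runtime) static unsigned char buf[sizeof(Runtime)];
  // Same shape as the line quoted in the report, wrapped so the flag is
  // set for exactly the duration of the constructor.
  static Runtime *runtimePtr = [] {
    g_initializing = true;
    Runtime *rt = new (buf) Runtime{};
    g_initializing = false;
    return rt;
  }();
  return runtimePtr;
}

}  // namespace sketch
```

Without the two early returns, the second `freeSlowpath` call inside the constructor would call `runtime()` while its static is still being initialized, which is the re-entry that the guard in the backtrace is waiting on.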

Hi, thanks for the report! I believe I see the problem here; I'll look to fix it this week.

The problem is that Mesh internally calls snprintf(buf, buf_len - 1, "%s/alloc-mesh-%d.%zud", tmpDir, pid, i); and we don't expect that to allocate, but for some reason it is calling back into the allocator. Maybe locale initialization or some such? At any rate, we should be able to avoid calling snprintf here.

Hi Bob,
thanks for the quick reply.
The fact that snprintf heap-allocates was kind of a surprise to me as well.
Admittedly, my (fairly old) glibc, version 2.12-2, tries to use alloca first; I am not sure why the __libc_use_alloca call returned false.
See https://github.com/walac/glibc/blob/master/stdio-common/vfprintf.c#L1748 for the code.
Anyway, thanks again, I am looking forward to pulling the fix.
BR
Petr