JuliaInterop / ZMQ.jl

Julia interface to ZMQ

Sending multipart messages segfaults (eventually)

rdeits opened this issue

I've reproduced this with Julia v0.6.2, using both ZMQ master and ZMQ v0.5.1, on two separate machines (Ubuntu 14.04 and 16.04).

Edit: I've edited the example below to be even more minimal.

To reproduce, run the following server in one terminal:

using ZMQ

ctx = Context()
sock = Socket(ctx, REP)
ZMQ.bind(sock, "tcp://127.0.0.1:5555")

while true
    ZMQ.recv(sock)
    while ZMQ.ismore(sock)
        ZMQ.recv(sock)
    end
    ZMQ.send(sock, "ok")
end

and run the following client in another:

using ZMQ

ctx = Context()
sock = Socket(ctx, REQ)
ZMQ.connect(sock, "tcp://127.0.0.1:5555")

while true
    ZMQ.send(sock, "hello", true)
    ZMQ.send(sock, zeros(UInt8, 5), false)
    ZMQ.recv(sock)
end

The client reliably crashes after anywhere from 5 to 100 seconds of operation. It prints "signal (6): Aborted", and the shell then reports "[1] 5768 segmentation fault (core dumped) julia zmqtest.jl".

I can catch the abort in gdb, resulting in the following backtrace:

(gdb) r zmqtest.jl
Starting program: /usr/local/bin/julia zmqtest.jl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff409e700 (LWP 5977)]
[New Thread 0x7fffe8aae700 (LWP 5978)]
[New Thread 0x7fffe82ad700 (LWP 5979)]
[New Thread 0x7fffe5aac700 (LWP 5980)]
[New Thread 0x7fffe32ab700 (LWP 5981)]
[New Thread 0x7fffdeaaa700 (LWP 5982)]
[New Thread 0x7fffdc2a9700 (LWP 5983)]
[New Thread 0x7fffd9aa8700 (LWP 5984)]
[New Thread 0x7fffd92a7700 (LWP 5985)]
[New Thread 0x7fffd4aa6700 (LWP 5986)]
[New Thread 0x7fffd42a5700 (LWP 5987)]
[New Thread 0x7fffcfaa4700 (LWP 5988)]
[New Thread 0x7fffc867c700 (LWP 5989)]
[New Thread 0x7fffc7e7b700 (LWP 5990)]

Thread 15 "julia" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffc7e7b700 (LWP 5990)]
0x00007ffff6d16428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6d16428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff6d1802a in __GI_abort () at abort.c:89
#2  0x00007ffff78047c0 in uv__async_send (wa=0x7ffff70a5d38 <main_arena+536>)
    at /buildworker/worker/package_linux64/build/deps/srccache/libuv-d8ab1c6a33e77bf155facb54215dd8798e13825d/src/unix/async.c:177
#3  0x00007ffff78043a7 in uv_async_send (handle=0xb24c10)
    at /buildworker/worker/package_linux64/build/deps/srccache/libuv-d8ab1c6a33e77bf155facb54215dd8798e13825d/src/unix/async.c:66
#4  0x00007fffc8f415a3 in julia_gc_free_fn_62624 () at /home/rdeits/.julia/v0.6/ZMQ/src/ZMQ.jl:315
#5  0x00007fffc8f41669 in jlcapi_gc_free_fn_62624 ()
#6  0x00007fffc88fefac in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so
#7  0x00007fffc8925f69 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so
#8  0x00007fffc891a6aa in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so
#9  0x00007fffc88f924c in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so
#10 0x00007fffc88f817e in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so
#11 0x00007fffc89237fa in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so
#12 0x00007ffff70b26ba in start_thread (arg=0x7fffc7e7b700) at pthread_create.c:333
#13 0x00007ffff6de841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The gdb backtrace consistently points to this line in gc_free_fn (frame #4 above, reached from a libzmq thread) as the source of the SIGABRT:

ccall(:uv_async_send,Cint,(Ptr{Cvoid},),hint)

Here's my versioninfo() on 16.04:

Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Can you help me figure this out?

So far, the only workaround I can find is to avoid multipart messages entirely, unfortunately.

Could you build ZMQ in debug mode to get line number info from GDB for the libzmq frames?

This appears to have been fixed by the recent ZMQ.jl updates!