eelcocramer / node-bluetooth-serial-port

Serial I/O over bluetooth for NodeJS

osx segfault

mackwic opened this issue · comments

Hi @eelcocramer !

I hit a segfault today while trying to open two channels in the same script. I captured the trace with https://github.com/ddopson/node-segfault-handler/. It's quite useful; you should bundle it in bluetooth-serial-port. :)

The code is very boring, the same as always, just repeated twice (once per device found in the scan).

Here is the stack trace:

PID 21166 received SIGSEGV for address: 0x0
0   segfault-handler.node               0x0000000100fe547f _ZL16segfault_handleriP9__siginfoPv + 287
1   libsystem_platform.dylib            0x00007fff88c165aa _sigtramp + 26
2   ???                                 0x0000000100efc080 0x0 + 4310679680
3   BluetoothSerialPort.node            0x0000000100be5653 -[BluetoothWorker getRFCOMMChannelIDTask:] + 300
4   Foundation                          0x00007fff8c5cf75e __NSThreadPerformPerform + 229
5   CoreFoundation                      0x00007fff8e7b75b1 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
6   CoreFoundation                      0x00007fff8e7a8c62 __CFRunLoopDoSources0 + 242
7   CoreFoundation                      0x00007fff8e7a83ef __CFRunLoopRun + 831
8   CoreFoundation                      0x00007fff8e7a7e75 CFRunLoopRunSpecific + 309
9   Foundation                          0x00007fff8c5d50fc -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 253
10  Foundation                          0x00007fff8c6bdaca -[NSRunLoop(NSRunLoop) run] + 74
11  Foundation                          0x00007fff8c5d2d8b __NSThread__main__ + 1318
12  libsystem_pthread.dylib             0x00007fff85646899 _pthread_body + 138
13  libsystem_pthread.dylib             0x00007fff8564672a _pthread_struct_init + 0
14  libsystem_pthread.dylib             0x00007fff8564afc9 thread_start + 13

Disassembling with objdump -dS shows that we should be here:

    35fc:       eb 0a                   jmp    3608 <-[BluetoothWorker getRFCOMMChannelIDTask:]+0xe1>
    35fe:       bf 01 00 00 00          mov    $0x1,%edi
    3603:       e8 f0 53 00 00          callq  89f8 <_sleep$stub>  <==== this is the call to sleep(1)
    3608:       48 8b 3d b9 ae 00 00    mov    0xaeb9(%rip),%rdi        # e4c8 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x5f0>
    360f:       4c 89 f6                mov    %r14,%rsi
    3612:       ff 15 b8 ad 00 00       callq  *0xadb8(%rip)        # e3d0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x4f8>
    3618:       48 89 c7                mov    %rax,%rdi
    361b:       4c 89 fe                mov    %r15,%rsi
    361e:       ff 15 bc ad 00 00       callq  *0xadbc(%rip)        # e3e0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x508>
    3624:       f2 0f 10 4d c8          movsd  -0x38(%rbp),%xmm1
    3629:       66 0f 2e c8             ucomisd %xmm0,%xmm1
    362d:       76 29                   jbe    3658 <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x131>
    362f:       48 89 df                mov    %rbx,%rdi
    3632:       4c 89 e6                mov    %r12,%rsi
    3635:       ff 15 75 ad 00 00       callq  *0xad75(%rip)        # e3b0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x4d8>
    363b:       48 85 c0                test   %rax,%rax
    363e:       74 be                   je     35fe <-[BluetoothWorker getRFCOMMChannelIDTask:]+0xd7>
    3640:       48 89 c7                mov    %rax,%rdi
    3643:       48 8d 35 a6 ad 00 00    lea    0xada6(%rip),%rsi        # e3f0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x518>
    364a:       4c 89 ea                mov    %r13,%rdx
    364d:       ff 15 9d ad 00 00       callq  *0xad9d(%rip)        # e3f0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x518>
    3653:       48 85 c0                test   %rax,%rax  <========================== HERE
    3656:       74 a6                   je     35fe <-[BluetoothWorker getRFCOMMChannelIDTask:]+0xd7>
    3658:       48 8d 35 c1 aa 00 00    lea    0xaac1(%rip),%rsi        # e120 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x248>
    365f:       48 89 df                mov    %rbx,%rdi
    3662:       ff 15 b8 aa 00 00       callq  *0xaab8(%rip)        # e120 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x248>
    3668:       49 89 c6                mov    %rax,%r14
    366b:       4d 85 f6                test   %r14,%r14
    366e:       74 7f                   je     36ef <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x1c8>
    3670:       48 8d 35 59 aa 00 00    lea    0xaa59(%rip),%rsi        # e0d0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x1f8>
    3677:       4c 89 f7                mov    %r14,%rdi
    367a:       ff 15 50 aa 00 00       callq  *0xaa50(%rip)        # e0d0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x1f8>
    3680:       31 db                   xor    %ebx,%ebx
    3682:       48 85 c0                test   %rax,%rax
    3685:       74 68                   je     36ef <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x1c8>

I would say the crash is somewhere around there and, counting the jumps, my guess is that it corresponds to line 327 (not sure about that).
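As a cross-check on the marked instruction, the "+ 300" offset from frame 3 of the stack trace can be mapped back onto the listing. A minimal sketch of the arithmetic, using only numbers that appear in the trace and in the objdump output (the variable names are made up for illustration):

```javascript
// All constants below are copied from the stack trace and the objdump listing.
const labelledAddress = 0x3658; // a jump target shown in the listing
const labelledOffset = 0x131;   // its label: <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x131>

// Start of the function inside the .node file.
const functionStart = labelledAddress - labelledOffset;

const frameOffset = 300;        // "+ 300" in frame 3 of the stack trace
const frameAddress = functionStart + frameOffset;

console.log(functionStart.toString(16)); // 3527
console.log(frameAddress.toString(16));  // 3653
```

The result, 3653, is the `test %rax,%rax` line marked HERE. Since a backtrace frame normally records a return address, this suggests the fault happened inside the indirect call at 364d, just before that line, rather than in the `test` itself.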

Instantiating two instances of bluetooth-serial-port reproduces the same behavior.
Also, it is interesting that on OS X the channels are open and the devices are connected.

commented

Could you try adding some debug prints to nail it down? Does this happen when getting the channelID of the 2nd channel?

Here is an output of our script:

OK:scan in progress
OK:scan finished
OK:found 12L-43
OK:found 12R-43
OK:same pair
OK:connecting to 12L-43 (00-0c-9f-69-c3-41)
OK:connecting to 12R-43 (00-0c-98-1e-b8-b3)
PID 22142 received SIGSEGV for address: 0x0
0   segfault-handler.node               0x0000000100fe547f _ZL16segfault_handleriP9__siginfoPv + 287
1   libsystem_platform.dylib            0x00007fff88c165aa _sigtramp + 26
2   ???                                 0x0000000100cfc080 0x0 + 4308582528
3   BluetoothSerialPort.node            0x0000000100be5653 -[BluetoothWorker getRFCOMMChannelIDTask:] + 300
4   Foundation                          0x00007fff8c5cf75e __NSThreadPerformPerform + 229
5   CoreFoundation                      0x00007fff8e7b75b1 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
6   CoreFoundation                      0x00007fff8e7a8c62 __CFRunLoopDoSources0 + 242
7   CoreFoundation                      0x00007fff8e7a83ef __CFRunLoopRun + 831
8   CoreFoundation                      0x00007fff8e7a7e75 CFRunLoopRunSpecific + 309
9   Foundation                          0x00007fff8c5d50fc -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 253
10  Foundation                          0x00007fff8c6bdaca -[NSRunLoop(NSRunLoop) run] + 74
11  Foundation                          0x00007fff8c5d2d8b __NSThread__main__ + 1318
12  libsystem_pthread.dylib             0x00007fff85646899 _pthread_body + 138
13  libsystem_pthread.dylib             0x00007fff8564672a _pthread_struct_init + 0
14  libsystem_pthread.dylib             0x00007fff8564afc9 thread_start + 13
ERROR:12:device 12R-43 connection error

Since 12R-43 reports a "nice" error, it seems that it is the first channel that segfaults. Is bluetooth-serial-port re-entrant?

commented

No. There is a lock, and a second call should be blocked until the first call has finished.
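For anyone who wants the same "one call at a time" guarantee on the JavaScript side, here is a minimal sketch of a promise-based queue. It is not the lock inside the native module; `createSerialQueue` and `fakeConnect` are hypothetical names, and `fakeConnect` only simulates a connection with a timer so the ordering can be observed without any Bluetooth hardware.

```javascript
// Hypothetical helper: runs async tasks strictly one after another, in call order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => task());
    // Swallow rejections on the internal chain so one failure does not block later tasks.
    tail = result.catch(() => {});
    return result;
  };
}

// Stand-in for a real connect call; it just records when it starts and ends.
const events = [];
function fakeConnect(name, delayMs) {
  return new Promise((resolve) => {
    events.push('start ' + name);
    setTimeout(() => {
      events.push('end ' + name);
      resolve(name);
    }, delayMs);
  });
}

const enqueue = createSerialQueue();
const done = Promise.all([
  enqueue(() => fakeConnect('12L-43', 20)),
  enqueue(() => fakeConnect('12R-43', 5)),
]).then(() => events);

done.then((log) => console.log(log.join('\n')));
```

Even though the second simulated connection is faster, it only starts after the first one has ended, which is the behavior the lock is meant to provide.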

commented

You can try to run your node script in gdb to get a stacktrace from the error.

Also adding a few NSLog(@"your debug text") might help. This writes the output to the console.

With a debug build of node v0.10.36 (our version), I got some assertion failures: https://gist.github.com/mackwic/7824c8a944ba6bd5952b

Same code, with valgrind: https://gist.github.com/mackwic/4c8367f37835a276f9aa
There is a lot of output. I'll try to sort out what happened.

Ok. Some progress here.
I am using this patched BluetoothWorker.mm, which checks every variable I could see in the body.

Using the normal build of node.js shows that the segfault happens independently of these variables (see this trace).
Using the debug build shows nothing more, which seems to reinforce the suspicion that the buffer could be badly allocated.

I'll dig in this direction.

Ok. So it wasn't the buffer. It was the NanUndefined() call here that returned an incorrect handle.

Let me publish a fix in a PR.

commented

Fixed in 1.2.4