osx segfault
mackwic opened this issue · comments
Hi @eelcocramer !
I got a segfault today when trying to open 2 channels in the same script. I got the trace via https://github.com/ddopson/node-segfault-handler/ . It's quite useful, you should bundle it in bluetooth-serial-port. :)
The code is very boring, the same as always, just repeated twice (once per device found in the scan).
Here is the stack trace:
PID 21166 received SIGSEGV for address: 0x0
0 segfault-handler.node 0x0000000100fe547f _ZL16segfault_handleriP9__siginfoPv + 287
1 libsystem_platform.dylib 0x00007fff88c165aa _sigtramp + 26
2 ??? 0x0000000100efc080 0x0 + 4310679680
3 BluetoothSerialPort.node 0x0000000100be5653 -[BluetoothWorker getRFCOMMChannelIDTask:] + 300
4 Foundation 0x00007fff8c5cf75e __NSThreadPerformPerform + 229
5 CoreFoundation 0x00007fff8e7b75b1 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
6 CoreFoundation 0x00007fff8e7a8c62 __CFRunLoopDoSources0 + 242
7 CoreFoundation 0x00007fff8e7a83ef __CFRunLoopRun + 831
8 CoreFoundation 0x00007fff8e7a7e75 CFRunLoopRunSpecific + 309
9 Foundation 0x00007fff8c5d50fc -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 253
10 Foundation 0x00007fff8c6bdaca -[NSRunLoop(NSRunLoop) run] + 74
11 Foundation 0x00007fff8c5d2d8b __NSThread__main__ + 1318
12 libsystem_pthread.dylib 0x00007fff85646899 _pthread_body + 138
13 libsystem_pthread.dylib 0x00007fff8564672a _pthread_struct_init + 0
14 libsystem_pthread.dylib 0x00007fff8564afc9 thread_start + 13
After running objdump -dS on the module, the disassembly shows that we should be here:
35fc: eb 0a jmp 3608 <-[BluetoothWorker getRFCOMMChannelIDTask:]+0xe1>
35fe: bf 01 00 00 00 mov $0x1,%edi
3603: e8 f0 53 00 00 callq 89f8 <_sleep$stub> <==== this is the call to sleep(1)
3608: 48 8b 3d b9 ae 00 00 mov 0xaeb9(%rip),%rdi # e4c8 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x5f0>
360f: 4c 89 f6 mov %r14,%rsi
3612: ff 15 b8 ad 00 00 callq *0xadb8(%rip) # e3d0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x4f8>
3618: 48 89 c7 mov %rax,%rdi
361b: 4c 89 fe mov %r15,%rsi
361e: ff 15 bc ad 00 00 callq *0xadbc(%rip) # e3e0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x508>
3624: f2 0f 10 4d c8 movsd -0x38(%rbp),%xmm1
3629: 66 0f 2e c8 ucomisd %xmm0,%xmm1
362d: 76 29 jbe 3658 <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x131>
362f: 48 89 df mov %rbx,%rdi
3632: 4c 89 e6 mov %r12,%rsi
3635: ff 15 75 ad 00 00 callq *0xad75(%rip) # e3b0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x4d8>
363b: 48 85 c0 test %rax,%rax
363e: 74 be je 35fe <-[BluetoothWorker getRFCOMMChannelIDTask:]+0xd7>
3640: 48 89 c7 mov %rax,%rdi
3643: 48 8d 35 a6 ad 00 00 lea 0xada6(%rip),%rsi # e3f0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x518>
364a: 4c 89 ea mov %r13,%rdx
364d: ff 15 9d ad 00 00 callq *0xad9d(%rip) # e3f0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x518>
3653: 48 85 c0 test %rax,%rax <========================== HERE
3656: 74 a6 je 35fe <-[BluetoothWorker getRFCOMMChannelIDTask:]+0xd7>
3658: 48 8d 35 c1 aa 00 00 lea 0xaac1(%rip),%rsi # e120 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x248>
365f: 48 89 df mov %rbx,%rdi
3662: ff 15 b8 aa 00 00 callq *0xaab8(%rip) # e120 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x248>
3668: 49 89 c6 mov %rax,%r14
366b: 4d 85 f6 test %r14,%r14
366e: 74 7f je 36ef <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x1c8>
3670: 48 8d 35 59 aa 00 00 lea 0xaa59(%rip),%rsi # e0d0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x1f8>
3677: 4c 89 f7 mov %r14,%rdi
367a: ff 15 50 aa 00 00 callq *0xaa50(%rip) # e0d0 <__ZL27OBJC_CLASS_RO_$___ARCLite__+0x1f8>
3680: 31 db xor %ebx,%ebx
3682: 48 85 c0 test %rax,%rax
3685: 74 68 je 36ef <-[BluetoothWorker getRFCOMMChannelIDTask:]+0x1c8>
I would say the crash is somewhere around there and, counting the jumps, I would guess it's line 327 (not sure about that).
Instantiating 2 instances of bluetooth-serial-port reproduces the same behavior.
Also, it is interesting to see that on OS X the channels are open and the devices are connected.
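To make the mapping between the trace and the listing explicit, here is a small sketch of the address arithmetic. It uses only numbers that appear in frame 3 of the trace and in the objdump output, and it assumes the module is loaded at a single constant offset. Note that the address recorded for a non-top frame is normally the return address, so the instruction of interest would be the callq at 364d just before the marked line.

```javascript
// Values copied from frame 3 of the trace and from the objdump listing.
const runtimeAddress = 0x100be5653; // frame 3 address in the running process
const symbolOffset = 300;           // the "+ 300" after the symbol name
const fileAddress = 0x3653;         // the line marked HERE in the listing

// Start of -[BluetoothWorker getRFCOMMChannelIDTask:] inside the file.
const functionStart = fileAddress - symbolOffset;

// Address where BluetoothSerialPort.node was mapped (assumes one constant slide).
const loadBase = runtimeAddress - fileAddress;

// Cross-check with a jump target from the listing: 3608 is labelled +0xe1.
const crossCheck = 0x3608 - 0xe1;

console.log(functionStart.toString(16));   // 3527
console.log(loadBase.toString(16));        // 100be2000
console.log(crossCheck === functionStart); // true
```

Both ways of computing the function start agree, which supports reading the marked line as the location the trace points at.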
Could you try adding some debug prints to nail it down? Does this happen when getting the channelID of the 2nd channel?
Here is an output of our script:
OK:scan in progress
OK:scan finished
OK:found 12L-43
OK:found 12R-43
OK:same pair
OK:connecting to 12L-43 (00-0c-9f-69-c3-41)
OK:connecting to 12R-43 (00-0c-98-1e-b8-b3)
PID 22142 received SIGSEGV for address: 0x0
0 segfault-handler.node 0x0000000100fe547f _ZL16segfault_handleriP9__siginfoPv + 287
1 libsystem_platform.dylib 0x00007fff88c165aa _sigtramp + 26
2 ??? 0x0000000100cfc080 0x0 + 4308582528
3 BluetoothSerialPort.node 0x0000000100be5653 -[BluetoothWorker getRFCOMMChannelIDTask:] + 300
4 Foundation 0x00007fff8c5cf75e __NSThreadPerformPerform + 229
5 CoreFoundation 0x00007fff8e7b75b1 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
6 CoreFoundation 0x00007fff8e7a8c62 __CFRunLoopDoSources0 + 242
7 CoreFoundation 0x00007fff8e7a83ef __CFRunLoopRun + 831
8 CoreFoundation 0x00007fff8e7a7e75 CFRunLoopRunSpecific + 309
9 Foundation 0x00007fff8c5d50fc -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 253
10 Foundation 0x00007fff8c6bdaca -[NSRunLoop(NSRunLoop) run] + 74
11 Foundation 0x00007fff8c5d2d8b __NSThread__main__ + 1318
12 libsystem_pthread.dylib 0x00007fff85646899 _pthread_body + 138
13 libsystem_pthread.dylib 0x00007fff8564672a _pthread_struct_init + 0
14 libsystem_pthread.dylib 0x00007fff8564afc9 thread_start + 13
ERROR:12:device 12R-43 connection error
From the fact that 12R-43 gets a "nice" error, it seems that it's the first channel that segfaults. Is bluetooth-serial-port re-entrant?
No. There is a lock, and a second call should be blocked until the first call has finished.
You can try to run your node script in gdb to get a stacktrace from the error.
Also adding a few NSLog(@"your debug text") might help. This writes the output to the console.
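While debugging, the "one call at a time" behaviour that the lock is supposed to give can also be forced from the script side, to check whether overlapping lookups matter. This is only a sketch: createSerialQueue and fakeLookup are hypothetical names, not part of bluetooth-serial-port, and it assumes a Node.js version with promises (newer than the v0.10 used in this thread). A real script would wrap its channel lookup where fakeLookup is used.

```javascript
// Runs async tasks strictly one after another (hypothetical helper).
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start the task only once everything queued before it has settled.
    const result = tail.then(() => task());
    // Keep the chain usable even if this task rejects.
    tail = result.catch(() => {});
    return result;
  };
}

// Stand-in for a channel lookup; it records when it starts and ends.
const events = [];
function fakeLookup(address, delayMs) {
  return new Promise((resolve) => {
    events.push('start ' + address);
    setTimeout(() => {
      events.push('end ' + address);
      resolve(address);
    }, delayMs);
  });
}

const enqueue = createSerialQueue();
const done = Promise.all([
  enqueue(() => fakeLookup('00-0c-9f-69-c3-41', 20)),
  enqueue(() => fakeLookup('00-0c-98-1e-b8-b3', 5)),
]).then(() => events);
```

With the queue, the second lookup starts only after the first has ended, even though its delay is shorter. If the crash disappears under this arrangement, that would point at concurrent access rather than at a single call.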
With a debug build of node v0.10.36 (our version), I got some assertion failures: https://gist.github.com/mackwic/7824c8a944ba6bd5952b
Same code, with valgrind: https://gist.github.com/mackwic/4c8367f37835a276f9aa
There is a lot of output. I'll try to sort out what happened.
Ok. Some progress here.
I use this patched BluetoothWorker.mm, which checks every variable I happen to see in the body.
Using the normal build of node.js shows that the segfault happens independently of these variables (see this trace).
Using the debug build shows nothing more, which seems to reinforce the hunch that the buffer could be badly allocated.
I'll dig in this direction.
Ok. So it wasn't the buffer. It was the NanUndefined() call here that returned an incorrect handle.
Let me publish a fix in a PR.
Fixed in 1.2.4