Segmentation fault during shutdown
koppi opened this issue · comments
May 16 01:01:19 x200 msgd:0: normal shutdown - global segment detached
May 16 01:26:31 x200 msgd:0: startup pid=13071 flavor=posix rtlevel=1 usrlevel=1 halsize=524288 shm=Posix gcc=4.7.2 version=git not installed at configure time
May 16 01:26:31 x200 msgd:0: ØMQ=4.0.4 czmq=2.2.0 protobuf=2.4.1
May 16 01:26:31 x200 msgd:0: configured: sha=git not installed or executable
May 16 01:26:31 x200 msgd:0: built: May 12 2015 12:42:47 sha=git not installed or executable
May 16 01:26:31 x200 msgd:0: publishing ZMQ/protobuf log messages at ipc:///tmp/0.log.a42c8c6b-4025-4f83-ba28-dad21114744a
May 16 01:26:31 x200 msgd:0: rtapi_app:13077:user accepting commands at ipc:///tmp/0.rtapi.a42c8c6b-4025-4f83-ba28-dad21114744a
May 16 01:26:31 x200 msgd:0: hal_lib:13077:rt creating ladder-state
May 16 01:26:31 x200 msgd:0: hal_lib:13101:user INFO CLASSICLADDER- No ladder GUI requested-Realtime runs till HAL closes.
May 16 01:28:06 x200 msgd:0: rtapi_app:13077:user signal 11 - 'Segmentation fault' received, dumping core (current dir=/home/koppi/linuxcnc/configs/koppi-cnc)
May 16 01:28:06 x200 msgd:0: rtapi_app:13077:user (backtrace not available - libbacktrace not found during build)
May 16 01:28:06 x200 msgd:0: rtapi_app:13077:user signal 11 - 'Segmentation fault' received, dumping core (current dir=/home/koppi/linuxcnc/configs/koppi-cnc)
May 16 01:28:06 x200 msgd:0: rtapi_app:13077:user (backtrace not available - libbacktrace not found during build)
May 16 01:28:06 x200 msgd:0: rtapi_app exit detected - scheduled shutdown
May 16 01:28:08 x200 msgd:0: msgd shutting down
May 16 01:28:08 x200 msgd:0: log buffer hwm: 0% (4 msgs, 506 bytes out of 524288)
May 16 01:28:08 x200 msgd:0: normal shutdown - global segment detached
That's the segfault of lcec, isn't it?
Good to know I am not the only one who sees this.
So I tried a very simple script:
$ DEBUG=5 realtime start
$ halcmd loadusr -W lcec_conf ./output_1sl.xml
<commandline>:0: Component 'lcec_conf' ready
<commandline>:0: Program 'lcec_conf' started
$ halcmd unloadusr lcec_conf
$ halcmd loadrt lcec
<commandline>:0: Realtime module 'lcec' loaded
$ halcmd unload lcec
<commandline>:0: Realtime module 'lcec' unloaded
$ DEBUG=5 realtime stop
<commandline>:0: Realtime threads stopped
with gdb output:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007f36720df261 in hal_exit (comp_id=66) at hal/lib/hal_comp.c:312
312 retval = rtapi_shmem_delete(lib_mem_id, comp_id);
(gdb) p lib_mem_id
$1 = 1
(gdb) p comp_id
$2 = 66
(gdb) backtrace
#0 0x00007f36720df261 in hal_exit (comp_id=66) at hal/lib/hal_comp.c:312
#1 0x00007f36720d32df in rtapi_app_exit () at hal/lib/hal_lib.c:209
#2 0x00000000004081fd in do_unload_cmd (name="hal_lib", reply=..., instance=<optimized out>) at rtapi/xenomai/rtapi_app.cc:645
#3 0x00000000004086c0 in exit_actions (instance=<optimized out>) at rtapi/xenomai/rtapi_app.cc:667
#4 0x0000000000409be8 in rtapi_request (loop=<optimized out>, poller=0x24e1ff0, arg=<optimized out>) at rtapi/xenomai/rtapi_app.cc:829
#5 0x00007f36746de23e in zloop_start () from /usr/lib/x86_64-linux-gnu/libczmq.so.3
#6 0x000000000040941d in mainloop (argc=argc@entry=2, argv=argv@entry=0x7fff2c15a0f8) at rtapi/xenomai/rtapi_app.cc:1336
#7 0x00000000004040c6 in main (argc=2, argv=0x7fff2c15a0f8) at rtapi/xenomai/rtapi_app.cc:1693
Maybe @mhaberler will see the reason for the fault at once?
please provide instructions to reproduce - branch, commit, any other config
special hardware needed?
@mhaberler
Yes, you need special hardware: one EtherCAT slave.
If you need a (generic) XML file for your slave, as linuxcnc-ethercat demands, tell me which slave you have (vendor id, product id) and I will provide such an XML file.
Meanwhile I'll try to isolate the bug by making an example component, as it boils down to several calls of rtapi_shmem_new and rtapi_shmem_delete.
it looks like an rtapi_shmem_delete deleted the wrong segment - that of hal_lib, and that causes the crash: referencing hal data structs which are not available any more
check the segment numbers being freed
@mhaberler
I added
int statusCode = rtapi_shmem_delete(shmem_id, comp_id);
rtapi_print_msg(RTAPI_MSG_INFO, LCEC_MSG_PFX "shmem del: status %d, shmem_id %d, comp_id %d\n", statusCode, shmem_id, comp_id);
at https://github.com/sittner/linuxcnc-ethercat/blob/master/src/lcec_main.c#L833 .
/var/log/linuxcnc.log then shows:
Aug 16 08:18:42 debian-master msgd:0: hal_lib:14257:rt LCEC: shmem del: status 0, shmem_id 2, comp_id 81
whereas gdb -p ... yields the same as before:
#1 0x00007fe0eb633261 in hal_exit (comp_id=66) at hal/lib/hal_comp.c:312
312 retval = rtapi_shmem_delete(lib_mem_id, comp_id);
(gdb) p comp_id
$5 = 66
(gdb) print comp_id
$6 = 66
(gdb) print lib_mem_id
$7 = 1
So the segment numbers (shmem_id) are not the same.
can I reproduce somehow without EtherCAT peripheral?
I'll try to put all these rtapi_shmem_new and rtapi_shmem_delete into a simple component this evening and then report.
super!
In order to mimic linuxcnc-ethercat one needs one non-RT and one RT component.
These are https://github.com/sirop/Issue_koppi_mk4/blob/master/shmemTest_USR.c and https://github.com/sirop/Issue_koppi_mk4/blob/master/shmemTest.c .
Compile instructions are within the files.
Hal script: https://github.com/sirop/Issue_koppi_mk4/blob/master/shmem.hal .
The segfault occurs when exiting HAL.
/var/log/linuxcnc.log says:
Aug 16 11:04:07 debian-master msgd:0: rtapi_app:18506:user signal 11 - 'Segmentation fault' received, dumping core (current dir=/home/master/ecat_exper)
Aug 16 11:04:07 debian-master msgd:0: rtapi_app:18506:user --- rtapi_app backtrace: ---
Aug 16 11:04:07 debian-master msgd:0: rtapi_app:18506:user 7fb70b363261 hal_exit (hal/lib/hal_comp.c:312)
Aug 16 11:04:07 debian-master msgd:0: rtapi_app:18506:user 7fb70b3572de rtapi_app_exit (hal/lib/hal_lib.c:209)
That is the same error at the same place, hal/lib/hal_comp.c:312, as before with lcec.
super, will check
I could reproduce the issue, looking into it
fix: #include "rtapi_app.h" in shmemTest.c and properly build the component
The build commands in the C files are incorrect and hence do not expose that the comp does not even build - integrating the component into the Submakefiles like other comps shows this:
gcc -c -O0 -g -Wall -funwind-tables -I. -I/usr/include/xenomai -D_GNU_SOURCE -D_REENTRANT -D__XENO__ -DTHREAD_FLAVOR_ID=2 -DRTAPI -D_GNU_SOURCE -D_FORTIFY_SOURCE=0 -DPB_FIELD_32BIT '-DPB_SYSTEM_HEADER=<'machinetalk'/include/pb-linuxcnc.h>' -D__MODULE__ -I. -I./libnml/linklist -I./libnml/cms -I./libnml/rcs -I./libnml/inifile -I./libnml/os_intf -I./libnml/nml -I./libnml/buffer -I./libnml/posemath -I./rtapi -I./hal/lib -I./emc/nml_intf -I./emc/kinematics -I./emc/motion -I./emc/tp -I./machinetalk/nanopb -I./machinetalk/build -DSEQUENTIAL_SUPPORT -DHAL_SUPPORT -DDYNAMIC_PLCSIZE -DRT_SUPPORT -DOLD_TIMERS_MONOS_SUPPORT -DMODBUS_IO_MASTER -mieee-fp -fPIC hal/components/shmemTest.c -o objects/xenomai/hal/components/shmemTest.o
Linking ../rtlib/xenomai/shmemTest.so
ld -d -r -o objects/xenomai/shmemTest.tmp objects/xenomai/hal/components/shmemTest.o
objcopy -j .rtapi_export -O binary objects/xenomai/shmemTest.tmp objects/xenomai/shmemTest.exported
(echo '{ global : '; tr -s '\0' <objects/xenomai/shmemTest.exported | xargs -r0 printf '%s;\n' | grep .; echo 'local : * ; };') > objects/xenomai/shmemTest.ver
gcc -shared -Bsymbolic -L/home/mah/machinekit-check/lib -Wl,-rpath,/home/mah/machinekit-check/lib -Wl,--no-as-needed -Wl,--version-script,objects/xenomai/shmemTest.ver -o ../rtlib/xenomai/shmemTest.so objects/xenomai/hal/components/shmemTest.o
/usr/bin/ld:objects/xenomai/shmemTest.ver:2: syntax error in VERSION script
collect2: error: ld returned 1 exit status
Makefile:1599: recipe for target '../rtlib/xenomai/shmemTest.so' failed
make[1]: *** [../rtlib/xenomai/shmemTest.so] Error 1
test branch: https://github.com/mhaberler/machinekit/tree/koppi-shmem
non-fatal: shm keys are only significant in the lower 24bits, please see https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_shmkeys.h
fix: #include "rtapi_app.h" in shmemTest.c
That was my fault as "rtapi_app.h" is there: https://github.com/sittner/linuxcnc-ethercat/blob/master/src/lcec_main.c#L42 .
The build commands in the C files are incorrect and hence do not expose the comp ...
Nevertheless, shmemTest.c can be built with the build commands in its C file.
What does "do not expose the comp" mean?
please look at the build log above, and what the build does, and compare to your build command - all you do is compile and create a .so
you are missing essential steps: objcopy - extraction of symbols in the .rtapi_export section, creation of a linker script fragment, final link
I think using the out-of-tree building Makefile.modinc should take care of that
text: should have been "that the comp does not even build"
Seems to be solved through sittner/linuxcnc-ethercat#49 and sittner/linuxcnc-ethercat#56?