Segfault when running `ucc_info`
nirandaperera opened this issue
I am getting a segfault when running `ucc_info -c`.
niranda@#####:~/ucc$ ./install/bin/ucc_info -c
[gohan-compute-01:59038:0:59038] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x20)
==== backtrace (tid: 59038) ====
0 /lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7f0767d8c474]
1 /lib/libucs.so.0(+0x2f66f) [0x7f0767d8c66f]
2 /lib/libucs.so.0(+0x2f956) [0x7f0767d8c956]
3 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f0767e01520]
4 /lib/libucs.so.0(ucs_config_clone_log_comp+0x8) [0x7f0767d7da08]
5 /lib/libucs.so.0(ucs_config_parser_set_default_values+0x28) [0x7f0767d7ebe8]
6 /lib/libucs.so.0(ucs_config_parser_fill_opts+0x26) [0x7f0767d7f196]
7 /home/niranda/ucc/install/lib/libucc.so.1(ucc_config_parser_fill_opts+0x16) [0x7f0768016a76]
8 /home/niranda/ucc/install/lib/libucc.so.1(ucc_constructor+0x7d) [0x7f0767ffed2d]
9 /home/niranda/ucc/install/lib/libucc.so.1(ucc_lib_config_read+0x1a) [0x7f0767ffe8fa]
10 ./install/bin/ucc_info(+0x22a0) [0x555853f052a0]
11 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f0767de8d90]
12 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f0767de8e40]
13 ./install/bin/ucc_info(+0x24d5) [0x555853f054d5]
=================================
Segmentation fault (core dumped)
I am using the following UCX version from conda:
(ucc) niranda@######:~/ucc$ ucx_info -v
# Library version: 1.15.0
# Library path: /home/niranda/envs/ucc/bin/../lib/libucs.so.0
# API headers version: 1.15.0
# Git branch '', revision 49ca22e
# Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --build=x86_64-conda-linux-gnu --host=x86_64-conda-linux-gnu --prefix=/home/niranda/envs/ucc --with-sysroot --enable-cma --enable-mt --enable-numa --with-gnu-ld --with-cuda=/usr/local/cuda --with-rdmacm=/home/niranda/envs/ucc --with-verbs=/home/niranda/envs/ucc
I configured UCC as follows:
./configure --with-ucx=$CONDA_PREFIX --prefix=$PWD/install
Could you please let me know what the problem is here?
Downgrading UCX to 1.13.1 worked around the problem. Isn't the UCC master branch on par with UCX releases?
This is likely caused by missing backward compatibility in UCS. UCC uses UCS for various service functions (memory allocation, registration, parsing, etc.), but UCS is neither backward nor forward compatible. So the UCS version loaded at runtime must match the UCS version that UCC was compiled against.
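One way to check for such a mismatch is to compare the `libucs` path in the crash backtrace with the `Library path` reported by `ucx_info -v`. In the logs above the backtrace frames point at `/lib/libucs.so.0`, while `ucx_info` reports the conda copy. The following is a minimal sketch of that comparison, not a confirmed diagnosis. The function names are hypothetical, the sample strings are copied from this issue, and the normalization is purely lexical, so two different paths could still be the same file via a symlink.

```python
import posixpath
import re
from typing import Optional, Set

# Excerpts copied from the logs in this issue.
BACKTRACE = """\
4 /lib/libucs.so.0(ucs_config_clone_log_comp+0x8) [0x7f0767d7da08]
5 /lib/libucs.so.0(ucs_config_parser_set_default_values+0x28) [0x7f0767d7ebe8]
7 /home/niranda/ucc/install/lib/libucc.so.1(ucc_config_parser_fill_opts+0x16) [0x7f0768016a76]
"""

UCX_INFO = """\
# Library version: 1.15.0
# Library path: /home/niranda/envs/ucc/bin/../lib/libucs.so.0
# API headers version: 1.15.0
"""


def libucs_paths_in_backtrace(backtrace: str) -> Set[str]:
    """Return the distinct libucs paths named in the backtrace frames."""
    # A frame looks like: "<index> <library path>(<symbol>+<offset>) [<address>]"
    pattern = r"^\s*\d+\s+(\S*libucs\.so[.\d]*)\("
    return set(re.findall(pattern, backtrace, flags=re.MULTILINE))


def libucs_path_from_ucx_info(output: str) -> Optional[str]:
    """Return the normalized 'Library path' from `ucx_info -v` output, if present."""
    match = re.search(r"^#\s*Library path:\s*(\S+)", output, flags=re.MULTILINE)
    # normpath collapses "bin/../lib" lexically; it does not resolve symlinks.
    return posixpath.normpath(match.group(1)) if match else None


runtime_paths = libucs_paths_in_backtrace(BACKTRACE)
reported_path = libucs_path_from_ucx_info(UCX_INFO)

# If the crashing process loaded a libucs other than the one ucx_info reports,
# UCC may be running against a different UCS than the one it was built with.
paths_differ = runtime_paths != {reported_path}
print("libucs in backtrace:", sorted(runtime_paths))
print("libucs per ucx_info:", reported_path)
print("paths differ:", paths_differ)
```

If the paths differ, it is worth confirming which `libucs.so.0` the dynamic loader picks up for `ucc_info` (for example with `ldd ./install/bin/ucc_info`) before concluding that the versions are mismatched.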