ofiwg / fabtests

FROZEN: the master branch has merged with the libfabric git repo

Home Page: http://libfabric.org

fi_rdm_shared_av does not work with verbs providers

nmorey opened this issue

Running libfabric/fabtests 1.4.2 on SUSE SLES12-SP3

Running the fi_rdm_shared_av test in client/server mode over verbs fails with a segfault:

wingenfelder:~/:[0]# fi_rdm_shared_av -p verbs -s 192.168.0.1
janacek:~/:[0]# gdb --args fi_rdm_shared_av -p verbs -s 192.168.0.2 192.168.0.1
(gdb) set follow-fork-mode child 
(gdb) r
Starting program: /usr/bin/fi_rdm_shared_av -p verbs -s 192.168.0.2 192.168.0.1
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-61.3.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New process 27398]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff3e63700 (LWP 27410)]

Thread 2.1 "fi_rdm_shared_a" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fdb780 (LWP 27398)]
0x0000000000401a3f in run () at simple/rdm_shared_av.c:141
141	simple/rdm_shared_av.c: No such file or directory.
Missing separate debuginfos, use: zypper install libibverbs-debuginfo-14-6.7.x86_64 libibverbs1-debuginfo-14-6.7.x86_64 libinfinipath4-debuginfo-3.3-7.7.x86_64 libnl3-200-debuginfo-3.2.23-2.21.x86_64 libpsm2-2-debuginfo-10.2.103-2.6.x86_64 libpsm_infinipath1-debuginfo-3.3-7.7.x86_64 librdmacm1-debuginfo-14-6.7.x86_64 libuuid1-debuginfo-2.29.2-2.3.x86_64
(gdb) bt
#0  0x0000000000401a3f in run () at simple/rdm_shared_av.c:141
#1  main (argc=6, argv=<optimized out>) at simple/rdm_shared_av.c:196

The crash is on this line (simple/rdm_shared_av.c:141):

			remote_fi_addr = ((fi_addr_t *)av_attr.map_addr)[0];

Looking at the code, it seems that only the sockets provider fills in map_addr (and the test does work over sockets).
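The crash is consistent with dereferencing a map_addr that the provider never populated. Below is a minimal sketch of a defensive guard, assuming an unfilled map_addr is left as NULL. The helper name first_mapped_addr is hypothetical and is not part of fabtests, and the local fi_addr_t typedef is only a stand-in (libfabric defines fi_addr_t as uint64_t) so the snippet builds without the libfabric headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libfabric's fi_addr_t so this sketch builds on its own. */
typedef uint64_t fi_addr_t;

/*
 * Hypothetical helper: read the first entry of a shared AV address map.
 * Returns 0 and stores the address in *out on success, or -1 when the
 * map was never filled in (assumed to be NULL here) or out is NULL.
 */
static int first_mapped_addr(const void *map_addr, fi_addr_t *out)
{
	if (map_addr == NULL || out == NULL)
		return -1;

	*out = ((const fi_addr_t *)map_addr)[0];
	return 0;
}
```

With a guard like this, the test could report an error and exit cleanly instead of segfaulting when the provider does not supply the map.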

A quick look at the 1.5.0rc1 code suggests that the bug is still there (I haven't tried it yet).

The fi_rdm_shared_av test in fabtests 1.5rc1 checks for the FI_SHARED_AV capability and exits if the provider doesn't support it.

I'll update the SUSE package to 1.5 and check that. Thanks.

This is fixed in 1.5.0rc1, but this test now fails:

wingenfelder:/tmp/:[61]# fi_rma_bw -e rdm -o writedata -I 5 -p "verbs" -s 192.168.0.1 192.168.0.2
fi_inject_writedata(): common/shared.c:1503, ret=-38 (Function not implemented)

The fi_rma_bw -e rdm -o writedata test is not supported by verbs/RDM. It is, however, supported by ofi_rxm over verbs: you can run the test with fi_rma_bw -e rdm -o writedata -p "ofi_rxm;verbs". ofi_rxm is a "utility" provider that emulates an RDM endpoint over the MSG endpoint of a core provider.

I don't expect it to work over verbs, but I do expect to be able to run the test suite with runfabtests without it failing, which is not currently the case.

There is a plan to make runfabtests.sh run only the tests that a provider supports. That change will land in the repo sometime later, though.
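Until that lands, one possible way for a harness to treat unsupported tests is to report them as skipped. The sketch below is only an illustration, not how runfabtests.sh works (it is a shell script). It assumes an unsupported operation surfaces as -ENOSYS, as in the ret=-38 log above, and the names test_outcome and classify_result are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcome categories for a test harness. */
enum test_outcome { TEST_PASS, TEST_SKIP, TEST_FAIL };

/*
 * Map a negative-errno style return code to an outcome.
 * Assumption: "not implemented by this provider" shows up as -ENOSYS
 * and should count as a skip rather than a failure.
 */
static enum test_outcome classify_result(int ret)
{
	if (ret == 0)
		return TEST_PASS;
	if (ret == -ENOSYS)
		return TEST_SKIP;
	return TEST_FAIL;
}
```

Under this scheme the fi_rma_bw writedata run over verbs would be counted as skipped, while any other non-zero code would still fail the suite.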