linux-rdma / perftest

Infiniband Verbs Performance Tests

ib_write_bw --cuda will lead to system deadlock

antonywei opened this issue

Client (mlx5 NIC):
./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a

Server (mlx5 NIC):
./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0

When I press Ctrl+C to kill the process, the whole system crashes and reports a system deadlock.

This does not happen if we don't use the --use_cuda parameter.

Can you copy the crash dump here?

It seems the system crashes before the core dump files are written. Maybe the reason is that ib_write_bw does not release its GPU resources when there is a problem (for example, an RNR error), and CUDA and the kernel then do not release these resources either, which leads to the system crash.

I tried to reproduce it with loopback, and it didn't reproduce.
I pressed Ctrl+C both while traffic was passing and while the GPU buffer was being allocated.
Can you tell me at exactly what point you tried to kill the process?
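
One way to make the kill timing repeatable is to send the interrupt after a fixed delay instead of pressing Ctrl+C by hand. The sketch below is only an illustration, not part of perftest: interrupt_after is a hypothetical helper name, and it assumes bash and GNU coreutils timeout are available. timeout delivers SIGINT to the command it started, which approximates Ctrl+C but is not identical to a terminal interrupt.

```shell
#!/usr/bin/env bash
# interrupt_after is a hypothetical helper (not part of perftest).
# Usage: interrupt_after <seconds> <command> [args...]
interrupt_after() {
    local delay="$1"
    shift
    # GNU coreutils timeout sends the chosen signal once the delay expires.
    timeout --signal=INT "$delay" "$@"
    local status=$?
    # timeout exits with 124 when the delay expired and the signal was sent.
    echo "exit status: $status"
    return "$status"
}

# Example (assumes the server command from this issue; the delay is a guess to vary):
#   interrupt_after 5 ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0
```

Varying the delay (for example 1, 5 and 30 seconds) would cover both the GPU buffer allocation phase and the traffic phase mentioned above.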

Closing the issue. Please re-open if it reproduces.