Nugine / rdma

Low-level RDMA API

`rdma_rxe` `ibv_post_send` soft lockup

Nugine opened this issue

Fixed by d2b3190...cff469d.
The real cause is in the rdma_rxe kernel module. I don't know why it happens.

To reproduce:

git checkout d2b3190aecc84d64d4616ae6a9f9f1b20ee2f052
for i in `seq 1 100`
do
    just bench-pingpong-rc
done

OS: Ubuntu 20.04

$ uname -srv
Linux 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022
$ pkg-config --modversion libibverbs
1.14.41.0
$ pkg-config --modversion librdmacm
1.3.41.0

rdma-core version 3639589614c387669e0d66e0cdf956634a050bcc

https://www.rdmamojo.com/2013/01/26/ibv_post_send/

If this is an RC QP, verify that the rnr_retry value configured in ibv_modify_qp() isn't 7, since a value of 7 means retrying indefinitely in case of an RNR flow.
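
A minimal sketch of the rnr_retry advice above, in C. The helper name finite_rnr_retry and both macros are hypothetical and not part of libibverbs; the only facts relied on are that rnr_retry is a 3-bit field and that 7 selects infinite retries. This only avoids the unbounded RNR retry loop. It is not claimed to address the rdma_rxe soft lockup itself.

```c
#include <stdint.h>

/* rnr_retry is a 3-bit field; 7 is the special "retry forever" value. */
#define RNR_RETRY_INFINITE   7u
#define RNR_RETRY_MAX_FINITE 6u

/* Hypothetical helper (not a libibverbs function): clamp a requested
 * rnr_retry value so that it never selects infinite retries. */
static uint8_t finite_rnr_retry(uint8_t requested)
{
    return requested >= RNR_RETRY_INFINITE ? RNR_RETRY_MAX_FINITE : requested;
}

/* Intended use during the RTR -> RTS transition, assuming an existing
 * struct ibv_qp *qp and an otherwise filled-in struct ibv_qp_attr attr:
 *
 *     attr.qp_state  = IBV_QPS_RTS;
 *     attr.rnr_retry = finite_rnr_retry(wanted);
 *     ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_RNR_RETRY | ...);
 */
```

With a finite value, a send that keeps hitting RNR NAKs should eventually complete with an error status instead of retrying forever, which makes a stuck benchmark easier to diagnose.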