Mellanox / k8s-rdma-shared-dev-plugin

can't work in ROCE v2

zzhint opened this issue

commented

Hello. When I use this plugin, NICs in different network segments cannot communicate. I suspect this is because RoCE v2 is not being used, but even after changing the GID index (as listed by show_gids) it still does not work. Could you help?
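
For anyone checking the same thing, here is a minimal sketch of how the RoCE v2 entries in the GID table could be listed without show_gids. It assumes the usual Linux sysfs layout (`/sys/class/infiniband/<device>/ports/<port>/gids/<index>` and `.../gid_attrs/types/<index>`); the device name `mlx5_0` in the usage line is only an example, and the helper name `roce_v2_gid_indices` is made up for illustration.

```python
from pathlib import Path

ZERO_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def roce_v2_gid_indices(device, port=1, sysfs_root="/sys/class/infiniband"):
    """Return [(index, gid)] for GID table entries whose type is 'RoCE v2'.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real node the default should be correct.
    """
    port_dir = Path(sysfs_root) / device / "ports" / str(port)
    types_dir = port_dir / "gid_attrs" / "types"
    if not types_dir.is_dir():
        return []

    results = []
    entries = [p for p in types_dir.iterdir() if p.name.isdigit()]
    for entry in sorted(entries, key=lambda p: int(p.name)):
        try:
            gid_type = entry.read_text().strip()
            gid = (port_dir / "gids" / entry.name).read_text().strip()
        except OSError:
            # Unpopulated GID table slots can fail on read; skip them.
            continue
        if gid_type == "RoCE v2" and gid != ZERO_GID:
            results.append((int(entry.name), gid))
    return results


if __name__ == "__main__":
    # "mlx5_0" is an assumed device name; substitute your own.
    for index, gid in roce_v2_gid_indices("mlx5_0"):
        print(index, gid)
```

Running this both on the host and inside the pod may show whether the container sees the same RoCE v2 entries as the host does.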

Is there a specific issue with k8s-rdma-shared-dev-plugin?

This looks like a system or fabric configuration problem.

commented

I have two physical machines with 8 NICs. On one machine a NIC has the IP 198.18.0.4/24; on the other machine a NIC has the IP 198.18.4.5/24. These NICs are not in the same network segment, but because I added a gateway, ping -I 198.18.0.4 198.18.4.5 works, and ib_write_bw works as well. However, when using k8s-rdma-shared-dev-plugin, inside a container ping works but ib_write_bw does not! I think this is a problem with k8s-rdma-shared-dev-plugin: with two NICs in different network segments, ib_write_bw fails even though ping succeeds.
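
To make the addressing concrete, here is a small sketch using only Python's standard `ipaddress` module. It checks that the two addresses above fall in different /24 subnets and derives the IPv4-mapped GID that a RoCE v2 GID table entry for each address would be expected to carry (RoCE v2 encodes an IPv4 address as `::ffff:a.b.c.d`). The function names are illustrative, not part of any tool.

```python
import ipaddress


def same_subnet(a, b):
    """True if two interface addresses (with prefix length) share a network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


def ipv4_mapped_gid(addr):
    """Return the IPv4-mapped IPv6 form of addr, fully expanded like show_gids."""
    ip = ipaddress.ip_interface(addr).ip
    return ipaddress.IPv6Address("::ffff:" + str(ip)).exploded


if __name__ == "__main__":
    a, b = "198.18.0.4/24", "198.18.4.5/24"
    print(same_subnet(a, b))   # False: traffic between them must be routed
    print(ipv4_mapped_gid(a))  # 0000:0000:0000:0000:0000:ffff:c612:0004
    print(ipv4_mapped_gid(b))  # 0000:0000:0000:0000:0000:ffff:c612:0405
```

Because the two NICs sit in different subnets, the traffic has to be routed, which is why a RoCE v2 GID index (rather than a RoCE v1 one) matters here.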

Can you provide your full k8s config?

  • OS and kernel on worker nodes
  • rdma shared device plugin config
  • rdma shared device plugin logs
  • pod object
  • network attachment definition
  • output of ip a show and ip link show in both pods
  • contents of the /dev/infiniband folder of the workload container in both pods
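
The pod-side items in this list could be gathered with something like the sketch below, run inside each workload container. It assumes Python 3 and the `ip` tool are available in the container image, which may not be the case; the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def list_infiniband_devices(dev_dir="/dev/infiniband"):
    """Return sorted entry names under dev_dir, or [] if it does not exist."""
    path = Path(dev_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir())


def run_if_available(cmd):
    """Run cmd and return its combined output, or a note if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not found"
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return proc.stdout + proc.stderr


def collect():
    """Gather the per-pod diagnostics requested in the list above."""
    return {
        "ip a show": run_if_available(["ip", "a", "show"]),
        "ip link show": run_if_available(["ip", "link", "show"]),
        "/dev/infiniband": "\n".join(list_infiniband_devices()),
    }


if __name__ == "__main__":
    for title, output in collect().items():
        print(f"== {title} ==\n{output}\n")
```

An empty `/dev/infiniband` section in the output would suggest the RDMA character devices were not mounted into the container.
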
commented

Sorry, we have changed the CNI and now it works.

@zzhint Which CNI did you use? I may be hitting the same problem as you: ib_write_bw just does not work (no such device). Could you share which CNI you ended up using and your general configuration?