Mellanox / k8s-rdma-shared-dev-plugin

Error creating new device

asdfry opened this issue · comments

Hello,

I have installed the network-operator based on the following values.yaml.

deployCR: true
rdmaSharedDevicePlugin:
  deploy: true
  resources:
    - name: rdma_shared_device_a
      vendors: ["15b3"]
      deviceIDs: ["1017", "1021", "101b"]
secondaryNetwork:
  deploy: true
  multus:
    deploy: true
  cniPlugins:
    deploy: true
  ipamPlugin:
    deploy: true
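
For reference, the operator chart is typically installed with a values file like this, roughly as follows (the release name and namespace below are placeholders, not values from this cluster):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install network-operator nvidia/network-operator \
  -n nvidia-network-operator --create-namespace \
  -f values.yaml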

However, when I checked the logs of the created plugin, I encountered the following error.

2024/05/17 09:54:41 Starting K8s RDMA Shared Device Plugin version= master
2024/05/17 09:54:41 resource manager reading configs
2024/05/17 09:54:41 Reading /k8s-rdma-shared-dev-plugin/config.json
Using Kubelet Plugin Registry Mode
2024/05/17 09:54:41 loaded config: [{ResourceName:rdma_shared_device_a ResourcePrefix: RdmaHcaMax:1000 Devices:[] Selectors:{Vendors:[15b3] DeviceIDs:[1017 1021 101b] Drivers:[mlx5_core] IfNames:[] LinkTypes:[]}}] 
2024/05/17 09:54:41 no periodic update interval is set, use default interval 60 seconds
2024/05/17 09:54:41 Discovering host devices
2024/05/17 09:54:41 discovering host network devices
2024/05/17 09:54:41 DiscoverHostDevices(): device found: 0000:07:00.0   02              Mellanox Technolo...    MT27800 Family [ConnectX-5]             
2024/05/17 09:54:41 DiscoverHostDevices(): device found: 0000:08:00.0   02              Mellanox Technolo...    MT28908 Family [ConnectX-6]             
2024/05/17 09:54:41 DiscoverHostDevices(): device found: 0000:0a:00.0   02              Red Hat, Inc.           Virtio network device                   
2024/05/17 09:54:41 DiscoverHostDevices(): device found: 0000:0b:00.0   02              Red Hat, Inc.           Virtio network device                   
2024/05/17 09:54:41 Initializing resource servers
2024/05/17 09:54:41 Resource: &{ResourceName:rdma_shared_device_a ResourcePrefix:rdma RdmaHcaMax:1000 Devices:[] Selectors:{Vendors:[15b3] DeviceIDs:[1017 1021 101b] Drivers:[] IfNames:[] LinkTypes:[]}}
2024/05/17 09:54:41 error creating new device: "missing RDMA device spec for device 0000:0a:00.0, RDMA device \"issm\" not found"
2024/05/17 09:54:41 error creating new device: "missing RDMA device spec for device 0000:0b:00.0, RDMA device \"issm\" not found"
2024/05/17 09:54:41 Starting all servers...
2024/05/17 09:54:41 starting rdma/rdma_shared_device_a device plugin endpoint at: rdma_shared_device_a.sock
2024/05/17 09:54:41 rdma/rdma_shared_device_a device plugin endpoint started serving
2024/05/17 09:54:41 All servers started.
2024/05/17 09:54:41 Listening for term signals
2024/05/17 09:54:41 Starting OS watcher.
2024/05/17 09:54:42 Updating "rdma/rdma_shared_device_a" devices
2024/05/17 09:54:42 rdma_shared_device_a.sock gets registered successfully at Kubelet 
2024/05/17 09:54:42 exposing "1000" devices

As you can see from the logs, I only entered the Mellanox ID as the vendor value, but the Red Hat devices were also detected, which causes the errors.

Additionally, I am attaching part of the lspci -nn output from this node.

07:00.0 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
08:00.0 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]
0a:00.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1041] (rev 01)
0b:00.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1041] (rev 01)

I would appreciate your help with this.

Thank you.

Is it not working for you, or are you just wondering about the error message? The device plugin reports resources to kubelet; when you create a workload that consumes the RDMA resource, do you not get the expected devices assigned?
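
For example, a container that consumes the shared resource would request it in its resources, roughly like this (the resource name is taken from your config; the rest is illustrative):

resources:
  requests:
    rdma/rdma_shared_device_a: "1"
  limits:
    rdma/rdma_shared_device_a: "1"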

The device plugin first discovers devices and only then assigns them to resource pools according to the provided configuration.

Yes, it does not work properly.
After accessing the pod with kubectl exec and running ibv_devinfo, it shows "No IB devices found."
Is there any label that needs to be added when creating the pod?

Are the resources exposed on the k8s node object? Run kubectl describe node <node name>.
Are you requesting RDMA resources as part of the container resource requests and limits? Can you share the pod YAML?
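
For example, something like the following should list rdma/rdma_shared_device_a under Capacity and Allocatable (the node name is a placeholder):

kubectl describe node <node name> | grep -A 8 Allocatable
kubectl get node <node name> -o jsonpath='{.status.allocatable}'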

Oh! It works well when I add rdma/rdma_shared_device_a: "2" to the resources while creating the pod!
Should the value of rdma/rdma_shared_device_a be the number of devices to use?

And regarding my first question, I still don't understand why devices from other vendors are included in the RDMA plugin's targets. (Why does it target the Red Hat devices?)

This device plugin exposes the same RDMA resources to every container requesting them (as the name suggests, it shares the RDMA resource), so you can just specify 1.

The RDMA resources are defined by the resource pool, depending on how many devices were selected by the provided selectors.

Regarding your first question: that's how it is currently implemented. It first builds a "pci net device" object for every PCI network device it finds, and only afterwards performs the filtering according to the pool.

Since the Red Hat device does not support RDMA, it emits this error. That device is then not considered for the resource pool.

Thank you so much for your quick and detailed response!

May I ask one more question...?
I'm trying to run two pods with RDMA devices and test RDMA communication using ib_send_bw, but I'm encountering this error. (The test works fine without the -R option)
ib_send_bw
Is there anything else I need to do besides installing the Network Operator?
I have limited background knowledge, so it's hard for me to find examples...

rdma-test-node7.yaml

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-node7
  namespace: jsh
spec:
  containers:
  - name: app
    image: asdfry/train-llm:20240411
    command: ["/bin/bash", "-c"]
    args: ["sleep infinity"]
    resources:
      limits:
        ten1010.io/gpu-nvidia-l40: "4"
        rdma/rdma_shared_device_a: "1"
      requests:
        ten1010.io/gpu-nvidia-l40: "4"
        rdma/rdma_shared_device_a: "1"
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
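
For reference, an RDMA CM run of ib_send_bw between two such pods usually looks roughly like this (the device name mlx5_0 and the server pod IP are placeholders, not values from this cluster):

# in the server pod
ib_send_bw -d mlx5_0 -R

# in the client pod, targeting the server pod's IP on the RDMA-backed interface
ib_send_bw -d mlx5_0 -R <server-pod-ip>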

-R means use RDMA CM (Connection Manager); for that, the pod needs a network interface associated with the RDMA device.
This means you need to use Multus + the macvlan CNI for the secondary network, specifying the physical netdev in the CNI config.
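
A minimal sketch of what that could look like, assuming the Whereabouts IPAM plugin was deployed by the operator; the attachment name, master interface name, and subnet below are illustrative assumptions, not values from this cluster:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rdma-net
  namespace: jsh
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens8f0np0",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.100.0/24"
      }
    }

The master field must name the host netdev that backs the RDMA device. The pod then attaches to this secondary network via an annotation, for example:

metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-net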