ROCm / k8s-device-plugin

Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster

Can't find any GPU

qiyueyuanwei opened this issue

I followed the tutorial to configure k8s-device-plugin and then ran kubectl describe nodes, but no GPU is listed. Did the installation fail?
Allocatable:
cpu: 32
ephemeral-storage: 194465094130
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 129461548Ki
pods: 110

If I run rocm-smi, I get this result:
========================ROCm System Management Interface========================

GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
1 48.0c 27.0W 1386Mhz 800Mhz 0.0% auto 225.0W 0% 0%
2 46.0c 25.0W 1386Mhz 800Mhz 0.0% auto 225.0W 0% 0%
3 49.0c 25.0W 1386Mhz 800Mhz 0.0% auto 225.0W 0% 0%
4 48.0c 25.0W 1386Mhz 800Mhz 0.0% auto 225.0W 0% 0%

==============================End of ROCm SMI Log ==============================

The output of kubectl describe nodes also includes this entry: kube-system amdgpu-device-plugin-daemonset-kcnfr

Can you post the log of the daemonset pod? And which Kubernetes version and Linux distro are you using?

Err, I have solved this problem. It was a network problem.
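
For anyone landing here with the same symptom, the check described in the first post (looking for a GPU entry under Allocatable) can be scripted. The following is only a minimal sketch, not part of the plugin: it assumes the device plugin advertises its GPUs under the resource name amd.com/gpu, that kubectl is on the PATH, and the helper names gpu_counts and fetch_nodes are hypothetical.

```python
import json
import subprocess

# Assumption: the AMD device plugin advertises GPUs under this resource name.
GPU_RESOURCE = "amd.com/gpu"


def gpu_counts(node_list: dict, resource: str = GPU_RESOURCE) -> dict:
    """Map each node name to its allocatable count of `resource`.

    `node_list` is the parsed JSON of `kubectl get nodes -o json`.
    A node that does not advertise the resource is reported as 0,
    which is the situation shown in the Allocatable listing above.
    """
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        counts[name] = int(allocatable.get(resource, "0"))
    return counts


def fetch_nodes() -> dict:
    """Hypothetical helper: read the node list from the cluster via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Usage (requires access to a cluster):
#   for name, count in gpu_counts(fetch_nodes()).items():
#       print(name, count)
```

A count of 0 on a node where rocm-smi lists GPUs suggests the plugin pod has not registered with the kubelet, in which case the daemonset pod's log is the next thing to look at, as requested above.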