ROCm / k8s-device-plugin

Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster

device plugin daemonset tries to load itself on non-amd64 nodes

agilob opened this issue

commented

This results in lots of crashes on arm64 nodes:

❯ kgpwide | grep Crash
amdgpu-labeller-daemonset-qh2sr           0/1     CrashLoopBackOff   245 (4m3s ago)    20h     10.42.1.74    pirate1    <none>           <none>
amdgpu-device-plugin-daemonset-zqk24      0/1     CrashLoopBackOff   248 (3m52s ago)   20h     10.42.0.98    captain    <none>           <none>
amdgpu-device-plugin-daemonset-vbdkt      0/1     CrashLoopBackOff   248 (3m31s ago)   20h     10.42.2.208   pirate2    <none>           <none>
amdgpu-device-plugin-daemonset-4k988      0/1     CrashLoopBackOff   245 (2m47s ago)   20h     10.42.1.73    pirate1    <none>           <none>
amdgpu-labeller-daemonset-nhlwf           0/1     CrashLoopBackOff   247 (2m33s ago)   20h     10.42.2.209   pirate2    <none>           <none>
amdgpu-labeller-daemonset-bnjcb           0/1     CrashLoopBackOff   248 (31s ago)     20h     10.42.0.99    captain    <none>           <none>
commented

Can you add node affinity for amd64? Something like this, which I added by editing the daemonset:

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
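
For comparison, the same constraint can probably be expressed more briefly with a plain nodeSelector, since only a single exact label match is needed. This is a minimal sketch, not the project's shipped manifest; it assumes the nodes carry the standard kubernetes.io/arch label that the kubelet sets, and that the fragment is merged into the pod template of both the amdgpu-device-plugin-daemonset and the amdgpu-labeller-daemonset, since pods from both are crashing in the output above.

```yaml
# Sketch only: merge into each DaemonSet's existing spec.
spec:
  template:
    spec:
      # Schedule pods only onto nodes whose architecture label is amd64.
      nodeSelector:
        kubernetes.io/arch: amd64
```

The nodeAffinity form above is the better fit if more architectures or extra match expressions need to be added later; nodeSelector only supports exact key/value matches.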