tkestack / gpu-admission

Scheduling fails on a node with 3 GPUs

scxu opened this issue

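I am trying to reproduce what is described in this paper: https://ieeexplore.ieee.org/abstract/document/8672318. After a successful deployment, scheduling works fine on a node with four GPUs, but on another machine with only three GPUs the pods stay Pending. In other words, resources are clearly available, yet the scheduler reports: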

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/4 nodes are available: 1 Insufficient tencent.com/vcuda-core, 3 node(s) didn't match node selector.
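
To make the event above easier to read, here is a minimal Python sketch that splits the message into per-reason node counts. The function name `parse_failed_scheduling` is hypothetical, and the parsing assumes the reasons are separated by ", " as in this particular message; it is not part of kube-scheduler or gpu-admission.

```python
import re


def parse_failed_scheduling(message: str) -> dict:
    """Split a FailedScheduling message into available/total counts and reasons.

    Assumption: reasons are separated by ", " and each one starts with a node
    count, as in the message quoted in this issue.
    """
    head, _, body = message.partition(":")
    m = re.match(r"\s*(\d+)/(\d+) nodes are available", head)
    if not m:
        raise ValueError("unexpected message format")
    reasons = {}
    for part in body.strip().rstrip(".").split(", "):
        count, _, reason = part.strip().partition(" ")
        reasons[reason] = int(count)
    return {
        "available": int(m.group(1)),
        "total": int(m.group(2)),
        "reasons": reasons,
    }


msg = (
    "0/4 nodes are available: 1 Insufficient tencent.com/vcuda-core, "
    "3 node(s) didn't match node selector."
)
result = parse_failed_scheduling(msg)
for reason, count in result["reasons"].items():
    print(f"{count} node(s): {reason}")
```

Read this way, three of the four nodes were excluded by the node selector, and the one remaining node was rejected for insufficient `tencent.com/vcuda-core`.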

Here is the output of kubectl describe node:

image

image

As you can see, only two of the GPUs have been scheduled.

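The node has only three GPUs because one card failed and was masked out. The scheduling failure can be worked around to some extent by forcing kube-scheduler to restart: scheduling usually works for a while after the restart, but the same problem comes back later.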

Running nvidia-smi topo -mp on the three-GPU machine gives:

	GPU0	GPU1	GPU2	CPU Affinity
GPU0	 X 	SYS	SYS	0-11
GPU1	SYS	 X 	PIX	0-11
GPU2	SYS	PIX	 X 	0-11

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge

  • k8s version: 1.17.3
  • nvidia driver version: 440.59
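
As a reading aid for the `nvidia-smi topo -mp` matrix above, the following Python sketch parses the rows into GPU pairs and ranks the links using the legend, from closest (PIX) to farthest (SYS). The names `parse_topo` and `CLOSENESS` are hypothetical, and this only illustrates how to interpret the output; it does not reproduce how gpu-admission handles topology, and the issue does not establish that topology is the cause of the failure.

```python
# Matrix copied from the nvidia-smi output in this issue.
TOPO = """
        GPU0    GPU1    GPU2    CPU Affinity
GPU0     X      SYS     SYS     0-11
GPU1    SYS      X      PIX     0-11
GPU2    SYS     PIX      X      0-11
"""

# Link types ordered from closest to farthest, following the legend.
CLOSENESS = ["PIX", "PXB", "PHB", "NODE", "SYS"]


def parse_topo(text: str) -> dict:
    """Return {(gpu_a, gpu_b): link_type} for each unordered GPU pair."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = [name for name in rows[0] if name.startswith("GPU")]
    links = {}
    for row in rows[1:]:
        src, cells = row[0], row[1:1 + len(header)]
        for dst, link in zip(header, cells):
            if src < dst:  # keep each pair once and skip the "X" diagonal
                links[(src, dst)] = link
    return links


links = parse_topo(TOPO)
closest = min(links, key=lambda pair: CLOSENESS.index(links[pair]))
for pair, link in sorted(links.items()):
    print(pair, link)
print("closest pair:", closest)
```

On this node GPU1 and GPU2 sit behind a single PCIe bridge (PIX), while GPU0 reaches either of them only across the NUMA interconnect (SYS).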

Maybe the same as #5.