rancher / dashboard

The Rancher UI

Home Page: https://rancher.com

You can over allocate vGPUs in RKE clusters on Harvester

noahgildersleeve opened this issue

Setup

  • Rancher version: v2.8-head
  • Rancher UI Extensions:
  • Browser type & version: Chrome

Describe the bug

You can allocate more vGPU profiles than are available, and the resulting VMs go into an unschedulable state.

To Reproduce

  1. Create 4 vGPU profiles
  2. Create a 3 node RKE2 cluster with the vGPU profiles
  3. Wait for the cluster to come up
  4. Create another 3 node RKE2 cluster with the vGPU profiles

Result

  • The first cluster comes up fine
  • In the second cluster, the first node comes up and the other two nodes go into an unschedulable state because no vGPUs are left

Expected Result

You should get an error when trying to create the second cluster.
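
The numbers in the reproduction can be worked through with a small sketch of the kind of check that would produce the expected error. This is only an illustration: checkVgpuRequest and VgpuRequest are hypothetical names, not part of Rancher or Harvester, and it assumes each node consumes exactly one vGPU profile.

```typescript
// Hypothetical pre-creation check; none of these names come from Rancher or Harvester.
interface VgpuRequest {
  nodeCount: number;    // machines in the new cluster
  vgpusPerNode: number; // assumed to be 1 in the reproduction above
}

// Returns an error message when the request exceeds what is available, else null.
function checkVgpuRequest(available: number, req: VgpuRequest): string | null {
  const requested = req.nodeCount * req.vgpusPerNode;
  if (requested > available) {
    return `Requested ${requested} vGPU(s) but only ${available} available`;
  }
  return null;
}

// Reproduction: 4 profiles in total, and the first 3 node cluster takes 3 of them.
const total = 4;
const firstCluster: VgpuRequest = { nodeCount: 3, vgpusPerNode: 1 };
const firstError = checkVgpuRequest(total, firstCluster);

// Only 1 profile is left, so a second 3 node cluster should be rejected.
const remaining = total - firstCluster.nodeCount * firstCluster.vgpusPerNode;
const secondError = checkVgpuRequest(remaining, { nodeCount: 3, vgpusPerNode: 1 });

console.log(firstError);
console.log(secondError);
```

With one profile remaining, only one node of the second cluster can be scheduled, which matches the observed result of one node coming up and two going unschedulable.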

Screenshots

[Screenshot: Greenshot 2024-05-03 17 28 54]
[Screenshot: Greenshot 2024-05-03 17 24 00]

Additional context

This seems to be an issue that came up after the implementation of #10936. The allocatable number isn't being updated after vGPUs are deployed.

The UI expects to find the allocatable number in the node.status.allocatable field of each Harvester node. I assume that value is not being updated after the first cluster is created.
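
As a rough sketch of what reading that field could look like, the snippet below sums one resource from status.allocatable across a list of node objects. It is not the dashboard's actual code: NodeLike and totalAllocatable are hypothetical, and the resource key example.com/vgpu-profile is a placeholder, since the real key for a vGPU profile is not given in this issue.

```typescript
// Minimal shape of a Kubernetes node; only the field mentioned above is modelled.
interface NodeLike {
  metadata: { name: string };
  status?: { allocatable?: Record<string, string> };
}

// Sum the allocatable quantity of one resource across all nodes.
// Assumes the quantity is a plain integer string; missing or unparsable values count as 0.
function totalAllocatable(nodes: NodeLike[], resource: string): number {
  return nodes.reduce((sum, node) => {
    const raw = node.status?.allocatable?.[resource];
    const n = raw === undefined ? 0 : parseInt(raw, 10);
    return sum + (Number.isNaN(n) ? 0 : n);
  }, 0);
}

const VGPU_KEY = "example.com/vgpu-profile"; // hypothetical resource name

// Sample data: one node advertises 4 vGPUs, the other advertises none.
const nodes: NodeLike[] = [
  { metadata: { name: "harvester-0" }, status: { allocatable: { cpu: "16", [VGPU_KEY]: "4" } } },
  { metadata: { name: "harvester-1" }, status: { allocatable: { cpu: "16" } } },
];

const staleTotal = totalAllocatable(nodes, VGPU_KEY);
console.log(staleTotal);
```

If the reported value still reads 4 after the first cluster has claimed 3 vGPUs, as assumed above, any check built on this sum would let the second cluster through.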

This is a duplicate of harvester/harvester#5774, and there is nothing for the UI to do here, so we'll track the work in that ticket.