rancher / dashboard

The Rancher UI

Home Page:https://rancher.com

You can allocate a 3 node RKE2 cluster when you only have one available vGPU

noahgildersleeve opened this issue

Setup

  • Rancher version: v2.8-head
  • Rancher UI Extensions:
  • Browser type & version: Chrome Version 124.0.6367.78
  • Harvester version: v1.3.0

Describe the bug

When you create a 3-node RKE2 cluster, the UI only checks whether one vGPU is available in the advanced section, even though the selected vGPU is allocated to every VM. As a result, the generated YAML references the same vGPU for each node. Only the first node comes up. The other two show as unschedulable in Harvester and loop on provisioning.

To Reproduce

  1. Set up vGPU profiles in Harvester
  2. Only set up 1 profile
  3. Import Harvester into Rancher
  4. Create a new 3 node RKE2 cluster with Harvester as downstream provider
  5. Make the RKE2 cluster have one vGPU assigned to it

Result

The first node will be allocated and the others will be unschedulable. Rancher will start deleting them and trying to reprovision them after the timeout.

Expected Result

You shouldn't be allowed to allocate vGPUs that don't exist.
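
The expected behavior amounts to comparing the total number of vGPUs a machine pool would consume against the number actually available for the chosen profile. The following is only a minimal sketch of that arithmetic, not the dashboard's actual code: the names `VGpuRequest`, `VGpuAvailability`, and `validateVGpuRequest`, and the profile key `example-2Q`, are all hypothetical and chosen for illustration.

```typescript
// Hypothetical shape: number of free vGPU devices per profile name.
type VGpuAvailability = Record<string, number>;

// Hypothetical description of what a machine pool asks for.
interface VGpuRequest {
  profile: string;   // vGPU profile selected for the pool
  perNode: number;   // vGPUs assigned to each VM
  nodeCount: number; // number of nodes (VMs) in the pool
}

interface ValidationResult {
  ok: boolean;
  required: number;
  available: number;
}

// Every VM in the pool needs its own vGPU(s), so the requirement
// scales with the node count rather than being a single check.
function validateVGpuRequest(
  req: VGpuRequest,
  availability: VGpuAvailability
): ValidationResult {
  const required = req.perNode * req.nodeCount;
  const available = availability[req.profile] ?? 0;

  return { ok: required <= available, required, available };
}

// The scenario from this issue: 3 nodes, 1 vGPU each, only 1 device available.
const scarce = validateVGpuRequest(
  { profile: "example-2Q", perNode: 1, nodeCount: 3 },
  { "example-2Q": 1 }
);
console.log(scarce.ok); // false (3 required, 1 available)

// With 4 devices of the same profile, the same request passes this check.
const enough = validateVGpuRequest(
  { profile: "example-2Q", perNode: 1, nodeCount: 3 },
  { "example-2Q": 4 }
);
console.log(enough.ok); // true
```

A check like this only covers the count. Whether Harvester can actually schedule each VM onto a host with a free device is a separate question that the sketch does not address.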

Screenshots

Greenshot 2024-04-29 16 22 10

Additional context

Found while testing #10833
If you have enough vGPUs created with the same profile, then this will probably work. For instance, if you have four vGPUs with the 2Q profile and add them to a 3-node cluster, they will probably allocate fine. I'm going to check this a bit later when testing resources free up.

/backport v2.8.next1

Validated in Rancher v2.8-862f57beb6ff7caeab6b4e3c00c89912050cf317-head and Harvester v1.3.0.