kubernetes-sigs / scheduler-plugins

Repository for out-of-tree scheduler plugins based on scheduler framework.

[Capacity Scheduler] Capacity scheduler won't preempt pods if all resource items are used

bfinta opened this issue · comments

Area

  • Scheduler
  • Controller
  • Helm Chart
  • Documents

Other components

No response

What happened?

The capacity scheduler won't preempt pods when all resources are already in use, because of the check at https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/pkg/capacityscheduling/capacity_scheduling.go#L592

Scenario:
Cluster has 5 GPUs. Team A has the following elastic quota: gpu.min: 4, gpu.max: 5. Team B has gpu.min: 1, gpu.max: 5.
Team A runs a workload with 5 pods that consumes all 5 GPUs. When Team B then tries to run a workload requesting even a single GPU, its pod stays Pending because the aggregate check fails:
sum(quotas.used) + pod.requests > sum(quotas.min)
5 + 1 > 5
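
For illustration, here is a minimal, self-contained Go sketch of that aggregate condition as I understand it from the linked line; the type and function names below (ElasticQuota, aggregatedUsedOverMinWith) are my own, not necessarily the plugin's exact API:

```go
package main

import "fmt"

// ElasticQuota models the min/used accounting for a single resource
// (GPUs in this scenario). Field names here are illustrative only.
type ElasticQuota struct {
	Name string
	Min  int64 // guaranteed share (gpu.min)
	Max  int64 // upper bound (gpu.max)
	Used int64 // currently consumed
}

// aggregatedUsedOverMinWith reproduces the condition described above:
// preemption is skipped when cluster-wide usage plus the incoming pod's
// request exceeds the sum of all quota minimums.
func aggregatedUsedOverMinWith(quotas []ElasticQuota, podRequest int64) bool {
	var sumUsed, sumMin int64
	for _, q := range quotas {
		sumUsed += q.Used
		sumMin += q.Min
	}
	return sumUsed+podRequest > sumMin
}

func main() {
	quotas := []ElasticQuota{
		{Name: "team-a", Min: 4, Max: 5, Used: 5}, // Team A borrowed 1 GPU above its min
		{Name: "team-b", Min: 1, Max: 5, Used: 0},
	}
	// Team B requests 1 GPU: 5 + 1 > 4 + 1, so preemption is not attempted
	// and the pod stays Pending.
	fmt.Println(aggregatedUsedOverMinWith(quotas, 1)) // true
}
```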

What did you expect to happen?

The scheduler should preempt pods until the other ElasticQuota's min is reached, i.e. reclaim GPUs from Team A (which is above its min) so Team B can get its guaranteed 1 GPU; see the sketch below.
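
For contrast, a rough sketch of the behavior I would expect (reusing the illustrative ElasticQuota type from the snippet above): a pending pod should be eligible to trigger preemption as long as its own quota is still below its guaranteed min, with victims chosen from quotas that borrowed above their min.

```go
// shouldPreemptForMin is an illustration of the expected behavior, not
// existing plugin code: eligibility is decided per quota rather than from
// the cluster-wide aggregate. Team B (used 0, min 1, request 1) would pass
// this check, and Team A's fifth pod (above Team A's min of 4) would be a
// preemption candidate.
func shouldPreemptForMin(podQuota ElasticQuota, podRequest int64) bool {
	return podQuota.Used+podRequest <= podQuota.Min
}
```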

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

It would be great if this behavior could be made configurable in the scheduler configuration file; a hypothetical sketch follows.
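
Purely as a hypothetical example of what such a knob could look like if it were exposed through the plugin's arguments in the scheduler configuration; neither the args type nor the field name below is claimed to exist in the repository today:

```go
// CapacitySchedulingArgs is shown here only as a hypothetical shape for
// plugin arguments; it is an assumption, not the repo's current API.
type CapacitySchedulingArgs struct {
	// AllowPreemptionWithinMin, if true, would let a pending pod trigger
	// preemption whenever its own ElasticQuota is still below min, even
	// when cluster-wide usage already exceeds the sum of all minimums.
	AllowPreemptionWithinMin bool `json:"allowPreemptionWithinMin,omitempty"`
}
```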

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"b39bf148cd654599a52e867485c02c4f9d28b312", GitTreeState:"clean", BuildDate:"2022-09-21T13:12:04Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

Scheduler Plugins version

registry.k8s.io/scheduler-plugins/controller:v0.24.9

I think it's part of the original design to ensure the system's resources are not fully occupied by a single tenant/namespace.

@denkensk I mentioned this symptom to you the other day. Overall, I feel this is a bit counterintuitive, as setting two tenants to 4/5 and 1/5 when 5 GPUs are available in total sounds like a common way to allocate elastic quota. Could you help shed some light on the original design?