Find correct combination with given NUMA node ID
283713406 opened this issue
Area
- Scheduler
- Controller
- Helm Chart
- Documents
Other components
No response
What happened?
func findSuitableCombination(identifier string, qos v1.PodQOSClass, numaNodes NUMANodeList, resources v1.ResourceList, numaNodesCombination [][]int) ([]int, bool) {
	minAvgDistance := minAvgDistanceInCombinations(numaNodes, numaNodesCombination)
	var (
		minDistanceCombination []int
		// init as max distance
		minDistance float32 = 256
	)
	for _, combination := range numaNodesCombination {
		combinationResources := combineResources(numaNodes, combination)
		resourcesFit := checkResourcesFit(identifier, qos, resources, combinationResources)
		if resourcesFit {
			distance := nodesAvgDistance(numaNodes, combination...)
			if distance == minAvgDistance {
				// return early if we can fit resources into combination and provide minDistance
				return combination, true
			}
			// we don't have to check which combination bitmask has lower value since we are generating them from lowest value
			if distance < minDistance {
				minDistance = distance
				minDistanceCombination = combination
			}
		}
	}
	return minDistanceCombination, false
}
Some servers do not have memory modules installed on every NUMA node. In the code above, the fit check runs against the combined resources of a combination. If those combined resources satisfy the Pod request but one NUMA node in the combination has no memory module, the combination is clearly invalid, yet it is still treated as available.
What did you expect to happen?
If a combination contains a NUMA node without memory modules, the combination should be excluded.
For example, with the following NUMANodeList, the existing logic selects the combination (0, 1, 5), but it should select (2, 4, 6).
{
	numaNodes: NUMANodeList{
		{
			NUMAID: 0,
			Resources: v1.ResourceList{
				gpuResource:       resource.MustParse("1"),
				v1.ResourceCPU:    *resource.NewQuantity(4, resource.DecimalSI),
				v1.ResourceMemory: resource.MustParse("5Gi"),
			},
			Costs: map[int]int{
				0: 10, 1: 20, 2: 40, 3: 30, 4: 20, 5: 30, 6: 50, 7: 40,
			},
		},
		{
			NUMAID: 3,
			Resources: v1.ResourceList{
				gpuResource:    resource.MustParse("1"),
				v1.ResourceCPU: *resource.NewQuantity(4, resource.DecimalSI),
			},
			Costs: map[int]int{
				0: 30, 1: 40, 2: 20, 3: 10, 4: 30, 5: 20, 6: 40, 7: 50,
			},
		},
		{
			NUMAID: 5,
			Resources: v1.ResourceList{
				gpuResource:    resource.MustParse("1"),
				v1.ResourceCPU: *resource.NewQuantity(4, resource.DecimalSI),
			},
			Costs: map[int]int{
				0: 30, 1: 20, 2: 50, 3: 20, 4: 50, 5: 10, 6: 50, 7: 40,
			},
		},
		{
			NUMAID: 7,
			Resources: v1.ResourceList{
				gpuResource:    resource.MustParse("1"),
				v1.ResourceCPU: *resource.NewQuantity(4, resource.DecimalSI),
			},
			Costs: map[int]int{
				0: 40, 1: 50, 2: 30, 3: 50, 4: 20, 5: 40, 6: 30, 7: 10,
			},
		},
		{
			NUMAID: 1,
			Resources: v1.ResourceList{
				gpuResource:    resource.MustParse("1"),
				v1.ResourceCPU: *resource.NewQuantity(4, resource.DecimalSI),
			},
			Costs: map[int]int{
				0: 20, 1: 10, 2: 30, 3: 40, 4: 50, 5: 20, 6: 40, 7: 50,
			},
		},
		{
			NUMAID: 6,
			Resources: v1.ResourceList{
				gpuResource:       resource.MustParse("1"),
				v1.ResourceCPU:    *resource.NewQuantity(4, resource.DecimalSI),
				v1.ResourceMemory: resource.MustParse("5Gi"),
			},
			Costs: map[int]int{
				0: 50, 1: 40, 2: 20, 3: 40, 4: 30, 5: 50, 6: 10, 7: 30,
			},
		},
		{
			NUMAID: 2,
			Resources: v1.ResourceList{
				gpuResource:       resource.MustParse("1"),
				v1.ResourceCPU:    *resource.NewQuantity(4, resource.DecimalSI),
				v1.ResourceMemory: resource.MustParse("5Gi"),
			},
			Costs: map[int]int{
				0: 40, 1: 30, 2: 10, 3: 20, 4: 40, 5: 50, 6: 20, 7: 30,
			},
		},
		{
			NUMAID: 4,
			Resources: v1.ResourceList{
				gpuResource:       resource.MustParse("1"),
				v1.ResourceCPU:    *resource.NewQuantity(4, resource.DecimalSI),
				v1.ResourceMemory: resource.MustParse("5Gi"),
			},
			Costs: map[int]int{
				0: 20, 1: 50, 2: 40, 3: 30, 4: 10, 5: 50, 6: 30, 7: 20,
			},
		},
	},
	podResources: v1.ResourceList{
		v1.ResourceCPU:    *resource.NewQuantity(3, resource.DecimalSI),
		v1.ResourceMemory: resource.MustParse("2Gi"),
		gpuResource:       resource.MustParse("3"),
	},
}
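The exclusion described above could be sketched as a per-node check that runs before a combination is accepted. This is only an illustration, not the plugin's code: `numaNode` is a simplified, hypothetical stand-in for the real node type, resources are plain integers rather than `resource.Quantity` values, the helper name `everyNodeHasMemory` is made up, and the entries of a combination are assumed to be NUMA IDs.

```go
package main

import "fmt"

// numaNode is a hypothetical, simplified stand-in for the plugin's NUMA node
// type. Resources are plain integers keyed by resource name.
type numaNode struct {
	NUMAID    int
	Resources map[string]int64
}

// everyNodeHasMemory reports whether every NUMA ID in combination refers to a
// node with a non-zero amount of memory. IDs that are not present in nodes are
// treated as having no memory.
// Assumption: combination holds NUMA IDs, not slice indices.
func everyNodeHasMemory(nodes []numaNode, combination []int) bool {
	memoryByID := make(map[int]int64, len(nodes))
	for _, n := range nodes {
		// A missing "memory" key (or a nil map) reads as zero.
		memoryByID[n.NUMAID] = n.Resources["memory"]
	}
	for _, id := range combination {
		if memoryByID[id] <= 0 {
			return false
		}
	}
	return true
}

func main() {
	const gi = int64(1) << 30
	// Same memory layout as the example above: only nodes 0, 2, 4 and 6
	// report memory.
	nodes := []numaNode{
		{NUMAID: 0, Resources: map[string]int64{"cpu": 4, "memory": 5 * gi}},
		{NUMAID: 3, Resources: map[string]int64{"cpu": 4}},
		{NUMAID: 5, Resources: map[string]int64{"cpu": 4}},
		{NUMAID: 7, Resources: map[string]int64{"cpu": 4}},
		{NUMAID: 1, Resources: map[string]int64{"cpu": 4}},
		{NUMAID: 6, Resources: map[string]int64{"cpu": 4, "memory": 5 * gi}},
		{NUMAID: 2, Resources: map[string]int64{"cpu": 4, "memory": 5 * gi}},
		{NUMAID: 4, Resources: map[string]int64{"cpu": 4, "memory": 5 * gi}},
	}
	fmt.Println(everyNodeHasMemory(nodes, []int{0, 1, 5})) // false
	fmt.Println(everyNodeHasMemory(nodes, []int{2, 4, 6})) // true
}
```

A check like this would sit next to the combined-resources fit check, so that a combination is only considered when both pass. Whether the real fix should test memory only or every requested resource is left open here.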
How can we reproduce it (as minimally and precisely as possible)?
No response
Anything else we need to know?
No response
Kubernetes version
[root@master1 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.13", GitCommit:"49433308be5b958856b6949df02b716e0a7cf0a3", GitTreeState:"clean", BuildDate:"2023-04-12T12:15:50Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.13", GitCommit:"49433308be5b958856b6949df02b716e0a7cf0a3", GitTreeState:"clean", BuildDate:"2023-04-12T12:08:36Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/arm64"}
Scheduler Plugins version
I agree this is a bug. In our initial design we kind of implicitly assumed that all NUMA nodes are consistent and have both CPU and memory. More complex and unequal scenarios are indeed possible.