Incorrect resource accounting for pods that are scheduled (allocated resources) but not Running
ktarplee opened this issue · comments
Here is an example (trimmed down slightly) of a pod that is not counted by kubectl-view-allocations, but is counted by the kube-scheduler as consuming `nvidia.com/gpu` (and other) resources:
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-10-19T19:01:59Z"
  name: test
  namespace: data-ingest
spec:
  containers:
  - command:
    - sh
    - -exc
    - echo hello
    image: busybox:does-not-exist
    imagePullPolicy: Always
    name: msr
    resources:
      limits:
        cpu: "128"
        memory: 256Gi
        nvidia.com/gpu: "16"
      requests:
        cpu: "1"
        memory: 256Mi
        nvidia.com/gpu: "16"
  initContainers:
  - command:
    - sh
    - -exc
    - echo hello init
    image: minio/mc
    imagePullPolicy: Always
    name: get-input-data
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 256m
        memory: 256Mi
  nodeName: dgx-1
  priority: 100
  priorityClassName: free
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-10-23T20:29:18Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-10-23T20:29:02Z"
    message: 'containers with unready status: [msr]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-10-23T20:29:02Z"
    message: 'containers with unready status: [msr]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-10-23T20:29:02Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: busybox:does-not-exist
    imageID: ""
    lastState: {}
    name: msr
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Back-off pulling image "busybox:does-not-exist"
        reason: ImagePullBackOff
  hostIP: 10.1.4.5
  initContainerStatuses:
  - containerID: docker://cc4cc2745ae7a23d0f06d4879fab1b6207b301b44379186db0d172ded1af5956
    image: minio/mc:latest
    imageID: docker-pullable://minio/mc@sha256:ac82bb6219b60b662e28c6f0d642f36bbf7803fc74929c11319f4592203fa752
    lastState: {}
    name: get-input-data
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://cc4cc2745ae7a23d0f06d4879fab1b6207b301b44379186db0d172ded1af5956
        exitCode: 0
        finishedAt: "2020-10-23T20:29:18Z"
        reason: Completed
        startedAt: "2020-10-23T20:29:04Z"
  phase: Pending
  podIP: 10.42.15.92
  podIPs:
  - ip: 10.42.15.92
  qosClass: Burstable
  startTime: "2020-10-23T20:29:02Z"
```
It looks like the issue might be with this line in `src/main.rs:183`:

```rust
.and_then(|ps| ps.node_name.as_ref().map(|s| s == "Running"))
```
Also, it looks like you look for running containers right after that; in my case the containers are not running (yet).

You only consider pods whose phase is Running. If you want kubectl-view-allocations to report truly what resources are available on a node with `kubectl-view-allocations -g node -g resource`, then we need to consider not just Running pods but Pending ones as well.

The condition we might want is that `nodeName` is set, or that there is an entry in the `status.conditions` array with type `PodScheduled` and status `"True"`.
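The proposed condition could be sketched roughly like this. This is a minimal, self-contained illustration: `PodSpec`, `PodStatus`, `PodCondition`, and `is_scheduled` are simplified stand-ins for the real k8s-openapi types and whatever helper the project would actually use, not the project's code.

```rust
// Hypothetical simplified versions of the Kubernetes API types.
#[derive(Default)]
struct PodCondition {
    r#type: String,
    status: String,
}

#[derive(Default)]
struct PodStatus {
    phase: Option<String>,
    conditions: Option<Vec<PodCondition>>,
}

#[derive(Default)]
struct PodSpec {
    node_name: Option<String>,
}

// A pod's requests count against a node once it is scheduled there,
// even if its containers are not Running yet (e.g. ImagePullBackOff),
// so check nodeName / the PodScheduled condition instead of the phase.
fn is_scheduled(spec: &PodSpec, status: &PodStatus) -> bool {
    spec.node_name.is_some()
        || status
            .conditions
            .as_ref()
            .map(|cs| {
                cs.iter()
                    .any(|c| c.r#type == "PodScheduled" && c.status == "True")
            })
            .unwrap_or(false)
}

fn main() {
    // Pending pod that is already scheduled, like the example above:
    let spec = PodSpec {
        node_name: Some("dgx-1".to_string()),
    };
    let status = PodStatus {
        phase: Some("Pending".to_string()),
        conditions: None,
    };
    assert!(is_scheduled(&spec, &status));

    // A pod with no node assignment and no conditions is not counted:
    assert!(!is_scheduled(&PodSpec::default(), &PodStatus::default()));
}
```

Checking either signal would cover both the example pod above (which has `nodeName: dgx-1` and a `PodScheduled: "True"` condition while still `phase: Pending`) and normal Running pods.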
This can be recreated by applying this (just need a bad image name):
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kyle-test
spec:
  template:
    # This is the pod template
    spec:
      containers:
      - name: main
        image: nvidia/cuda:does-not-exist
        args: ['sleep', 'infinity']
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
            nvidia.com/gpu: 1
      restartPolicy: Never
```