Readiness probe stops too early at eviction
hshiina opened this issue
What happened?
When the kubelet evicts a pod, the pod's Ready condition does not become NotReady during pod termination, even if a readinessProbe is configured.
What did you expect to happen?
The readiness probe should keep working during pod termination so that the pod becomes NotReady as early as possible.
How can we reproduce it (as minimally and precisely as possible)?
Use this readiness.yaml:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: readiness-script-configmap
data:
  readiness-script.sh: |
    #!/bin/sh
    handler() {
      rm /tmp/ready
      sleep 20
    }
    touch /tmp/ready
    trap handler SIGTERM
    while true; do
      sleep 1
    done
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: readiness-container
  name: readiness-test
spec:
  containers:
  - command:
    - sh
    - /script/readiness-script.sh
    image: busybox
    name: readiness
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 3
      periodSeconds: 3
    volumeMounts:
    - name: readiness-script
      mountPath: /script
  - command:
    - sh
    - -c
    - for i in `seq 100`; do dd if=/dev/random of=file${i} bs=1048576 count=1 2>/dev/null; sleep .1; done; while true; do sleep 5; done
    name: disk-consumer
    image: busybox
    resources:
      limits:
        ephemeral-storage: "100Mi"
      requests:
        ephemeral-storage: "100Mi"
  volumes:
  - name: readiness-script
    configMap:
      name: readiness-script-configmap
Create the resources and wait for an eviction to happen:
$ kubectl create -f readiness.yaml; kubectl get pods readiness-test -w
configmap/readiness-script-configmap created
pod/readiness-test created
NAME             READY   STATUS              RESTARTS   AGE
readiness-test   0/2     ContainerCreating   0          0s
readiness-test   1/2     Running             0          3s
readiness-test   2/2     Running             0          7s
readiness-test   0/2     Error               0          46s
readiness-test   0/2     Error               0          47s
In contrast, when this pod is deleted with kubectl delete, the readiness probe keeps working during termination:
$ kubectl create -f readiness.yaml; (sleep 15; kubectl delete pod readiness-test) & kubectl get pods -w
configmap/readiness-script-configmap created
pod/readiness-test created
[1] 137999
NAME             READY   STATUS              RESTARTS   AGE
readiness-test   0/2     ContainerCreating   0          0s
readiness-test   1/2     Running             0          2s
readiness-test   2/2     Running             0          6s
pod "readiness-test" deleted
readiness-test   2/2     Terminating         0          15s
readiness-test   1/2     Terminating         0          21s
readiness-test   0/2     Terminating         0          45s
readiness-test   0/2     Terminating         0          45s
readiness-test   0/2     Terminating         0          45s
readiness-test   0/2     Terminating         0          45s
Anything else we need to know?
I suspect the issue is caused as follows:
At eviction, the pod phase is set to PodFailed internally in podStatusFn before the containers in the pod are stopped:
kubernetes/pkg/kubelet/kubelet.go
Lines 2026 to 2029 in 0d8f996
kubernetes/pkg/kubelet/eviction/eviction_manager.go
Lines 605 to 608 in 0d8f996
Because the internal pod phase is PodFailed, the probe worker exits without probing the containers during termination:
kubernetes/pkg/kubelet/prober/worker.go
Lines 203 to 216 in dd68c5f
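The suspected sequence can be illustrated with a small self-contained model. This is only a sketch and not the kubelet's code: the `PodPhase` type and the `probeWorkerStep` function are hypothetical stand-ins. The only behavior taken from the description above is that the worker exits without probing once the internal phase is PodFailed; treating a succeeded pod the same way, and assuming the phase stays Running during a graceful delete, are additional assumptions.

```go
package main

import "fmt"

// PodPhase is a hypothetical stand-in for the phase the kubelet tracks internally.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// probeWorkerStep models one iteration of a readiness probe worker.
// If the pod is already in a terminal phase, it returns without probing and
// leaves *ready untouched. Otherwise it runs the probe and stores the result.
// It reports whether the probe ran and whether the worker should keep going.
func probeWorkerStep(phase PodPhase, probe func() bool, ready *bool) (probed, keepGoing bool) {
	if phase == PodFailed || phase == PodSucceeded {
		return false, false
	}
	*ready = probe()
	return true, true
}

func main() {
	// fileExists models /tmp/ready, which the SIGTERM handler removes.
	fileExists := true
	probe := func() bool { return fileExists }
	ready := false

	// Normal operation: the probe succeeds and the container becomes ready.
	probeWorkerStep(PodRunning, probe, &ready)
	fmt.Println("running, ready =", ready)

	// Graceful delete (assumption: the phase is still Running while the
	// container handles SIGTERM): the probe fails and readiness drops.
	fileExists = false
	probeWorkerStep(PodRunning, probe, &ready)
	fmt.Println("deleting, ready =", ready)

	// Eviction: the phase is already Failed before the containers stop,
	// so the probe never runs and the stale ready value is kept.
	ready, fileExists = true, false
	probed, keepGoing := probeWorkerStep(PodFailed, probe, &ready)
	fmt.Println("evicting, probed =", probed, "keepGoing =", keepGoing, "ready =", ready)
}
```

In this model the eviction case never observes the removal of /tmp/ready, which is consistent with the watch output above going straight from 2/2 to 0/2 with no intermediate 1/2.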
Kubernetes version
$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig node