Kubernetes probes
Developers write great applications. But even the best applications are not always able to handle requests.
- If an application has just launched, it might need to perform some initialization routines before it is ready to process requests.
- An application might be in a state where it cannot continue due to external blocking factors such as dropped database connections, full filesystems, and so on.
- An application might be shutting down, for example because of an update or a scaling action.
Unless we help Kubernetes with readiness probes, it will not know about the state of an application and will assume that the app is up.
At the same time, even the best applications might run into situations that require the app to restart. This could be the case if an application has problems recovering from a lost connection, or if some internal routine generates an impressive, unrecoverable stack trace.
For these situations we have liveness probes.
The readiness probe is also very important when doing rolling upgrades of applications. It will delay the removal of an old version of a pod as long as the new version is not ready.
Prerequisites
In order to experience the behaviour of probes the following setup is expected:
- minikube running locally with 4 cores and 8 GiB of memory (minikube start --memory=8G --cpus=4)
- the minikube metrics-server addon enabled (minikube addons enable metrics-server)
- network connectivity to download images
Create the minikube cluster if you haven't done so yet. Once it has started, open a terminal and enter the command
kubectl get events --watch
to receive constant updates on what is happening inside the cluster.
Probe sequence
Liveness probes and readiness probes have been around for quite some time. They have one shortcoming, and that is how to deal with slow-starting pods. For slow-starting pods the startup probe was introduced; it requires Kubernetes version 1.18 or above. The startup probe waits for a condition to occur and then hands control over to the readiness and liveness probes.
|
|
V
+---------+
| Startup |
| probe |
+---------+
|
|
/\
/ \
/ \
/ \
/ \
/ \
/ \
/ \
| |
+-------+ +--------+
| | | |
| V V |
| +----------+ +-----------+ |
| | Liveness | | Readiness | |
| | probe | | probe | |
| +----------+ +-----------+ |
| | | |
+-------+ +--------+
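The probe definitions used in this lab live in the accompanying yaml files. As an illustration only, a container spec that combines the three probe types could look like the sketch below; the image, port and timing values are assumptions made for this sketch, not taken from the lab files.
# Illustration only: combining the three probe types on one container
containers:
  - name: httpd
    image: httpd
    ports:
      - containerPort: 80
    startupProbe:            # runs first; liveness and readiness wait until it succeeds
      httpGet:
        path: /
        port: 80
      failureThreshold: 30   # allow up to 30 x 10s = 300s for a slow start
      periodSeconds: 10
    readinessProbe:          # decides whether the pod may receive traffic
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
    livenessProbe:           # decides whether the container must be restarted
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 3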
Services without (ready) pods
Let's prepare our environment a bit and experience what might happen if there is no pod to handle requests.
Deploy the service definition. This definition will listen on port 8080 and forward the traffic to one or more upstream pods.
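The file itself is not reproduced here. Based on the describe output shown further below, 01-service-for-webserver.yaml will look roughly like this sketch (a reconstruction, not the literal file):
apiVersion: v1
kind: Service
metadata:
  name: apache
  labels:
    app: apache
spec:
  type: LoadBalancer
  selector:
    app: apache        # traffic goes to pods carrying this label
  ports:
    - name: web
      port: 8080       # port the service listens on
      targetPort: 80   # port of the upstream pods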
❯ kubectl apply -f 01-service-for-webserver.yaml
service/apache created
Have a closer look at this service and notice the endpoints. Do this from another terminal:
❯ kubectl describe service apache
Name: apache
Namespace: default
Labels: app=apache
Annotations: <none>
Selector: app=apache
Type: LoadBalancer
IP Families: <none>
IP: 10.97.208.116
IPs: 10.97.208.116
Port: web 8080/TCP
TargetPort: 80/TCP
NodePort: web 31718/TCP
Endpoints: <none>
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Type 4m20s service-controller ClusterIP -> LoadBalancer
There are no active endpoints yet. You can check this by opening the URL http://localhost:8080/ with a browser, Postman, Insomnia, curl or wget after opening a port forward. Note the ampersand at the end of the port-forward command: it keeps the port forward running in the background.
Some browsers will automatically redirect you from an http to an https session. If that happens, try to navigate to http://127.0.0.1:8080/ instead.
# Port forward first
kubectl port-forward service/apache 8080 &
❯ curl http://localhost:8080/
curl: (7) Failed to connect to localhost port 8080: Connection refused
A deployment without probes
Deploy a webserver pod (without any probes) that will receive traffic from the service that was deployed earlier, and check the status of the deployment and the service.
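The manifest is not reproduced here; a minimal sketch of what 02-webserver-without-probes.yaml could contain (image and labels are assumptions) is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache        # matches the selector of the apache service
    spec:
      containers:
        - name: httpd
          image: httpd     # note: no readiness or liveness probes defined
          ports:
            - containerPort: 80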
❯ kubectl apply -f 02-webserver-without-probes.yaml
deployment.apps/apache created
❯ kubectl describe deployments.apps apache
Name: apache
... omitted
Selector: app=apache
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
... omitted
❯ kubectl describe service apache
Name: apache
... omitted
Selector: app=apache
... omitted
Port: web 8080/TCP
TargetPort: 80/TCP
Endpoints: 172.17.0.2:80
Session Affinity: None
Events: <none>
Notice that the service shows 1 endpoint, which matches the status of the replicas (1 available).
Feel free to validate that it actually works; you might have to reopen the port forward:
❯ curl http://localhost:8080/
Handling connection for 8080
<html><body><h1>It works!</h1></body></html>
Due to the way the port-forwarding works we will not rely on it for further testing.
Cleanup:
❯ kubectl delete -f 02-webserver-without-probes.yaml
deployment.apps "apache" deleted
A deployment with probes
Now deploy 3 pods, this time with probes.
Before continuing, enter the following command in a separate terminal to see all the events, and keep it running:
❯ kubectl get events --watch
LAST SEEN TYPE REASON OBJECT MESSAGE
10m Normal Scheduled pod/apache-598f4c9bf9-757z8 Successfully assigned default/apache-598f4c9bf9-757z8 to minikube
9m59s Normal Pulling pod/apache-598f4c9bf9-757z8 Pulling image "httpd"
9m56s Normal Pulled pod/apache-598f4c9bf9-757z8 Successfully pulled image "httpd" in 3.40699297s
9m56s Normal Created pod/apache-598f4c9bf9-757z8 Created container httpd
9m54s Normal Started pod/apache-598f4c9bf9-757z8 Started container httpd
3m23s Normal Killing pod/apache-598f4c9bf9-757z8 Stopping container httpd
42m Normal Killing pod/apache-598f4c9bf9-n5t26 Stopping container httpd
... omitted
Have a look at the service again. If there are still endpoints, the cleanup of the previous deployment has not completed yet.
❯ kubectl describe service apache
Name: apache
... omitted
TargetPort: 80/TCP
Endpoints: <none>
... omitted
The following steps are timing sensitive. If they are not performed quickly enough, the results can be different.
Next, deploy pods with probes and observe that the service will not get any endpoints, because the pods remain in a not-ready state forever. A sketch of the probes that cause this behaviour follows below.
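The probe section of 03-webserver-with-probes.yaml is not reproduced here. Judging from the files that are touched later in this lab, it will look roughly like this; the timing values are assumptions chosen to match the roughly 150 second restart interval observed later:
# Reconstruction of the probe fragment of the container spec
readinessProbe:
  httpGet:
    path: /readiness     # satisfied by touching /usr/local/apache2/htdocs/readiness
    port: 80
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /liveness      # satisfied by touching /usr/local/apache2/htdocs/liveness
    port: 80
  periodSeconds: 30      # assumption: 5 failures x 30s = the 150s restart interval
  failureThreshold: 5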
❯ kubectl apply -f 03-webserver-with-probes.yaml
deployment.apps/apache created
❯ kubectl describe service apache
Name: apache
... omitted
Endpoints:
... omitted
Also observe the events; there are many messages about failed readiness and liveness probes.
Let's make a pod ready. We need the name of a pod, so let's list the pods and pick the first one for starters.
❯ kubectl get pods
NAME READY STATUS RESTARTS AGE
apache-c4fb9756-6bgmk 0/1 ContainerCreating 0 5s
apache-c4fb9756-bhqmt 0/1 ContainerCreating 0 5s
apache-c4fb9756-q8dtw 0/1 Running 0 5s
The pod name apache-c4fb9756-6bgmk in the following examples has to be replaced with the name shown on your screen.
Let's satisfy the Readiness Probe first:
❯ kubectl exec apache-c4fb9756-6bgmk -- touch /usr/local/apache2/htdocs/readiness
In the event log you should see a line like this: 61m Normal Type service/apache LoadBalancer -> ClusterIP
Let's check the status:
❯ kubectl get pods
NAME READY STATUS RESTARTS AGE
apache-c4fb9756-6bgmk 1/1 Running 0 2m7s
apache-c4fb9756-bhqmt 0/1 Running 0 2m7s
apache-c4fb9756-q8dtw 0/1 Running 0 2m7s
❯ kubectl describe deployments.apps apache
Name: apache
... omitted
Replicas: 3 desired | 3 updated | 3 total | 1 available | 2 unavailable
... omitted
❯ kubectl describe service apache
Name: apache
... omitted
Endpoints: 172.17.0.2:80
... omitted
If you wait a while you'll notice that the pods restart every 150 seconds (their liveness probes are still failing) and that after a restart the readiness probe of the first pod fails again.
A restart resets the container filesystem, so the file that satisfies the readiness probe is gone, causing the readiness probe to fail again.
Let's satisfy all the liveness probes by creating a file in all three pods. You have to replace the pod names to match yours:
❯ kubectl exec apache-c4fb9756-6bgmk -- touch /usr/local/apache2/htdocs/liveness
❯ kubectl exec apache-c4fb9756-bhqmt -- touch /usr/local/apache2/htdocs/liveness
❯ kubectl exec apache-c4fb9756-q8dtw -- touch /usr/local/apache2/htdocs/liveness
If you experience problems it could be that there is a 'backoff timeout' active. You'll have to try again until your command is accepted.
By now the liveness probe is no longer failing. The event log will no longer report liveness probe related issues, but our service is still not working:
❯ kubectl describe service apache
Name: apache
... omitted
TargetPort: 80/TCP
Endpoints:
... omitted
Let's fix it, one pod at a time:
❯ kubectl exec apache-c4fb9756-6bgmk -- touch /usr/local/apache2/htdocs/readiness
❯ kubectl describe service apache
Name: apache
... omitted
Endpoints: 172.17.0.2:80
... omitted
❯ kubectl exec apache-c4fb9756-bhqmt -- touch /usr/local/apache2/htdocs/readiness
❯ kubectl describe service apache
Name: apache
... omitted
Endpoints: 172.17.0.2:80,172.17.0.4:80
... omitted
That's two out of three...
Cleanup:
❯ kubectl delete -f 03-webserver-with-probes.yaml
deployment.apps "apache" deleted
❯ kubectl delete -f 01-service-for-webserver.yaml
service "apache" deleted
But what about deployments that have no (usable) HTTP endpoint?
Let's use scripts... We deploy a pod which gets two scripts from a ConfigMap. By default the pod will be ready and alive, but of course we can change this. A sketch of such a setup follows below.
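The manifest for this pod is not reproduced here. A sketch of how such exec probes could be wired up (the ConfigMap name, script names and image are assumptions for this sketch) is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: probe-scripts
data:
  readiness.sh: |
    #!/bin/sh
    # ready as long as the marker file does NOT exist
    [ ! -f /tmp/notready ]
  liveness.sh: |
    #!/bin/sh
    # alive as long as the marker file does NOT exist
    [ ! -f /tmp/notalive ]
---
# Relevant fragment of the pod template:
containers:
  - name: probes
    image: busybox
    command: ["sh", "-c", "while true; do sleep 10; done"]
    volumeMounts:
      - name: scripts
        mountPath: /scripts
    readinessProbe:
      exec:
        command: ["sh", "/scripts/readiness.sh"]
      periodSeconds: 5
    livenessProbe:
      exec:
        command: ["sh", "/scripts/liveness.sh"]
      periodSeconds: 5
      failureThreshold: 3
volumes:
  - name: scripts
    configMap:
      name: probe-scripts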
In a separate terminal enter the following command and keep it running:
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
probes-76d848bdbb-4ddq2 1/1 Running 0 98s
In another terminal, trigger the readiness probe that is defined in the script:
❯ kubectl exec probes-76d848bdbb-4ddq2 -- touch /tmp/notready
❯ kubectl get pods
NAME READY STATUS RESTARTS AGE
probes-76d848bdbb-4ddq2 0/1 Running 5 9m20s
Your pod name will be different. As you can see, this pod is in a not-ready state and will remain there for a very long time if we do not intervene.
Make the pod ready again:
❯ kubectl exec probes-76d848bdbb-4ddq2 -- rm /tmp/notready
Let's crash the pod:
❯ kubectl exec probes-76d848bdbb-4ddq2 -- touch /tmp/notalive
Now it is a matter of time before the pod will restart.
Summary
There are several probes that play an important role in Kubernetes. The liveness probe determines whether a pod needs to be restarted; the readiness probe determines whether the pod is able to perform its duties and may receive traffic; the startup probe holds the other two back for slow-starting pods. These probes can be configured to use HTTP requests or commands executed inside the container. The readiness probe also determines the pace of a rolling update: as soon as a pod reports itself as 'ready' the system will proceed with the next pod.