estafette / estafette-gke-preemptible-killer

Kubernetes controller to spread preemption for preemptible VMs in GKE to avoid mass deletion after 24 hours

Home page: https://helm.estafette.io

nodes do not get deleted

JorritSalverda opened this issue

In one of our Kubernetes Engine clusters, nodes that should be deleted do not get removed properly. They're already cordoned (disabled for scheduling) and their pods are evicted, but then the following error is logged when the controller tries to delete the VM:

{
	"time":"2017-11-20T09:23:10Z",
	"severity":"error",
	"app":"estafette-gke-preemptible-killer",
	"version":"1.0.29",
	"error":"Delete https://www.googleapis.com/compute/v1/projects/***/zones/europe-west1-c/instances/gke-development-euro-auto-scaling-pre-33198d65-gq2m?alt=json: dial tcp: i/o timeout",
	"host":"gke-development-euro-auto-scaling-pre-33198d65-gq2m",
	"message":"Error while processing node"
}

Can this be a timeout on the GCloud side? I wasn't able to see any outage during that period. If this node doesn't get processed now, it should be picked up on the next loop, and if the error still persists, the logs right before it happens may contain more information.

I'm seeing something similar happen. My guess is this might be happening because kube-dns is being killed before the GCloud client is used, so it fails to resolve the host name when authenticating.

 jsonPayload: {
  app: "estafette-gke-preemptible-killer"
  error: "Delete https://www.googleapis.com/compute/v1/projects/path/to/instance?alt=json: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com on 10.114.0.10:53: dial udp 10.114.0.10:53: connect: network is unreachable"
  host: "test-pool-cb8bed09-17s6"
  message: "Error deleting GCloud instance"
  version: "1.0.35"
 }
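If the failure is transient (for example kube-dns on the draining node briefly going away), simply retrying the delete call with a bit of backoff would cover it. Below is a minimal Go sketch of that idea, not the controller's actual code; it assumes default application credentials and the google.golang.org/api/compute/v1 client, and the function name deleteInstanceWithRetry is made up for illustration.

package sketch

import (
	"context"
	"log"
	"time"

	compute "google.golang.org/api/compute/v1"
)

// deleteInstanceWithRetry retries the Compute Engine delete call a few times with a
// growing backoff, so a transient DNS or network failure doesn't leave the VM behind.
func deleteInstanceWithRetry(ctx context.Context, project, zone, instance string) error {
	service, err := compute.NewService(ctx)
	if err != nil {
		return err
	}
	var lastErr error
	for attempt := 1; attempt <= 5; attempt++ {
		if _, lastErr = service.Instances.Delete(project, zone, instance).Context(ctx).Do(); lastErr == nil {
			return nil
		}
		log.Printf("delete attempt %d for %s failed: %v", attempt, instance, lastErr)
		time.Sleep(time.Duration(attempt) * 10 * time.Second)
	}
	return lastErr
}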

Although kube-dns, if present on the node, is actively deleted by https://github.com/estafette/estafette-gke-preemptible-killer/blob/master/main.go#L296, this shouldn't be an issue since kube-dns runs in an HA setup.
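For context, that draining roughly amounts to the sketch below. This is not the exact code behind main.go#L296; it assumes a client-go clientset and uses the eviction API (the controller itself may simply delete the pods directly), and evictKubeDNSFromNode is a made-up name.

package sketch

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictKubeDNSFromNode drains kube-dns pods from the given node via the eviction API
// before the VM is removed; evictions respect PodDisruptionBudgets, so an HA kube-dns
// deployment keeps serving from replicas on other nodes.
func evictKubeDNSFromNode(ctx context.Context, client kubernetes.Interface, node string) error {
	pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: "k8s-app=kube-dns",
		FieldSelector: fmt.Sprintf("spec.nodeName=%s", node),
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		}
		if err := client.PolicyV1().Evictions(pod.Namespace).Evict(ctx, eviction); err != nil {
			return err
		}
	}
	return nil
}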

However, it does turn out that Kubernetes Engine, although built to be resilient, isn't very resilient in the face of preemptions. The master doesn't update Services with pods on a preempted node fast enough to stop sending traffic there. We've seen this by having frequent kube-dns issues correlating with real preemptions by Google, not the ones issued by our preemptible-killer.

@JorritSalverda We're getting DNS errors intermittently on our GKE preemptibles (with the preemptible killer installed) when services in the cluster try to resolve other services in the same cluster.
EDIT: It should be noted that we're only having these intermittent connection issues on our preemptible nodes; the other nodes have no issues.
I'm asking out of ignorance:
What is the purpose of removing kube-dns from the node?
Would leaving kube-dns on the node remove the DNS issues?
And could you clarify your last statement: "We've seen this by having frequent kube-dns issues correlating with real preemptions by Google, not the ones issued by our preemptible-killer."?

@jstephens7 we've seen the same and have actually moved away from preemptibles for the time being. It's unrelated to this controller, but happens when a node really gets preempted by Google before this controller would do it instead. GKE doesn't handle preemption gracefully; it just kills the node at once. This leaves the Kubernetes master in the blind for a while until it discovers that the node is no longer available. In the meantime the iptables rules don't get updated and traffic still gets routed to the unavailable node. I would expect this scenario to be handled better, since you want Kubernetes to be resilient in the face of real node malfunction.

For AWS there's actually a notifier that warns you a spot instance is going down, but GCP doesn't have such a thing currently. See https://learnk8s.io/blog/kubernetes-spot-instances for more info.

@JorritSalverda have you completely given up on preemptibles in production (because of this issue)? Just exploring the idea, so I would love to hear your feedback.

And would @theallseingeye's suggestion mitigate this?

When deleting a node, I am experiencing this error:

INF Done draining kube-dns from node host=gke-xxxxx
ERR Error deleting GCloud instance error="Delete "https://www.googleapis.com/compute/v1/projects/yyyyyy/zones/europe-west1-b/instances/gke-xxxxxx?alt=json\": oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token\": x509: certificate signed by unknown authority" host=gke-xxxxx
ERR Error while processing node error="Delete "https://www.googleapis.com/compute/v1/projects/yyyyyy/zones/europe-west1-b/instances/gke-xxxxxx?alt=json\": oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token\": x509: certificate signed by unknown authority" host=gke-xxxx

I would say that my service account JSON is properly uploaded to the pod and the account has the proper permissions, so I don't know what is happening.

Hi @santinoncs, do you use the Helm chart? And what version? We run it with a service account with the compute.instanceAdmin.v1 role on the project the GKE cluster is in. That seems to work fine.

Hi @tmirks, we did abandon preemptibles for a while since the pressure on europe-west1 mounted and preemptions became more commonplace. The fact that GKE wasn't aware of preemptions caused a lot of trouble with kube-dns requests getting sent to no-longer-existing pods. Now we're testing the k8s-node-termination-handler (see the Helm chart at https://github.com/estafette/k8s-node-termination-handler) together with this application, to ensure both that GKE is aware of preemptions and that preemptions are less likely to happen all at once. Spreading preemptible nodes across zones should also help reduce the chance of mass preemptions.
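For reference, what such a termination handler watches boils down to the GCE metadata server: instance/preempted flips to TRUE when the VM is being preempted, and wait_for_change=true blocks until that happens. A minimal sketch (not the chart's actual code):

package main

import (
	"io"
	"log"
	"net/http"
	"strings"
	"time"
)

// The metadata server only answers requests carrying the Metadata-Flavor: Google header.
const preemptedURL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true"

func main() {
	for {
		req, err := http.NewRequest(http.MethodGet, preemptedURL, nil)
		if err != nil {
			log.Fatal(err)
		}
		req.Header.Set("Metadata-Flavor", "Google")
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("metadata request failed: %v", err)
			time.Sleep(5 * time.Second)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		if strings.TrimSpace(string(body)) == "TRUE" {
			log.Print("node is being preempted; start draining pods now")
			return
		}
	}
}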

Hi @santinoncs, do you use the Helm chart? And what version? We run it with a service account with the compute.instanceAdmin.v1 role on the project the GKE cluster is in. That seems to work fine.

It's working now that I copy the ca-certificates file into the container.
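That fits the error above: Go only trusts the root CAs shipped in the image, so a scratch- or distroless-style image without ca-certificates makes every HTTPS call to the Google APIs fail with exactly that x509 message. A tiny standalone probe (hypothetical, not part of the controller) makes the problem visible:

package main

import (
	"log"
	"net/http"
)

// Probe a Google TLS endpoint; without root CAs in the image this fails with
// "x509: certificate signed by unknown authority".
func main() {
	resp, err := http.Get("https://oauth2.googleapis.com/")
	if err != nil {
		log.Fatalf("TLS probe failed, check that ca-certificates is present in the image: %v", err)
	}
	defer resp.Body.Close()
	log.Printf("TLS probe succeeded with status %s", resp.Status)
}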

Just FYI, GKE now handles node preemption gracefully, giving pods about 25 seconds to shut down.
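For a workload that wants to use that window, the usual pattern is to catch SIGTERM and finish within the grace period; a generic Go sketch (not specific to this controller):

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for the SIGTERM the kubelet sends when the node is preempted, then shut
	// down well within the roughly 25-second window mentioned above.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown did not finish cleanly: %v", err)
	}
}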