estafette / estafette-gke-preemptible-killer

Kubernetes controller to spread preemption of preemptible VMs in GKE, to avoid mass deletion after 24 hours

Home Page: https://helm.estafette.io


node draining and volatility

davidquarles opened this issue

Dumb question: Would it make sense to either (1) use the built-in drain functionality in the Kubernetes client, which already respects PDBs, or (2) evict pods instead of deleting them (see the sketch after the list below), if custom logic is beneficial? I'm running a couple of rather small clusters that are tightly bin-packed (though autoscalable), with several control loops that kill both pods and nodes, namely:

  • vertical-pod-autoscaler, in auto mode (mostly respects PDBs)
  • cluster-autoscaler (respects PDBs)
  • k8s-node-termination-handler (respects PDBs)
  • gke-preemptible-killer
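
For concreteness, eviction through client-go looks roughly like this (a sketch only, not code from this controller; evictPod is a made-up helper, and EvictV1 is the method name in recent client-go, with Evict/EvictV1beta1 and the policy/v1beta1 type in older releases):

```go
package evicter

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictPod submits an Eviction to the pod's eviction subresource instead of
// deleting the pod directly. The API server rejects the request (HTTP 429)
// when it would violate a PodDisruptionBudget, so the caller can back off
// and retry rather than take out too many replicas at once.
func evictPod(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	return client.CoreV1().Pods(namespace).EvictV1(ctx, eviction)
}
```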

I've been carefully tuning pod anti-affinity, priority, PDBs, and readiness/liveness probes to ensure service availability while using preemptibles to keep costs low. Nothing critical is a singleton, and I'm about to deploy overprovisioning as an added layer of protection. As it stands, though, I still occasionally hit a perfect storm with noticeable service impact, which I believe would be mitigated by making this controller PDB-aware.

Tangentially: perhaps we could also watch for a configurable set of taints (since both the cluster-autoscaler and the node-termination-handler taint nodes) and throttle node deletion while other nodes are unschedulable or still draining?
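
Something along these lines, as a sketch (otherNodeDraining and taintKeys are made-up names, not anything that exists in this controller; the exact taint keys applied by the cluster-autoscaler and the termination handler, e.g. ToBeDeletedByClusterAutoscaler for the CA, would need to be confirmed and made configurable):

```go
package throttle

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// otherNodeDraining reports whether any node is already cordoned or carries
// one of the configured taint keys, in which case the killer could hold off
// on deleting yet another node until the cluster has settled.
func otherNodeDraining(ctx context.Context, client kubernetes.Interface, taintKeys []string) (bool, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	keys := map[string]bool{}
	for _, k := range taintKeys {
		keys[k] = true
	}
	for _, node := range nodes.Items {
		if node.Spec.Unschedulable {
			return true, nil
		}
		for _, taint := range node.Spec.Taints {
			if keys[taint.Key] {
				return true, nil
			}
		}
	}
	return false, nil
}
```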

Does that make sense? I'm happy to contribute, if that's useful.

Any input on this from the owners? I was about to deploy this when I realized it doesn't use standard node-draining practices. I believe either @davidquarles or I would be happy to help implement this if there's interest from the owners.

Hey @davidquarles and @kinghrothgar, we stopped using this controller a long time ago because we ran into quite a lot of issues running preemptibles at scale in a busy zone.

Combined with https://github.com/estafette/k8s-node-termination-handler (a Helm chart for https://github.com/GoogleCloudPlatform/k8s-node-termination-handler) and multi-region clusters, this should be far less adventurous, though, and definitely worthwhile for smaller clusters.

I'll try to address your questions one by one.

  1. client-go unfortunately doesn't have the drain functionality, but https://github.com/kubernetes/kubectl/tree/master/pkg/drain does. I'll see whether I can either use that package or replicate its logic (see the sketch below this list). And although this controller can take more time to drain a node and take PodDisruptionBudgets into account, a real preemption won't do that and only gives you 30 seconds to shut things down.

  2. I see client-go supports Evict, but what's actually the difference? Does it make room on a new node first before stopping the container? According to https://stackoverflow.com/questions/62277852/whats-the-difference-between-pod-deletion-and-pod-eviction/62277900#62277900 it's preferable to delete rather than evict pods. Please explain the advantage of eviction.
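
Re point 1: using that package would look roughly like this (a sketch only; drainNode is a made-up wrapper, and the exact drain.Helper field set differs a bit between kubectl versions, so treat the fields shown as indicative):

```go
package drainer

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons the node and then evicts its pods with the same logic
// kubectl drain uses, which waits for PodDisruptionBudgets to allow each eviction.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		Force:               false,           // refuse pods that aren't managed by a controller
		IgnoreAllDaemonSets: true,            // DaemonSet pods would be recreated anyway
		GracePeriodSeconds:  -1,              // use each pod's own termination grace period
		Timeout:             5 * time.Minute, // give PDB-blocked evictions time to clear
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, nodeName)
}
```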

After merging #90 we can look at tackling this issue.