node draining and volatility
davidquarles opened this issue
Dumb question: would it make sense to either (1) use the built-in drain functionality in the Kubernetes client, which already respects PDBs, or (2) evict pods instead of deleting them, if custom logic is beneficial? I'm running a couple of rather small clusters that are tightly bin-packed (though autoscalable), with several control loops that kill both pods and nodes, namely:
- vertical-pod-autoscaler, in auto mode (mostly respects PDBs)
- cluster-autoscaler (respects PDBs)
- k8s-node-termination-handler (respects PDBs)
- gke-preemptible-killer
I've been carefully tuning pod anti-affinity, priority, PDBs, and readiness/liveness probes to ensure service availability while using preemptibles to keep costs low. Nothing critical is a singleton, and I'm about to deploy overprovisioning as an added layer of protection. As it stands, though, I still occasionally encounter a perfect storm with noticeable service impact, which I believe would be mitigated by making this controller PDB-aware.
Tangentially: perhaps we could also watch for a configurable set of taints (both CA and the node-termination-handler taint nodes), and throttle node deletion while other nodes are still unschedulable or currently draining?
Does that make sense? I'm happy to contribute, if that's useful.
Any input on this from owners? I was about to deploy this when I realized it didn't use standardized node draining practices. I believe either @davidquarles or I would be happy to help implement this if there's interest / input from the owners.
Hey @davidquarles and @kinghrothgar, we stopped using this controller a long time ago due to having quite a lot of issues running preemptibles at scale in a busy zone.
Combined with https://github.com/estafette/k8s-node-termination-handler (a Helm chart for https://github.com/GoogleCloudPlatform/k8s-node-termination-handler) and multi-region clusters it should be far less adventurous, though, and definitely worthwhile for smaller clusters.
I'll try to address your questions one by one.
- client-go unfortunately doesn't have the drain functionality, but https://github.com/kubernetes/kubectl/tree/master/pkg/drain does. I'll look into whether I can either use that package or replicate its logic. And although this controller can take more time to drain a node and take PodDisruptionBudgets into account, a real preemption won't, and only gives you 30 seconds to shut things down.
- I see client-go supports Evict, but what's actually the difference? Does it make room on a new node before stopping the container? According to https://stackoverflow.com/questions/62277852/whats-the-difference-between-pod-deletion-and-pod-eviction/62277900#62277900 it's preferable to delete rather than evict pods. Please explain the advantage of eviction.
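For reference on the difference: deleting a pod (a DELETE on the pod resource) bypasses PodDisruptionBudgets entirely, while eviction is a POST of an Eviction object to the pod's `eviction` subresource, which the API server refuses (HTTP 429) when the PDB would be violated. Neither makes room on a new node first; in both cases the replacement pod is created afterwards by the owning controller. The request body looks like this (pod name and namespace are illustrative):

```json
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": {
    "name": "web-1",
    "namespace": "default"
  }
}
```

This would be posted to `/api/v1/namespaces/default/pods/web-1/eviction`; on clusters older than 1.22 the `apiVersion` is `policy/v1beta1`.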
After merging #90 we can look at tackling this issue.