estafette / estafette-gke-preemptible-killer

Kubernetes controller to spread preemption of preemptible VMs in GKE, to avoid mass deletion after 24 hours

Home Page: https://helm.estafette.io


node draining and volatility

davidquarles opened this issue

Dumb question: Would it make sense to either (1) use the built-in drain functionality in the Kubernetes client, which already respects PDBs, or (2) evict pods instead of deleting them (see the sketch after the list below), if custom logic is beneficial? I'm running a couple of rather small clusters that are tightly bin-packed (though autoscalable), with several control loops that kill both pods and nodes, namely:

  • vertical-pod-autoscaler, in auto mode (mostly respects PDBs)
  • cluster-autoscaler (respects PDBs)
  • k8s-node-termination-handler (respects PDBs)
  • gke-preemptible-killer
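
For concreteness, eviction through client-go looks roughly like this (a sketch only, not code from this controller; evictPod is a made-up helper, and EvictV1 is the method name in recent client-go, with Evict/EvictV1beta1 and the policy/v1beta1 type in older releases):

```go
package evicter

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictPod submits an Eviction to the pod's eviction subresource instead of
// deleting the pod directly. The API server rejects the request (HTTP 429)
// when it would violate a PodDisruptionBudget, so the caller can back off
// and retry rather than take out too many replicas at once.
func evictPod(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	return client.CoreV1().Pods(namespace).EvictV1(ctx, eviction)
}
```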

I've been carefully tuning pod anti-affinity, priority, PDBs, and readiness/liveness probes to ensure service availability while using preemptibles to keep costs low. Nothing critical is a singleton, and I'm about to deploy overprovisioning as an added layer of protection. As it stands, though, I still occasionally hit a perfect storm with noticeable service impact, which I believe would be mitigated by making this controller PDB-aware.

Tangentially: perhaps we could also watch for a configurable set of taints (since both the cluster-autoscaler and the node-termination-handler taint nodes) and throttle node deletion while other nodes are unschedulable or still draining?
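
Something along these lines, as a sketch (otherNodeDraining and taintKeys are made-up names, not anything that exists in this controller; the exact taint keys applied by the cluster-autoscaler and the termination handler, e.g. ToBeDeletedByClusterAutoscaler for the CA, would need to be confirmed and made configurable):

```go
package throttle

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// otherNodeDraining reports whether any node is already cordoned or carries
// one of the configured taint keys, in which case the killer could hold off
// on deleting yet another node until the cluster has settled.
func otherNodeDraining(ctx context.Context, client kubernetes.Interface, taintKeys []string) (bool, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	keys := map[string]bool{}
	for _, k := range taintKeys {
		keys[k] = true
	}
	for _, node := range nodes.Items {
		if node.Spec.Unschedulable {
			return true, nil
		}
		for _, taint := range node.Spec.Taints {
			if keys[taint.Key] {
				return true, nil
			}
		}
	}
	return false, nil
}
```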

Does that make sense? I'm happy to contribute, if that's useful.

Any input on this from the owners? I was about to deploy this when I realized it doesn't use standard node-draining practices. I believe either @davidquarles or I would be happy to help implement this if there's interest from the owners.

Hey @davidquarles and @kinghrothgar, we stopped using this controller a long time ago because we ran into quite a lot of issues running preemptibles at scale in a busy zone.

Combined with https://github.com/estafette/k8s-node-termination-handler (a Helm chart for https://github.com/GoogleCloudPlatform/k8s-node-termination-handler) and multi-region clusters, this should be far less adventurous, though, and definitely worthwhile for smaller clusters.

I'll try to address your questions one by one.

  1. client-go unfortunately doesn't have the drain functionality, but https://github.com/kubernetes/kubectl/tree/master/pkg/drain does. I'll see whether I can either use that package or replicate its logic (see the sketch below this list). And although this controller can take more time to drain a node and take PodDisruptionBudgets into account, a real preemption won't do that and only gives you 30 seconds to shut things down.

  2. I see client-go supports Evict, but what's actually the difference? Does it make room on a new node first before stopping the container? According to https://stackoverflow.com/questions/62277852/whats-the-difference-between-pod-deletion-and-pod-eviction/62277900#62277900 it's preferable to delete rather than evict pods. Please explain the advantage of eviction.
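
Re point 1: using that package would look roughly like this (a sketch only; drainNode is a made-up wrapper, and the exact drain.Helper field set differs a bit between kubectl versions, so treat the fields shown as indicative):

```go
package drainer

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons the node and then evicts its pods with the same logic
// kubectl drain uses, which waits for PodDisruptionBudgets to allow each eviction.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		Force:               false,           // refuse pods that aren't managed by a controller
		IgnoreAllDaemonSets: true,            // DaemonSet pods would be recreated anyway
		GracePeriodSeconds:  -1,              // use each pod's own termination grace period
		Timeout:             5 * time.Minute, // give PDB-blocked evictions time to clear
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, nodeName)
}
```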

After merging #90 we can look at tackling this issue.