rancher / system-upgrade-controller

In your Kubernetes, upgrading your nodes

Proper handling of reboot scenarios / drain and cordon max. one node etc.

Martin-Weiss opened this issue

Is your feature request related to a problem? Please describe.

We need to patch and reboot the nodes in the cluster sequentially, ensuring that at most one master and one worker are drained/cordoned/rebooted in parallel. In other words, we have to guarantee that no more than one node is unavailable at any point during the process.

When looking at the example https://github.com/rancher/system-upgrade-controller/blob/master/examples/ubuntu/bionic/linux-kernel-virtual-hwe-18.04.yaml (as well as others that reboot the node), the problem is that the SUC will start draining the next node even before the first one has rejoined the cluster.
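For reference, a plan of this kind looks roughly like the sketch below. This is a trimmed, illustrative version, not a verbatim copy of the linked example; the plan name, label key, and package name are placeholders:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: bionic
  namespace: system-upgrade
spec:
  concurrency: 1                    # upgrade one node at a time...
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - {key: plan.upgrade.cattle.io/bionic, operator: Exists}
  drain:
    force: true
  upgrade:
    image: ubuntu:bionic
    command: ["chroot", "/host"]
    args: ["sh", "-c", "apt-get update -y && apt-get install -y linux-virtual-hwe-18.04 && reboot"]
```

Even with `concurrency: 1`, the job is considered done as soon as the container exits 0, which can happen before the reboot actually completes.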

Describe the solution you'd like
Somehow we need an option in the drain/cordon process to wait until the last node that was updated/rebooted is back and healthy. This also needs to take into account that a node might still show "Ready" even though it has just been rebooted, because Kubernetes may not notice a down/unavailable node for some time.

It would be great if we could specify in the drain options how long to wait after one node's job completes before running the job on the next node, and also express conditions such as "wait to drain until at least 90% of the nodes are Ready and uncordoned" and "no more than one node unavailable/cordoned at a time"; a sketch of what that could look like follows below.
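To make the request concrete, the desired knobs might look something like this. These fields are purely hypothetical and do not exist in the Plan CRD today; only `force` is a real option:

```yaml
drain:
  force: true                # exists today: passed through to the drain
  waitAfterComplete: 120s    # hypothetical: pause before starting the next node
  minReadyPercent: 90        # hypothetical: wait until >=90% of nodes are Ready and uncordoned
  maxUnavailable: 1          # hypothetical: never cordon/drain more than one node at once
```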

Describe alternatives you've considered
Using kured instead of system upgrade controller.

I suspect that the SUC considers the upgrade job complete as soon as the image exits successfully, since a 0 exit code is its sole success criterion.

If you want it to wait until after the reboot, you'll probably have to tweak your image so that it "fails" after triggering the reboot, and then "retries" and exits cleanly with a no-op after the reboot is complete.
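A minimal sketch of such a script, assuming it runs against the host (e.g. via `chroot /host` as in the linked example) so that `uname -r` reflects the host kernel; the target kernel version is illustrative:

```sh
#!/bin/sh
set -e

WANTED="5.4.0-100-generic"  # illustrative target kernel version

# Post-reboot pass: the node is already running the wanted kernel,
# so exit 0 and let the SUC mark the job successful as a no-op.
if [ "$(uname -r)" = "$WANTED" ]; then
    exit 0
fi

# Pre-reboot pass: install the kernel, trigger the reboot, and exit
# non-zero (the reboot will usually kill the pod first) so the job
# is retried; the retry only succeeds once the reboot has happened.
apt-get update -y
apt-get install -y "linux-image-$WANTED"
reboot
exit 1
```

The key point is that success is only ever reported from the post-reboot branch, so the job cannot complete before the node has actually come back.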

The SUC itself doesn't know anything at all other than whether the job has succeeded. If the job succeeds before the reboot is complete, the SUC considers the work done and will move along to the next node. Any additional checks belong in your upgrade image, not in the SUC.