kubeadm2ha - Workarounds for the time before kubeadm HA becomes available

A set of scripts and documentation for adding redundancy (etcd cluster, multiple masters) to a cluster set up with kubeadm 1.8. This code is intended to demonstrate and simplify the creation of redundant-master setups while still using kubeadm, which currently lacks this functionality. See kubernetes/kubeadm/issues/546 for discussion on this.

This code largely follows the instructions published in cookeem/kubeadm-ha; its own contribution is limited to small adjustments for K8s 1.8 compatibility and to automating the steps.

Overview

This repository contains a set of ansible scripts to do this. The following playbooks are available:

  1. cluster-setup.yaml sets up a complete cluster including the HA setup. See below for more details.
  2. cluster-load-balanced.yaml sets up an NGINX load balancer for the apiserver.
  3. cluster-uninstall.yaml removes data and configuration files to a point that cluster-setup.yaml can be used again.
  4. cluster-dashboard.yaml sets up the dashboard including influxdb/grafana.
  5. etcd-operator.yaml sets up the etcd-operator.
  6. cluster-images.yaml prefetches all images needed for Kubernetes operations and transfers them to the target hosts.
  7. local-access.yaml fetches a patched admin.conf file to /tmp/MY-CLUSTER-NAME-admin.conf. After copying it to ~/.kube/config remote kubectl access via V-IP / load balancer can be tested.
  8. uninstall-dashboard.yaml removes the dashboard.
  9. cluster-upgrade.yaml upgrades a cluster.

Prerequisites

Ansible version 2.4 or higher is required. Older versions will not work.
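
You can check which version is installed on the host from which you will run the playbooks:

ansible --version | head -n 1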

Configuration

In order to use the ansible scripts, at least two files need to be configured:

  1. Either edit my-cluster.inventory or create your own. The inventory must define the following groups: primary-master (a single machine on which kubeadm will be run), secondary-masters (the other masters), masters (all masters), minions (the worker nodes), nodes (all nodes), etcd (all machines on which etcd is installed, usually the masters). An example is shown after this list.
  2. Either edit group_vars/my-cluster.yaml to your needs or create your own (named after the group defined in the inventory you want to use). Override settings from group_vars/all.yaml where necessary.
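
For illustration, a minimal inventory defining these groups might look like the following. The host names are placeholders, and the :children notation is just a compact way to express the group memberships described above; plain host lists work just as well:

[primary-master]
master-1.example.com

[secondary-masters]
master-2.example.com
master-3.example.com

[masters:children]
primary-master
secondary-masters

[minions]
worker-1.example.com
worker-2.example.com

[nodes:children]
masters
minions

[etcd:children]
masters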

What the cluster setup does

  1. Set up an etcd cluster with self-signed certificates on all hosts in group etcd.
  2. Set up a keepalived cluster on all hosts in group masters.
  3. Set up a master instance on the host in group primary-master using kubeadm.
  4. Set up master instances on all hosts in group secondary-masters by copying and patching (replace the primary master's host name and IP) the configuration created by kubeadm and have them join the cluster.
  5. Configure kube-proxy to use the V-IP / load balancer URL and scale kube-dns to the number of master nodes.
  6. Use kubeadm to join all hosts in the group minions.
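
When the playbook has finished, a quick sanity check from the primary master (or from any host with a working ~/.kube/config, see Configuring local access below) could look like this:

kubectl get nodes
kubectl -n kube-system get pods -o wide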

What the additional playbooks can be used for:

  • Add an NGINX-based load-balancer to the cluster. After this, the apiserver will be available through the virtual-IP on port 8443. Note that this is a round-robin load balancer that will interfere with watch actions, like kubectl logs -f from a remote host (see #4). A quick check of the load balancer is shown after this list.
  • Add etcd-operator for use with applications running in the cluster. This is an add-on purely because I happen to need it.
  • Pre-fetch and transfer Kubernetes images. This is useful for systems without Internet access.
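
After running cluster-load-balanced.yaml you can check that the virtual IP answers on port 8443. The placeholder below stands for your configured virtual IP; depending on the apiserver's anonymous-access settings you will get either a health response or an authentication error, and both show that the load balancer forwards requests:

curl -k https://<V-IP>:8443/healthz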

What the images setup does

  1. Pull all required images locally (hence docker must be installed on the host from which you run ansible).
  2. Export the images to tar files.
  3. Copy the tar files over to the target hosts.
  4. Import the images from the tar files on the target hosts.
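
The playbook essentially automates what could be done by hand with docker and scp; roughly the following, sketched here for a single image (image name, version and target host are examples only):

docker pull gcr.io/google_containers/kube-apiserver-amd64:v1.9.2
docker save -o /tmp/kube-apiserver.tar gcr.io/google_containers/kube-apiserver-amd64:v1.9.2
scp /tmp/kube-apiserver.tar root@target-host:/tmp/
ssh root@target-host docker load -i /tmp/kube-apiserver.tar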

Setting up the dashboard

The cluster-dashboard.yaml playbook does the following:

  1. Install the influxdb, grafana and dashboard components.
  2. Scale the number of instances to the number of master nodes.
  3. Expose the instances via NodePort, so that they can then be accessed through the V-IP.
  4. Set up a service account 'admin-user' and a cluster role binding for the role 'cluster-admin', so that the dashboard can be accessed with root-like privileges.
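
To verify the exposed services after the playbook has run (the exact service names may differ slightly between dashboard versions):

kubectl -n kube-system get svc | grep -E 'dashboard|grafana|influxdb'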

For accessing the dashboard in this configuration there are two options:

  1. Use https://<V-IP>:30443, i.e. connect to the remote IP directly. You will get a certificate warning though, because the cluster's certificates will be unknown to your browser.
  2. Run kubectl proxy on your local host (which requires kubectl to be configured on your local host; see Configuring local access below for a way to automate this), then access via http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

The dashboard will ask you to authenticate. Again, there are several options:

  1. Use the token of an existing service account with sufficient privileges. On many clusters this command works for root-like access:

    kubectl -n kube-system describe secrets `kubectl -n kube-system get secrets | awk '/clusterrole-aggregation-controller/ {print $1}'` | awk '/token:/ {print $2}'
    
  2. Use the token of the 'admin-user' service account (if it exists):

    kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
    
  3. Use the local-access.yaml playbook to generate a configuration file. That file can be copied to ~/.kube/config for local kubectl access. It can also be uploaded as kubeconfig file in the dashboard's login dialogue.

Configuring local access

Running the local-access.yaml playbook creates a file /tmp/MY-CLUSTER-NAME-admin.conf that can be used as ~/.kube/config. If the dashboard has been installed (see above), the file will contain the 'admin-user' service account's token, so that root-like access is possible both for kubectl and the dashboard. If that service account does not exist, the client-side certificate will be used instead, which is acceptable for testing environments but generally not recommended, because the client-side certificates are not supposed to leave their master host.
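
A typical workflow, assuming the inventory file is called my-cluster.inventory, looks like this:

ansible-playbook -i my-cluster.inventory local-access.yaml
# MY-CLUSTER-NAME is replaced by the name of your cluster
cp /tmp/MY-CLUSTER-NAME-admin.conf ~/.kube/config
kubectl get nodes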

Upgrading a cluster

For upgrading a cluster several steps are needed:

  1. Find out which software versions to upgrade to.
  2. Set the ansible variables to the new software versions.
  3. Run the cluster-images.yaml playbook if the cluster has no Internet access.
  4. Run the cluster-upgrade.yaml playbook.

Note: Never upgrade a production cluster without having tried the upgrade on a reference system first.

Preparation

To find out which software versions to upgrade to you will need to run a more recent version of kubeadm:

export VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt) # or manually specify a released Kubernetes version
export ARCH=amd64 # or: arm, arm64, ppc64le, s390x
curl -sSL https://dl.k8s.io/release/${VERSION}/bin/linux/${ARCH}/kubeadm > /tmp/kubeadm
chmod a+rx /tmp/kubeadm

Copy this file to /tmp on your primary master if necessary. Now run this command for checking prerequisites and determining the versions you'd get:

/tmp/kubeadm upgrade plan

If the prerequisites are met you'll get a summary of the software versions kubeadm would upgrade to, like this:

Upgrade to the latest stable version:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.8.3    v1.9.2
Controller Manager   v1.8.3    v1.9.2
Scheduler            v1.8.3    v1.9.2
Kube Proxy           v1.8.3    v1.9.2
Kube DNS             1.14.5    1.14.7
Etcd                 3.2.7     3.1.11

Note that upgrading etcd is not supported here because we are running it externally; it has to be upgraded according to etcd's own upgrade instructions, which is beyond the scope of this document.

We will always use the same version for the Kubernetes base software installed on your OS (kubelet, kubectl, kubeadm) and the self-hosted core components (API Server, Controller Manager, Scheduler, Kube Proxy). Hence the "v1.9.2" listed in the kubeadm output goes into the KUBERNETES_VERSION ansible variable. Edit either group_vars/all.yaml to change this globally, or group_vars/<your-cluster>.yaml for your environment only. The same applies to the Kube DNS version, which corresponds to the KUBERNETES_DNS_VERSION ansible variable.
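
For the example output above, the relevant settings would then look roughly like this (check group_vars/all.yaml for the exact variable names and value format used by your checkout):

KUBERNETES_VERSION: "v1.9.2"
KUBERNETES_DNS_VERSION: "1.14.7"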

Having configured this, you may now want to fetch and install the new images for the cluster to be upgraded. This is necessary if your cluster has no Internet access; if it has, you may want to do it anyway to make the upgrade more seamless.

To do so, run the following command:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory cluster-images.yaml

I usually set the number of concurrent processes manually: if a cluster consists of more than five nodes (ansible's default fork count is 5), picking a higher value here significantly speeds up the process.

Perform the upgrade

You may want to back up /etc/kubernetes on all your master machines before running the upgrade.

The actual upgrade is automated. Run the following command:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory cluster-upgrade.yaml

See the comment above on setting the number of concurrent processes.

The upgrade is not fully free of disruptions:

  • while kubeadm applies the changes on a master, it restarts a number of services, hence they may be unavailable for a short time
  • if containers running on the minions keep local data, they must take care to rebuild it when they are relocated to different minions during the upgrade process (i.e. local data is not preserved)

If either of these is unacceptable, a fully automated upgrade process does not make much sense, because working around these issues requires deep knowledge of the applications running in the respective cluster. In that case a manual upgrade process is recommended.

If you are using the NGINX load balancer

After the upgrade the NGINX load balancer will not be in use. To re-enable it, simply re-run the cluster-load-balanced.yaml playbook.
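
For example:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory cluster-load-balanced.yaml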

If something goes wrong

If the upgrade fails the situation afterwards depends on the phase in which things went wrong.

If kubeadm failed to upgrade the cluster it will try to perform a rollback. Hence if that happened on the first master, chances are pretty good that the cluster is still intact. In that case all you need is to start docker, kubelet and keepalived on the secondary masters and then uncordon them (kubectl uncordon <secondary-master-fqdn>) to be back where you started from.
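
A sketch of these recovery steps using ansible ad-hoc commands, following the same pattern as the reinstall sequence below (adjust group and host names to your inventory):

ansible -u root -i <your-environment>.inventory secondary-masters -m command -a "systemctl start docker kubelet keepalived"
kubectl uncordon <secondary-master-fqdn>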

If kubeadm on one of the secondary masters failed you still have a working, upgraded cluster, but without the secondary masters in a somewhat undefined condition. In some cases kubeadm fails if the cluster is still busy after having upgraded the previous master node, so that waiting a bit and running kubeadm upgrade apply v<VERSION> may even succeed. Otherwise you will have to find out what went wrong and join the secondaries manually. Once this has been done, finish the automatic upgrade process by processing the second half of the playbook only:

ansible-playbook -f <good-number-of-concurrent-processes> -i <your-environment>.inventory cluster-upgrade.yaml --tags nodes

If upgrading the software packages (i.e. the second half of the playbook) failed, you still have a working cluster. You may try to fix the problems and continue manually. See the .yaml files under roles/upgrade-nodes/tasks for what you need to do.

If you are trying out the upgrade on a reference system, you may have to downgrade at some point to start again. See the sequence for reinstalling a cluster below for instructions on how to do this (hint: it is important to erase some base software packages before setting up a new cluster based on a lower Kubernetes version).

Examples

To run one of the playbooks (e.g. to set up a cluster), run ansible like this:

ansible-playbook -i <your-inventory-file>.inventory cluster-setup.yaml

You might want to adapt the number of parallel processes to your number of hosts using the -f option.

A sane sequence of playbooks for a complete setup would be:

  • cluster-setup.yaml
  • etcd-operator.yaml
  • cluster-dashboard.yaml
  • cluster-load-balanced.yaml

The following playbooks can be used as needed:

  • cluster-uninstall.yaml
  • local-access.yaml
  • uninstall-dashboard.yaml

Sequence for reinstalling a cluster:

INVENTORY=<your-inventory-file> 
NODES=<number-of-nodes>
ansible-playbook -f $NODES -i $INVENTORY cluster-uninstall.yaml 
sleep 3m
# if you want to downgrade your kubelet, kubectl, ... packages you need to uninstall them first
# if this is not the issue here, you can skip the following line
ansible -u root -f $NODES -i $INVENTORY nodes -m command -a "rpm -e kubelet kubectl kubeadm kubernetes-cni"
for i in cluster-setup.yaml etcd-operator.yaml cluster-dashboard.yaml ; do 
    ansible-playbook -f $NODES -i $INVENTORY $i || break
    sleep 15s
done

Known limitations

This is a preview intended to obtain early feedback; it is not finished yet. Known limitations are:

  • There could be more error checking.
  • The code has been tested almost exclusively in a Redhat-like (RHEL) environment. More testing on other distros is needed.

Why is there no release yet?

Currently the code is in a "works for me" state. In order to make a release, more feedback from others is needed. I still expect some more bugs to be reported and fixed thereafter. Once this phase has ended there will be a first release.
