Upgrading Kubernetes to 1.12.1

25th October 2018, by Jonny

Having got my Kubernetes cluster up and running, hosting the applications I'd wanted to run, I had rather let it fall into a state of benign neglect. I knew it was running – the applications (such as this very blog) continued to run and, as far as I cared, it was running well.

Except, that is, for the annoying habit of nodes randomly falling into a 'NotReady' state. This has been occurring ever since I upgraded the stack of boards running Kubernetes to comprise 5 Asus Tinker Boards and 3 Raspberry Pi boards. I strongly suspect that the boards' close proximity to one another leads to localised overheating, causing them to stop responding at random. I can check the state of the nodes with the 'kubectl get nodes -o wide' command:

kubectl get nodes -o wide
NAME                          STATUS     ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
pi-blue.kube.ipa.champion     Ready      <none>   28d   v1.11.3   192.168.11.241   <none>        Raspbian GNU/Linux 9 (stretch)   4.14.71-v7+        docker://18.6.1
pi-orange.kube.ipa.champion   Ready      <none>   28d   v1.11.3   192.168.11.245   <none>        Raspbian GNU/Linux 9 (stretch)   4.14.71-v7+        docker://18.6.1
pi-white.kube.ipa.champion    NotReady   <none>   28d   v1.11.3   192.168.11.244   <none>        Raspbian GNU/Linux 9 (stretch)   4.14.71-v7+        docker://18.6.1
tb-blue.kube.ipa.champion     Ready      master   28d   v1.11.3   192.168.11.239   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-green.kube.ipa.champion    Ready      master   28d   v1.11.3   192.168.11.238   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-orange.kube.ipa.champion   Ready      <none>   28d   v1.11.3   192.168.11.236   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-red.kube.ipa.champion      Ready      <none>   28d   v1.11.3   192.168.11.246   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-white.kube.ip.champion     Ready      <none>   28d   v1.11.3   192.168.11.237   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1

As can be seen from the above output, my pi-white.kube.ipa.champion node is in a 'NotReady' state. The board still responds to ping requests, but SSH does not, which means I can't remotely access the system to either reboot it or fix the issue. I strongly suspect a heat-related issue, which I intend to try and fix with a USB fan – potentially performing regular temperature checks and acting accordingly.
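A temperature check along those lines could be scripted quite simply. This is a minimal sketch, assuming the SoC exposes its temperature in millidegrees Celsius at the usual sysfs path on these boards; the 80°C limit is an arbitrary choice of mine, not a vendor figure:

```shell
#!/bin/sh
# Minimal temperature check (a sketch). Assumes the kernel exposes the SoC
# temperature in millidegrees C at the usual sysfs path; the 80 C limit is
# an arbitrary placeholder threshold.
check_temp() {
    zone="${1:-/sys/class/thermal/thermal_zone0/temp}"
    limit="${2:-80000}"   # millidegrees C
    temp=$(cat "$zone")
    if [ "$temp" -ge "$limit" ]; then
        echo "HOT: $((temp / 1000)) C"
    else
        echo "OK: $((temp / 1000)) C"
    fi
}

# Simulated reading for demonstration (85 C):
echo 85000 > /tmp/fake_temp
check_temp /tmp/fake_temp   # prints "HOT: 85 C"
```

Run from cron on each board, the 'HOT' branch could log the reading, spin up a fan, or trigger a graceful shutdown before the board hangs.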

Upgrading kubernetes

However, heat-related hangs are not the subject of this post – that can be saved for an exciting future post, I'm sure. As noted at the top of this post, I'd somewhat neglected my Kubernetes cluster, apart from the occasional power cycle of nodes listed as NotReady. I received a notification that Kubernetes 1.12.0 had been released, which got me thinking that I should find some time to upgrade the cluster. Before I had time to do so, I received an email that 1.12.1 had been released. Software development really does move fast at the moment.

The online Kubernetes repositories are very quick to provide packages for new releases, and I noted that 1.12.1 was already available. Upgrading between versions is a bit more involved than simply performing a 'sudo apt-get update && sudo apt-get upgrade': the cluster needs to be instructed that a version upgrade is underway. My cluster also has two master nodes and six worker nodes, created with kubeadm. This is (presumably?) becoming a more common method of deploying clusters as kubeadm edges towards maturity, however at the time there were no outright instructions for upgrading such a deployment. In the end I decided to follow the instructions for a single-master deployment.

It seems that the official documentation has since been updated to explain how to upgrade an HA cluster deployed with kubeadm (which mirrors my environment). I’ll check to see if there are any major differences or mistakes I’ve made.

Upgrade Process

The first step is to install the updated kubeadm on the master node. I have two master nodes, so I chose the one I had set up first (tb-blue.kube.ipa.champion), verified that the updated package was installed, and then ran the 'upgrade plan'.

sudo apt-get update && sudo apt-get upgrade -y kubelet kubeadm
sudo kubeadm version
sudo kubeadm upgrade plan

This final command ran through the pre-flight checks and produced a summary of the upgrade process, which could be started by running 'sudo kubeadm upgrade apply v1.12.1'. This command upgrades the Kubernetes cluster components (images and cluster configuration) on the master node to 1.12.1 and, if successful, prints a message when complete. The whole process did not take long at all.

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.12.1". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

This looks very promising. The next step was to upgrade each node individually: first 'draining' the node, then upgrading the kubelet and kubeadm packages on it, upgrading the kubelet config (on worker nodes), and finally restarting the kubelet daemon. I performed these steps on my second master node first of all (tb-green.kube.ipa.champion).

kubectl drain tb-green.kube.ipa.champion --ignore-daemonsets

sudo apt-get update && sudo apt-get upgrade -y kubelet kubeadm
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo systemctl status kubelet

kubectl uncordon tb-green.kube.ipa.champion

The first and final commands above are run from my Kubernetes admin workstation; the commands prefixed with 'sudo' are run on the node itself. I'm not certain the 'sudo systemctl daemon-reload' command is strictly necessary, but I like to run it whenever on-disk systemd files may have been altered.

Node Upgrades

Upgrading the individual worker nodes followed a very similar process, with one extra step: after the kubelet and kubeadm packages had been updated, the kubelet configuration was upgraded with the following command:

sudo kubeadm upgrade node config --kubelet-version $(kubelet --version | cut -d ' ' -f 2)
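For reference, 'kubelet --version' prints a string like 'Kubernetes v1.12.1', so the cut simply selects the second space-separated field. Simulated here with echo, in case kubelet isn't on your workstation's PATH:

```shell
# 'kubelet --version' emits e.g. "Kubernetes v1.12.1"; field 2 is the tag
# that 'kubeadm upgrade node config' expects.
version=$(echo "Kubernetes v1.12.1" | cut -d ' ' -f 2)
echo "$version"   # prints "v1.12.1"
```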

Once each worker node was uncordoned, I moved on to the next until all the worker nodes had been updated. As far as I can tell, this process completed successfully, and my low-volume cluster didn't seem to suffer any outage either!

kubectl get nodes -o wide
NAME                          STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
pi-blue.kube.ipa.champion     Ready    <none>   29d   v1.12.1   192.168.11.241   <none>        Raspbian GNU/Linux 9 (stretch)   4.14.71-v7+        docker://18.6.1
pi-orange.kube.ipa.champion   Ready    <none>   29d   v1.12.1   192.168.11.245   <none>        Raspbian GNU/Linux 9 (stretch)   4.14.71-v7+        docker://18.6.1
pi-white.kube.ipa.champion    Ready    <none>   29d   v1.12.1   192.168.11.244   <none>        Raspbian GNU/Linux 9 (stretch)   4.14.71-v7+        docker://18.6.1
tb-blue.kube.ipa.champion     Ready    master   29d   v1.12.1   192.168.11.239   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-green.kube.ipa.champion    Ready    master   29d   v1.12.1   192.168.11.238   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-orange.kube.ipa.champion   Ready    <none>   29d   v1.12.1   192.168.11.236   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-red.kube.ipa.champion      Ready    <none>   29d   v1.12.1   192.168.11.246   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
tb-white.kube.ip.champion     Ready    <none>   29d   v1.12.1   192.168.11.237   <none>        Debian GNU/Linux 9 (stretch)     4.14.70-rockchip   docker://18.6.1
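Rather than eyeballing the VERSION column in the table above, the check can be done mechanically. A small sketch: the helper counts version strings on stdin that differ from the expected release, and the commented-out kubectl line shows how it could be fed from a live cluster:

```shell
# Counts kubelet version strings (one per line on stdin) that differ from
# the expected release; zero means every node has been upgraded.
count_stale() {
    grep -cv "^$1\$"
}

# Against a live cluster (not run here), something like:
#   kubectl get nodes \
#       -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#       | count_stale v1.12.1

# Demonstration with a simulated node list, one node still on 1.11.3:
printf 'v1.12.1\nv1.11.3\n' | count_stale v1.12.1   # prints 1
```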

As we can see, I've since rebooted my 'NotReady' node, and all nodes are reporting 'Ready' and running 1.12.1. Obviously, for a much larger fleet of Kubernetes nodes I'd want to automate this process as much as possible, and when it comes round to upgrading from 1.12.x to 1.13.y I'll try to have some Ansible playbooks in place to make it a smooth process.
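Until those playbooks exist, even a shell loop from the admin workstation would beat doing it by hand. A rough sketch of the per-node cycle from this post – the 'pirate' SSH user and the example node names are placeholders for illustration, and nothing here is invoked against a real cluster:

```shell
#!/bin/sh
# Sketch of automating the per-node upgrade cycle from the admin
# workstation. Assumes passwordless SSH and sudo on each node;
# 'pirate' is a placeholder user.
upgrade_node() {
    node="$1"
    kubectl drain "$node" --ignore-daemonsets || return 1
    ssh "pirate@${node}" '
        sudo apt-get update &&
        sudo apt-get upgrade -y kubelet kubeadm &&
        sudo kubeadm upgrade node config \
            --kubelet-version "$(kubelet --version | cut -d " " -f 2)" &&
        sudo systemctl daemon-reload &&
        sudo systemctl restart kubelet
    ' || return 1
    kubectl uncordon "$node"
}

# Build the FQDN my nodes use from a short name:
fqdn() {
    echo "$1.kube.ipa.champion"
}

# Example (not run here):
#   for n in pi-blue pi-orange pi-white tb-orange tb-red; do
#       upgrade_node "$(fqdn "$n")" || break
#   done
```

The '|| return 1' guards matter: if a drain hangs or an upgrade fails, the loop should stop rather than cordon its way through a cluster that is already degraded.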