Constantly breaking things

31st March 2020 – By Jonny

At least that’s what the better half believes I’m constantly doing – I like to think I’m constantly improving things (and learning along the way of course). What’s broken/improved this time?

Rancher out, kubespray in

I'd replaced my manually deployed k8s cluster with a Rancher cluster, as the hassle of constantly updating the k8s components as they were released became a bit boring (and prone to break things … haha). Rancher was (and is!) great for an easy-to-use and easy-to-manage k8s experience. In a real environment I'd be tempted to use it, as once deployed it can generally be quietly forgotten about and left to run itself, with a relatively decent upgrade process in place. I'd also ended up with 2 small-ish clusters built at home, which were easy to manage via the GUI – although I did have a couple of incidents where I thought I was working on one cluster but was actually working on the other.

My infrastructure was also hosted on a somewhat motley collection of 4 old Intel NUCs of various specifications (you know they're old when the website doesn't even include a picture any more), 1 x MSI small form factor PC, and a fanless Zotac PC, running KVM. The k8s nodes were VMs on this collection of PCs, with the images hosted on NFS NAS boxes. Overall, it was 'ok' but not great. NFS contention was a bit of a problem, and the VMs were fairly slow. The VMs also had a maximum of 4GB RAM per instance, and in a couple of bold cases I had 4 such VMs running on a NUC with only 16GB of RAM. The balloon driver and VM shared memory would sort that out without a problem, right?

And I got itchy fingers wanting to test a few more things … so, first things first, it’s time for an infrastructure upgrade. At first I really liked the idea of more fanless PCs – the Zotac was also equipped with AMT/vPro for remote administration … although sometimes this doesn’t seem to work so well, normally when I need it most. There’s probably a BIOS upgrade I need to apply.

New Infrastructure

The Zotac-and-fanless plan was dropped. It's really difficult to get small form factor PCs with Intel AMT (apparently it's an enterprise feature …), and whilst fanless PCs are nice and quiet, they run hot (shocking, I know!) and, given I'm not running a temperature-controlled datacentre here, they're prone to thermal throttling.

Browsing the computer centre, I was talked into giving the newest model of Intel NUC a try. Sporting a 6-core 10th-generation Intel i7 CPU, an M.2 slot for storage, and support for up to 64GB of RAM, it sounded like the perfect VM host system. They do have fan cooling, but that was expected. Kitted out with 64GB of RAM, I was imagining VMs with 16GB of RAM! Testing with one system rapidly became deploying three of them to replace the 4 older NUCs and the one MSI (which had a constantly noisy fan). The Zotac has been retained – it can host some VMs and is also running as my Plex server.

To say I like the newer, updated NUCs is an understatement. They are performing wonderfully, hosting 3 x VMs with 16GB of RAM and a further VM with 8GB of RAM (the k8s master nodes).

Using kubespray

As noted above, Rancher was out as I wanted to get a bit more hands-on, but I didn't want to go back to manually deploying k8s with kubeadm. There weren't a huge number of choices – I considered OKD, but it was languishing on version 3, I wanted to be current(ish), and I wouldn't want to rebuild again when version 4 was released. I also felt OKD was a bit overblown for what I need, and potentially quite resource hungry – I do, after all, want to run workloads on my cluster. I also looked at kops, but it seems to target AWS, and then looked at openSUSE for their options (I figured it would be good as SUSE have [wisely] ditched OpenStack to focus on k8s). It might be harsh, but currently Kubic looks a bit rough and ready, and I don't want to have to do everything myself.

I settled on kubespray in the end, which is an Ansible-powered k8s deployer for cloud and bare metal, and seems to cover off most of what I wanted. Having retained a soft spot for SUSE since working there, I (maybe unwisely) opted to use openSUSE as my base OS. I was going to use the transactional server deployment, but I had a few problems (of my own making, and a lack of knowledge) with trying to set python versions and file locations, so in the end I went with the traditional server deployment (stick with what you know!). This has also meant using docker as the container engine. At some point I might look to use CRI-O or containerd, but maybe not.
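
For the record, getting started looked roughly like this. It's a sketch rather than an exact transcript – the release branch shown is an assumption (pick whichever branch carries the k8s version you're after), and the inventory builder script is the one the kubespray docs point at:

git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
git checkout release-2.12                                    # assumed branch for the k8s 1.16 series – check the docs
cp -rfp inventory/sample inventory/homeCluster
declare -a IPS=(192.168.11.83 192.168.11.87 192.168.11.89)   # the master IPs from the node list below; workers go in the same way
CONFIG_FILE=inventory/homeCluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}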

The KVM hosts on the shiny new NUCs also run openSUSE; however, the network card in these systems is so bang up to date that openSUSE Leap would not work, so they're having to run openSUSE Tumbleweed for now. Hopefully, come openSUSE Leap 15.2, these can be rebuilt to use Leap.

$ kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master1   Ready    master   54d   v1.16.8   192.168.11.83   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-master2   Ready    master   14d   v1.16.8   192.168.11.87   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-master3   Ready    master   54d   v1.16.8   192.168.11.89   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker1   Ready    <none>   54d   v1.16.8   192.168.11.84   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker2   Ready    <none>   54d   v1.16.8   192.168.11.85   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker3   Ready    <none>   54d   v1.16.8   192.168.11.86   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.40-default   docker://19.3.5
k8s-worker4   Ready    <none>   54d   v1.16.8   192.168.11.88   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker5   Ready    <none>   54d   v1.16.8   192.168.11.90   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker6   Ready    <none>   53d   v1.16.8   192.168.11.30   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker7   Ready    <none>   53d   v1.16.8   192.168.11.29   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker8   Ready    <none>   48d   v1.16.8   192.168.11.28   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5
k8s-worker9   Ready    <none>   47d   v1.16.8   192.168.11.27   <none>        openSUSE Leap 15.1   4.12.14-lp151.28.36-default   docker://19.3.5

Twelve nodes in total – the Zotac worker nodes have lower memory than the worker nodes on the Intel NUCs, but otherwise they’re configured similarly (the CPU on the Intel NUCs is also better than that on the Zotac).

Considerations/Drawbacks

Deploying via kubespray means that I'll need to keep kubespray updated. It's all hosted in a git repo; I generally follow specific release branches and then switch when there is a worthwhile upgrade. It's early days, but the upgrade from k8s 1.16.6 to 1.16.8 went smoothly enough. The next upgrade will potentially be to 1.17, which might be more challenging.
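
Switching branches ahead of an upgrade is just a git operation plus refreshing the python dependencies – something like the following, where the branch name is purely illustrative:

cd kubespray
git fetch origin
git checkout release-2.13          # illustrative – whichever newer release branch carries the target k8s version
pip install -r requirements.txt    # the dependency pins can change between branches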

It is possible to upgrade individual components within k8s via kubespray, but I haven't tried that yet (and most likely won't).
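
From my reading of the kubespray docs it's driven by ansible tags and per-component version variables – something along these lines, although I haven't run it and the tag and variable names here are assumptions to verify against the docs for the branch in use:

ansible-playbook -i inventory/homeCluster/hosts.yaml --become cluster.yml --tags=etcd -e etcd_version=v3.3.10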

I've selected calico as the CNI provider. When first starting out with k8s I always found choosing a CNI confusing – there were so many options. Initially I usually went with flannel, and then switched to weave. However, at work I'm using GKE a lot (which I think is great), and it uses calico, so it made sense to go with calico at home too. So far no problems, and I am tempted by the idea of upgrading calico to the latest version to try and eliminate kube-proxy. So much for my statement above …
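
In kubespray the choice is driven by the kube_network_plugin variable in the cluster group_vars (the exact path varies a little between branches), and a quick sanity check that calico is actually what's running is just a label query – the calico-node label below is what I'd expect a standard calico deployment to use:

$ grep -R kube_network_plugin inventory/homeCluster/group_vars/
$ kubectl -n kube-system get pods -l k8s-app=calico-node -o wide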

Deploying kubespray is also straightforward once the inventory and environment files are created. A python virtual environment is useful too, as there are some python dependencies (a quick sketch of that follows the deployment command below). A deployment to my 12-node cluster can take the form of a one-liner:

ansible-playbook -i inventory/homeCluster/hosts.yaml  --become cluster.yml
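
For completeness, the one-off python environment preparation looks something like this – the venv location is my own choice rather than anything kubespray mandates:

python3 -m venv ~/kubespray-venv
source ~/kubespray-venv/bin/activate
pip install -r requirements.txt    # ansible and the other python dependencies pinned by kubespray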

Upgrading a cluster can be a simple one-liner as well.

ansible-playbook upgrade-cluster.yml -b -i inventory/homeCluster/hosts.yaml -e kube_version=v1.16.8
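
Once the playbook has run through, a quick check that every kubelet has actually rolled forward is just a standard kubectl query (nothing kubespray-specific about it):

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion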

Removing a node from the cluster, or adding a new one, is also straightforward – kubespray provides playbooks for both, roughly as sketched below.
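
scale.yml and remove-node.yml are the playbooks I'd reach for here – the worker name below is purely hypothetical:

ansible-playbook -i inventory/homeCluster/hosts.yaml --become scale.yml --limit=k8s-worker10
ansible-playbook -i inventory/homeCluster/hosts.yaml --become remove-node.yml -e node=k8s-worker10

In the next exciting segment, I might even get around to adding some software to run on my sparkling new cluster!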