Deploying flannel, vSphere CPI and vSphere CSI with later versions of Kubernetes

I recently wanted to deploy a newer version of Kubernetes to see it working with our Cloud Native Storage (CNS) feature. Having assisted with the original landing pages for CPI and CSI, I’d done this a few times in the past. However, the deployment tutorial that we used back then was based on Kubernetes version 1.14.2. I wanted to go with a more recent build of K8s, e.g. 1.16.3. By the way, if you are unclear about the purposes of the CPI and CSI, you can learn more about them on the landing pages, here for CPI and here for CSI.

OK, before we begin I do want to make it clear that the original instructions are still completely valid. The only issue is that with the later releases of K8s, some of the “Kinds” (K8s components) have changed. This will become clear as we go through the process. The other thing that has caught me out personally is the requirement to use hard-coded names for the configuration files, both for the CPI and the CSI. I’ll show you how issues manifest themselves when these hard-coded names are not used.

I am going to continue to use Flannel for my CNI. However, there are a number of modifications required to the flannel YAML. I’ll highlight these as we go along.

1. Changes to K8s Master/Control Plane Deployment

The obvious change here is that we need to install the v1.16.3 K8s tools rather than v1.14.2. This is quite straightforward to do. For the tools, change the apt install to the correct versions:

# apt install -qy kubeadm=1.16.3-00 kubelet=1.16.3-00 kubectl=1.16.3-00
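As an aside, once the pinned versions are installed it can be worth holding the packages, so that a routine apt upgrade on a node doesn’t quietly move it to a different K8s version than the rest of the cluster. A small sketch:

```shell
# Prevent apt from upgrading the K8s packages independently of a
# planned cluster upgrade; this can be undone with 'apt-mark unhold'.
apt-mark hold kubeadm kubelet kubectl
```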

For the correct K8s distribution, modify the master’s kubeadminit.yaml, as shown here:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
useHyperKubeImage: false
kubernetesVersion: v1.16.3
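With that file in place, the control plane is bootstrapped the same way as in the tutorial, passing the config file to kubeadm (the name kubeadminit.yaml is simply whatever you saved the file as):

```shell
# Initialise the control plane using the ClusterConfiguration above.
kubeadm init --config kubeadminit.yaml
```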

After deploying the K8s control plane/master as per the tutorial, you will see the following message displayed:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Until a network has been deployed, the control plane/master node stays in a NotReady state.

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS     ROLES    AGE   VERSION
k8s-master-01   NotReady   master   66s   v1.16.3

You can see the reason for the NotReady via a kubectl describe of the node:

root@k8s-master-01:~# kubectl describe node k8s-master-01
Name:               k8s-master-01
Roles:              master
...

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false \
reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

We need to provide a CNI (Container Network Interface) plugin, and for the purposes of the tutorial, we have been using flannel. However, since we are using a newer version of Kubernetes, this is where things start to get interesting.

2. Changes to Flannel deployment in K8s 1.16.3

Let’s do the very first step from the tutorial and see what happens when we apply the kube-flannel yaml:

root@k8s-master-01:~# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
no matches for kind "DaemonSet" in version "extensions/v1beta1"

root@k8s-master-01:~#

The errors above are a result of API deprecations in Kubernetes 1.16. PodSecurityPolicy is now in the policy/v1beta1 API group and DaemonSet is now in the apps/v1 API group. After downloading the manifest and making the appropriate changes to the kube-flannel YAML, I ran it again.
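If you’d rather script those apiVersion bumps than edit the file by hand, something like the following works. This is a sketch using GNU sed; it pulls in the "kind" line that follows each deprecated apiVersion to pick the correct replacement, and you should eyeball the result before applying it:

```shell
# Rewrite the deprecated apiVersion lines in the downloaded manifest:
#   extensions/v1beta1 + kind: PodSecurityPolicy -> policy/v1beta1
#   extensions/v1beta1 + kind: DaemonSet         -> apps/v1
sed -i -e '/^apiVersion: extensions\/v1beta1$/{
N
s#extensions/v1beta1\(\nkind: PodSecurityPolicy\)#policy/v1beta1\1#
s#extensions/v1beta1\(\nkind: DaemonSet\)#apps/v1\1#
}' kube-flannel-test.yaml
```

Note this only fixes the apiVersion fields; the DaemonSet spec changes still have to be made separately.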

root@k8s-master-01:~# kubectl apply -f kube-flannel-test.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg unchanged
error: error validating "kube-flannel-test.yaml": error validating data: \
ValidationError(DaemonSet.spec): missing required field "selector" \
in io.k8s.api.apps.v1.DaemonSetSpec; if you choose to ignore these errors, turn validation off with --validate=false

root@k8s-master-01:~#

This error is also a result of the DaemonSet changes. Because DaemonSet has moved from extensions/v1beta1 to apps/v1, a selector is now a required field in the DaemonSet spec. So once again, I modified the flannel YAML file to include a selector in the DaemonSet spec as follows:

Before:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel

After:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      name: kube-flannel
  template:
    metadata:
      labels:
        name: kube-flannel
        tier: node
        app: flannel

After making that change, I deployed my flannel yaml once more. Success!

root@k8s-master-01:~# kubectl apply -f kube-flannel-test.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged configured
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg unchanged
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created

root@k8s-master-01:~#

At least, I thought it was a success. However, when I checked on my master node, it still wasn’t ready.

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS     ROLES    AGE   VERSION
k8s-master-01   NotReady   master   23m   v1.16.3


root@k8s-master-01:~# kubectl describe nodes k8s-master-01
Name:               k8s-master-01
Roles:              master
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false \
reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Hmm. Still not ready. It was time to look at the node’s kubelet logs and see if there was something else wrong. This is what I found:

root@k8s-master-01:~# journalctl -xe | grep kubelet
...
Mar 06 09:25:54 k8s-master-01 kubelet[19505]: W0306 09:25:54.953423   19505 cni.go:202] \
Error validating CNI config &{cbr0  false [0xc0003a53e0 0xc0003a5780]...

After a quick search, I found an issue reported on github, entitled “network not ready after kubectl apply -f kube-flannel.yaml in v1.16 cluster” (#1178). The solution is to add a cniVersion number to the flannel yaml file. So the following entry was added to my flannel yaml:

Before:

data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",

After:

data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
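One thing to watch when pasting that fix from a browser or word processor: curly “smart” quotes around cniVersion will silently break the embedded JSON. A quick local check with python3 (assumed to be on your workstation) catches that before you apply the configmap:

```shell
# Parse the cni-conf.json payload before pasting it into the flannel
# configmap; json.tool exits non-zero on smart quotes or other
# malformed JSON.
printf '%s' '{"name": "cbr0", "cniVersion": "0.3.1", "plugins": [{"type": "flannel"}]}' \
  | python3 -m json.tool >/dev/null && echo "valid JSON"
```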

I reapplied the flannel yaml once more, and checked the status of my master node. Finally – it’s Ready!

root@k8s-master-01:~# kubectl apply -f kube-flannel-test.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged configured
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg configured
daemonset.apps/kube-flannel-ds-amd64 unchanged
daemonset.apps/kube-flannel-ds-arm64 unchanged
daemonset.apps/kube-flannel-ds-arm unchanged
daemonset.apps/kube-flannel-ds-ppc64le unchanged
daemonset.apps/kube-flannel-ds-s390x unchanged
root@k8s-master-01:~#

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
k8s-master-01   Ready    master   35m   v1.16.3

root@k8s-master-01:~#

OK – we can now get on with deploying the worker nodes. Of course, if you didn’t want to mess about with all this flannel-related stuff, you could choose another pod network addon, such as Calico. If you want the new flannel yaml manifest, you can download it with all the changes from here.

3. Changes to Worker Node Deployments

There is very little change required here. You must simply make sure that you deploy the newer kubectl, kubeadm and kubelet versions – 1.16.3 rather than 1.14.2. We’ve already seen how to do that for the master node in step 1 – repeat it for the workers. I added two workers to my cluster:

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
k8s-master-01   Ready    master   39m   v1.16.3
k8s-worker-01   Ready    <none>   41s   v1.16.3
k8s-worker-02   Ready    <none>   20s   v1.16.3

root@k8s-master-01:~#
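For reference, joining each worker looks something like this. The endpoint, token, and hash below are placeholders; use the join command that kubeadm init printed on the master, or regenerate it there with kubeadm token create --print-join-command:

```shell
# Install the matching 1.16.3 packages on the worker, then join the
# cluster (placeholder address/token/hash -- substitute your own).
apt install -qy kubeadm=1.16.3-00 kubelet=1.16.3-00 kubectl=1.16.3-00
kubeadm join 192.168.0.10:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:0123456789abcdef
```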

The remainder of the deployment is pretty much the same as before. As you go about deploying the CPI (Cloud Provider Interface) and the CSI (Container Storage Interface), some of the manifests reference a sub-folder called 1.14. It is OK to continue using these manifests, even for later versions (1.16.3) of K8s.

4. A word about CPI and CSI configuration files

The remaining steps involve deploying the CPI and CSI drivers so that your Kubernetes cluster can consume vSphere storage, and have the usage bubbled up in the vSphere Client. However, something that has caught numerous people out, including myself, is that the CPI and CSI configuration filenames are hard-coded; for CPI you must use a configuration file called vsphere.conf and for CSI you must use a configuration file called csi-vsphere.conf. I did a quick exercise to show the sorts of failures you would expect to see if different configuration filenames are used.
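Before walking through the failure modes, a trivial pre-flight check can save you a CrashLoopBackOff later. This is just a sketch, run from the directory holding your config files:

```shell
# The CPI and CSI config filenames are not configurable: warn if either
# expected file is absent under its required name.
for f in vsphere.conf csi-vsphere.conf; do
  [ -f "$f" ] || echo "missing required file: $f"
done
```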

4.1 CPI failure scenario

After deploying the CPI yaml manifests, the first thing you would check is whether the cloud-controller-manager pod deployed successfully. You will see something like this if it did not:

root@k8s-master-01:~# kubectl get pods -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
..
kube-apiserver-k8s-master-01               1/1     Running            0          13m
kube-controller-manager-k8s-master-01      1/1     Running            0          13m
kube-proxy-58pkq                           1/1     Running            0          9m10s
kube-proxy-gqzpc                           1/1     Running            0          13m
kube-proxy-qx4rd                           1/1     Running            0          7m7s
kube-scheduler-k8s-master-01               1/1     Running            0          13m
vsphere-cloud-controller-manager-4792t     0/1     CrashLoopBackOff   5          5m50s

Let’s look at the logs from the pod. I’ve just cut a few snippets out of the complete log output:

root@k8s-master-01:~# kubectl logs vsphere-cloud-controller-manager-4792t -n kube-system
I0305 13:54:17.317921       1 flags.go:33] FLAG: --address="0.0.0.0"
I0305 13:54:17.318366       1 flags.go:33] FLAG: --allocate-node-cidrs="false"
I0305 13:54:17.318383       1 flags.go:33] FLAG: --allow-untagged-cloud="false"
...
I0305 13:54:17.318492       1 flags.go:33] FLAG: --cloud-config="/etc/cloud/vsphere.conf"
I0305 13:54:17.318496       1 flags.go:33] FLAG: --cloud-provider="vsphere"
...
F0305 13:54:18.517986       1 plugins.go:128] Couldn't open cloud provider configuration \
/etc/cloud/vsphere.conf: &os.PathError{Op:"open", Path:"/etc/cloud/vsphere.conf", Err:0x2}
goroutine 1 [running]:
k8s.io/klog.stacks(0x37e9801, 0x3, 0xc0007fe000, 0xb4)
        /go/pkg/mod/k8s.io/klog@v0.3.2/klog.go:900 +0xb1
k8s.io/klog.(*loggingT).output(0x37e98c0, 0xc000000003, 0xc000423490, 0x3751216, 0xa, 0x80, 0x0)
        /go/pkg/mod/k8s.io/klog@v0.3.2/klog.go:815 +0xe6
k8s.io/klog.(*loggingT).printf(0x37e98c0, 0x3, 0x2008bd6, 0x32, 0xc0005818a0, 0x2, 0x2)
        /go/pkg/mod/k8s.io/klog@v0.3.2/klog.go:727 +0x14e
k8s.io/klog.Fatalf(...)

So ensure that you use the vsphere.conf filename for the CPI.
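If you do hit this, the recovery is to rebuild the configmap from a file that really is called vsphere.conf. The configmap name cloud-config and the pod label below are assumptions based on the manifests the tutorial uses; adjust them if yours differ:

```shell
# Recreate the CPI config under its required filename, then delete the
# crash-looping pod so the daemonset reschedules it with the new config.
kubectl -n kube-system delete configmap cloud-config
kubectl -n kube-system create configmap cloud-config --from-file=vsphere.conf
kubectl -n kube-system delete pod -l k8s-app=vsphere-cloud-controller-manager
```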

4.2 CSI failure scenario

CSI is similar to CPI in that it requires a hard-coded configuration filename. Here is what you might observe if the csi-vsphere.conf name is not used; this log snippet is taken from the CSI controller pod.

root@k8s-master-01:~# kubectl logs vsphere-csi-controller-0 -n kube-system vsphere-csi-controller
I0305 16:39:14.210288       1 config.go:261] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I0305 16:39:14.210376       1 config.go:265] Could not stat /etc/cloud/csi-vsphere.conf, reading config params from env
E0305 16:39:14.210401       1 config.go:202] No Virtual Center hosts defined
E0305 16:39:14.210415       1 config.go:269] Failed to get config params from env. Err: No Virtual Center hosts defined
E0305 16:39:14.210422       1 service.go:103] Failed to read cnsconfig. Error: stat /etc/cloud/csi-vsphere.conf: no such file or directory
I0305 16:39:14.210430       1 service.go:88] configured: csi.vsphere.vmware.com with map[mode:controller]
time="2020-03-05T16:39:14Z" level=info msg="removed sock file" path=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2020-03-05T16:39:14Z" level=fatal msg="grpc failed" error="stat /etc/cloud/csi-vsphere.conf: no such file or directory"

root@k8s-master-01:~#
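The fix mirrors the CPI one: recreate the secret from a file literally named csi-vsphere.conf. The secret name vsphere-config-secret is an assumption based on what the tutorial’s CSI manifests mount; adjust it if yours differs:

```shell
# Recreate the CSI config secret under its required filename.
kubectl -n kube-system delete secret vsphere-config-secret
kubectl -n kube-system create secret generic vsphere-config-secret \
    --from-file=csi-vsphere.conf
```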

The take-away is to not deviate from the hard-coded configuration filenames for both the CPI and CSI.
