One of the key features of the TKG 2.0 on vSphere 8 announcement at VMware Explore 2022 is the consolidation of our Tanzu Kubernetes offerings into a single unified Kubernetes runtime. This can be considered the second edition of VMware Tanzu Kubernetes Grid. It will still come in two flavors. One flavor is a VM-based standalone management cluster, whilst the other is Supervisor-based, integrated into vSphere with Tanzu. However, the important point is that both flavors now have the same APIs for cluster provisioning, the same tooling for extension management, and the same model for release distribution. In this post, we are going to look at the new declarative Kubernetes cluster configuration called ClusterClass.
ClusterClass is aligned with ClusterAPI and the intent is to keep it up to date with features, patching, lifecycle, etc. Through ClusterClass, the lifecycle management of multiple Kubernetes clusters should become a simpler and more declarative process. This will be achieved through a reusable templating system as shown below.
The ClusterClass configuration can also be customized for individual customer scenarios using collections of cluster and machine templates. Customers can adjust and patch these templates to meet their specific needs. For deployments using the vSphere with Tanzu Supervisor, the ClusterClass is TanzuKubernetesCluster. Platform operators can now leverage a set of Cluster and Machine templates to create many Kubernetes clusters of a similar shape, defining the shape of a cluster once and reusing it many times.
One other feature to highlight is that the tanzu CLI can now also be used for provisioning workload clusters in vSphere with Tanzu alongside the kubectl option. Let’s have a look at a few ClusterClass manifests to see this in action.
Note: I am using non-GA versions of software to build this post. The manifests used here should be accurate but may change in the vSphere 8.0 launch, and in subsequent releases.
This first manifest creates a TKG cluster with a single control plane node and two worker nodes. All nodes are deployed using a Photon OS image (the default), and the cluster uses K8s version 1.23.8. It also uses a VM Class of guaranteed-small for all nodes, and a common Storage Class which utilizes the vSAN Default Storage Policy.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: classy-ph-01b
  namespace: cormac-ns
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["220.127.116.11/12"]
    pods:
      cidrBlocks: ["18.104.22.168/16"]
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 2
    variables:
      - name: vmClass
        value: guaranteed-small
      - name: storageClass
        value: vsan-default-storage-policy
Note the use of variables as well. These variables apply at both the controlPlane and workers scopes. Later we will see how to override these variables at different scopes. To deploy this cluster manifest, it is now possible to use the tanzu CLI as mentioned, once the appropriate authorization has been obtained. This environment has been configured to federate Pinniped with an external Identity Provider, as described in this earlier post. After the cluster has been deployed, we can use the tanzu CLI to list it, retrieve the kubeconfig, switch to that context and examine the nodes. Note that they are all Photon OS.
$ tanzu login
? Select a server pinniped-sv (https://192.168.62.17)
✔ successfully logged in to management cluster using the kubeconfig pinniped-sv
Checking for required plugins...
All required plugins are already installed and up-to-date

$ tanzu cluster create -f classy1b-single-az-photon-network.yaml -y
Validating configuration...
waiting for cluster to be initialized...
[zero or multiple KCP objects found for the given cluster, 0 classy-ph-01b cormac-ns, no MachineDeployment objects found for the given cluster]
[cluster control plane is still being initialized: WaitingForControlPlane, cluster infrastructure is still being provisioned: WaitingForControlPlane]
cluster control plane is still being initialized: VMProvisionStarted @ Machine/classy-ph-01b-ccbh9-x9vng
cluster control plane is still being initialized: WaitingForNetworkAddress @ Machine/classy-ph-01b-ccbh9-x9vng
cluster control plane is still being initialized: WaitingForNetworkAddress @ Machine/classy-ph-01b-ccbh9-x9vng
waiting for cluster nodes to be available...
waiting for addons core packages installation...
Workload cluster 'classy-ph-01b' created

$ tanzu cluster list -n cormac-ns
  NAME           NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  TKR
  classy-ph-01b  cormac-ns  running  1/1           2/2      v1.23.8+vmware.2  <none>        v1.23.8---vmware.2-tkg.2-zshippable

$ tanzu cluster kubeconfig get classy-ph-01b -n cormac-ns
ℹ You can now access the cluster by running 'kubectl config use-context tanzu-cli-classy-ph-01b@classy-ph-01b'

$ kubectl config use-context tanzu-cli-classy-ph-01b@classy-ph-01b
Switched to context "tanzu-cli-classy-ph-01b@classy-ph-01b".
$ kubectl get nodes -o wide
NAME                                              STATUS  ROLES                 AGE    VERSION           INTERNAL-IP    EXTERNAL-IP  OS-IMAGE                KERNEL-VERSION       CONTAINER-RUNTIME
classy-ub-01b-ccbh9-x9vng                         Ready   control-plane,master  13m    v1.23.8+vmware.2  192.168.38.13  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ub-01b-node-pool-1-9mwhb-68cf8bb464-5kr4p  Ready   <none>                5m43s  v1.23.8+vmware.2  192.168.38.15  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ub-01b-node-pool-1-9mwhb-68cf8bb464-dcdw5  Ready   <none>                5m57s  v1.23.8+vmware.2  192.168.38.14  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
Choosing a different OS for the K8s nodes
Let’s say that we would now like to deploy a TKG cluster which uses the Ubuntu OS instead of Photon OS. This can be done by adding the following metadata to the controlPlane and workers scopes. Because the annotation is set per scope, you can choose different OS distributions for the controlPlane nodes and the worker nodes.
controlPlane:
  metadata:
    annotations:
      run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu

workers:
  metadata:
    annotations:
      run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
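As a rough illustration of how these annotations could be applied programmatically, here is a minimal Python sketch that works on the topology section of a Cluster manifest held as a plain dictionary. The helper name with_os_images and the pool name node-pool-2 are hypothetical; the annotation key and the os-name=ubuntu value are the ones shown above. Placing the worker annotation on each machineDeployment entry follows the fuller manifests later in this post and should be treated as an assumption.

```python
import copy

# Annotation key as used in the manifests in this post.
OS_IMAGE_ANNOTATION = "run.tanzu.vmware.com/resolve-os-image"


def _annotate(obj, os_name):
    """Set the OS image annotation on a controlPlane or machineDeployment dict."""
    annotations = obj.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[OS_IMAGE_ANNOTATION] = f"os-name={os_name}"


def with_os_images(topology, control_plane_os=None, worker_os=None):
    """Return a copy of a topology dict with OS annotations applied.

    worker_os maps a machineDeployment name to an OS name. Pools that are
    not listed get no annotation and so keep the default image.
    """
    result = copy.deepcopy(topology)  # leave the caller's dict untouched
    if control_plane_os:
        _annotate(result["controlPlane"], control_plane_os)
    for pool in result["workers"]["machineDeployments"]:
        os_name = (worker_os or {}).get(pool["name"])
        if os_name:
            _annotate(pool, os_name)
    return result


topology = {
    "controlPlane": {"replicas": 1},
    "workers": {
        "machineDeployments": [
            {"class": "node-pool", "name": "node-pool-1", "replicas": 1},
            # node-pool-2 is a hypothetical second pool for illustration.
            {"class": "node-pool", "name": "node-pool-2", "replicas": 1},
        ]
    },
}

# Ubuntu for the first pool only; control plane and second pool stay default.
mixed = with_os_images(topology, worker_os={"node-pool-1": "ubuntu"})
```

Keeping the OS choice as a per-pool mapping mirrors the point made above: each scope can carry its own annotation, so a cluster can mix distributions.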
Choosing different VM Classes for the K8s nodes
Further configuration granularity is achievable. Perhaps you wish to deploy nodes of varying VM classes. For example, you may want some worker nodes to have different resources compared to other worker nodes. In this case, the workers part of the manifest could look something like this:
workers:
  machineDeployments:
    - class: node-pool
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
      name: node-pool-ubuntu
      replicas: 1
      variables:
        overrides:
          - name: vmClass
            value: guaranteed-medium
    - class: node-pool
      name: node-pool-photon
      replicas: 1
      variables:
        overrides:
          - name: vmClass
            value: best-effort-medium
I mentioned earlier that it was possible to override values at different scopes. Here, there is still a global vmClass variable in the manifest (not shown) which is used by the controlPlane, but now the nodePools have their own scoped variables. The node pool with the Ubuntu worker has a VM Class of guaranteed-medium whilst the Photon worker is given a best-effort-medium VM Class. After deployment, the cluster should look something like this, with a mix of OS distributions in the cluster.
$ kubectl get nodes -o wide
NAME                                                   STATUS  ROLES                 AGE    VERSION           INTERNAL-IP    EXTERNAL-IP  OS-IMAGE                KERNEL-VERSION       CONTAINER-RUNTIME
classy-ub-07g-782xh-bkttx                              Ready   control-plane,master  15m    v1.23.8+vmware.2  192.168.38.16  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ub-07g-node-pool-photon-26bhm-7b5c975c86-52d9k  Ready   <none>                6m22s  v1.23.8+vmware.2  192.168.38.18  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ub-07g-node-pool-ubuntu-b6j87-8555f878b9-fkrdl  Ready   <none>                6m42s  v1.23.8+vmware.2  192.168.38.17  <none>       Ubuntu 20.04.5 LTS      5.4.0-125-generic    containerd://1.6.6
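The override behaviour described above can be modelled in a few lines of Python. This is only a sketch of the semantics as this post describes them (topology-wide variables, with per-pool overrides taking precedence), not Cluster API's actual implementation. The function name effective_variables is hypothetical, and because the global vmClass is not shown in the manifest fragment, the value guaranteed-small is assumed from the first manifest in this post.

```python
def effective_variables(topology, pool_name=None):
    """Resolve variables for the control plane (pool_name=None) or one node pool."""
    # Start from the topology-wide variables.
    resolved = {v["name"]: v["value"] for v in topology.get("variables", [])}
    if pool_name is None:
        return resolved
    for pool in topology["workers"]["machineDeployments"]:
        if pool["name"] == pool_name:
            # Per-pool overrides win over the topology-wide values.
            overrides = pool.get("variables", {}).get("overrides", [])
            resolved.update({v["name"]: v["value"] for v in overrides})
            return resolved
    raise KeyError(f"no machineDeployment named {pool_name!r}")


topology = {
    # Assumed global values, borrowed from the first manifest in this post.
    "variables": [
        {"name": "vmClass", "value": "guaranteed-small"},
        {"name": "storageClass", "value": "vsan-default-storage-policy"},
    ],
    "workers": {
        "machineDeployments": [
            {
                "class": "node-pool",
                "name": "node-pool-ubuntu",
                "replicas": 1,
                "variables": {
                    "overrides": [{"name": "vmClass", "value": "guaranteed-medium"}]
                },
            },
            {
                "class": "node-pool",
                "name": "node-pool-photon",
                "replicas": 1,
                "variables": {
                    "overrides": [{"name": "vmClass", "value": "best-effort-medium"}]
                },
            },
        ]
    },
}

print(effective_variables(topology)["vmClass"])                      # guaranteed-small
print(effective_variables(topology, "node-pool-ubuntu")["vmClass"])  # guaranteed-medium
print(effective_variables(topology, "node-pool-photon")["vmClass"])  # best-effort-medium
```

Note that storageClass is not overridden in either pool, so both pools inherit the topology-wide value in this model.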
Cluster Customization – Replacing a Package
The following example shows how to customize a cluster at deployment time by swapping out the default Antrea CNI package for the non-default, third-party Calico CNI package. Here, a ClusterBootstrap is created to add the kapp-controller (Carvel package management) and Calico CNI configurations to the bootstrap cluster. Note that we do not need to spell out the name of the resource in its entirety in the bootstrap. A webhook fills in any additional information when it comes across a wildcard. For example, setting refName to calico* causes the webhook to find the full version of the package from the Tanzu Kubernetes release (TKr) specified in the tkg.tanzu.vmware.com/add-missing-fields-from-tkr annotation.
Note the common names used across each of the *Configs, as well as the ClusterBootstrap and Cluster objects. My understanding is that the Cluster and ClusterBootstrap names should be common, but it should not be necessary to have identical names for the *Configs. However, using a common name like this is an easy way to track which clusters are using which configurations. [Update] Further testing has shown that the Cluster, ClusterBootstrap and KappControllerConfig need to share the same name, but the CalicoConfig can be different.
apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: CalicoConfig
metadata:
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  calico:
    config:
      vethMTU: 0
---
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: KappControllerConfig
metadata:
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  namespace: tkg-system
---
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: ClusterBootstrap
metadata:
  annotations:
    tkg.tanzu.vmware.com/add-missing-fields-from-tkr: v1.23.8---vmware.2-tkg.2-zshippable
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  cni:
    refName: calico*
    valuesFrom:
      providerRef:
        apiGroup: cni.tanzu.vmware.com
        kind: CalicoConfig
        name: classy-ub-04d
  kapp:
    refName: kapp-controller*
    valuesFrom:
      providerRef:
        apiGroup: run.tanzu.vmware.com
        kind: KappControllerConfig
        name: classy-ub-04d
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["22.214.171.124/12"]
    pods:
      cidrBlocks: ["126.96.36.199/16"]
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
      replicas: 1
    workers:
      machineDeployments:
        - class: node-pool
          metadata:
            annotations:
              run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
          name: node-pool-4
          replicas: 2
    variables:
      - name: vmClass
        value: guaranteed-small
      - name: storageClass
        value: vsan-default-storage-policy
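Since the naming requirement is easy to get wrong, a small pre-flight check can help before handing a manifest to the tanzu CLI. The following Python sketch encodes only the rule from the update note above (Cluster, ClusterBootstrap and KappControllerConfig share one name, CalicoConfig is free to differ). The function check_shared_names, the doc helper and the name my-calico-settings are hypothetical, and the manifests are reduced to the two fields the check reads rather than parsed from YAML.

```python
# Kinds that, per the update note in this post, need to share one name.
MUST_SHARE_NAME = {"Cluster", "ClusterBootstrap", "KappControllerConfig"}


def check_shared_names(documents):
    """Return a list of problems; an empty list means the check passed."""
    names = {}
    for manifest in documents:
        if manifest["kind"] in MUST_SHARE_NAME:
            names[manifest["kind"]] = manifest["metadata"]["name"]
    problems = [f"missing {kind}" for kind in sorted(MUST_SHARE_NAME - set(names))]
    if len(set(names.values())) > 1:
        problems.append(f"names differ: {sorted(names.items())}")
    return problems


def doc(kind, name):
    """Build a stripped-down manifest carrying only the fields the check reads."""
    return {"kind": kind, "metadata": {"name": name, "namespace": "cormac-ns"}}


manifests = [
    doc("CalicoConfig", "my-calico-settings"),  # hypothetical, deliberately different
    doc("KappControllerConfig", "classy-ub-04d"),
    doc("ClusterBootstrap", "classy-ub-04d"),
    doc("Cluster", "classy-ub-04d"),
]

print(check_shared_names(manifests))  # prints []
```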
This deployment should result in a cluster which is using the Calico CNI rather than the Antrea CNI. This is observable during the deployment.
waiting for resources type *v1beta1.MachineList to be up and running
waiting for addons core packages installation...
getting ClusterBootstrap object for cluster: classy-ub-04d
waiting for resource classy-ub-04d of type *v1alpha3.ClusterBootstrap to be up and running
getting package:kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable in namespace:cormac-ns
getting package:calico.tanzu.vmware.com.3.22.1+vmware.1-tkg.2-zshippable in namespace:vmware-system-tkg
getting package:vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable in namespace:vmware-system-tkg
getting package:vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable in namespace:vmware-system-tkg
waiting for package: 'classy-ub-04d-kapp-controller'
waiting for resource classy-ub-04d-kapp-controller of type *v1alpha1.PackageInstall to be up and running
successfully reconciled package: 'classy-ub-04d-kapp-controller' in namespace: 'cormac-ns'
waiting for package: 'classy-ub-04d-calico'
waiting for package: 'classy-ub-04d-vsphere-pv-csi'
waiting for package: 'classy-ub-04d-vsphere-cpi'
waiting for resource classy-ub-04d-calico of type *v1alpha1.PackageInstall to be up and running
waiting for resource classy-ub-04d-vsphere-cpi of type *v1alpha1.PackageInstall to be up and running
waiting for resource classy-ub-04d-vsphere-pv-csi of type *v1alpha1.PackageInstall to be up and running
successfully reconciled package: 'classy-ub-04d-vsphere-pv-csi' in namespace: 'vmware-system-tkg'
successfully reconciled package: 'classy-ub-04d-calico' in namespace: 'vmware-system-tkg'
successfully reconciled package: 'classy-ub-04d-vsphere-cpi' in namespace: 'vmware-system-tkg'
Workload cluster 'classy-ub-04d' created
$
$ kubectl get pods -A
NAMESPACE                     NAME                                                              READY  STATUS     RESTARTS       AGE
kube-system                   calico-kube-controllers-59dd58c5c7-nhnnh                          1/1    Running    0              6m35s
kube-system                   calico-node-5zr2l                                                 1/1    Running    0              6m35s
kube-system                   calico-node-n98n9                                                 1/1    Running    0              2m44s
kube-system                   calico-node-thmsb                                                 1/1    Running    0              2m35s
kube-system                   coredns-7d8f74b498-bfjbk                                          1/1    Running    0              5m32s
kube-system                   coredns-7d8f74b498-dblwg                                          1/1    Running    0              10m
kube-system                   docker-registry-classy-ub-04d-node-pool-6-d9ngf-58c764844c-bccn8  1/1    Running    0              2m31s
kube-system                   docker-registry-classy-ub-04d-node-pool-6-d9ngf-58c764844c-pm9pw  1/1    Running    0              2m44s
kube-system                   docker-registry-classy-ub-04d-ptmst-vb7xk                         1/1    Running    0              10m
kube-system                   etcd-classy-ub-04d-ptmst-vb7xk                                    1/1    Running    0              10m
kube-system                   kube-apiserver-classy-ub-04d-ptmst-vb7xk                          1/1    Running    0              10m
kube-system                   kube-controller-manager-classy-ub-04d-ptmst-vb7xk                 1/1    Running    0              10m
kube-system                   kube-proxy-bprz5                                                  1/1    Running    0              10m
kube-system                   kube-proxy-vnpx8                                                  1/1    Running    0              2m35s
kube-system                   kube-proxy-zjrzl                                                  1/1    Running    0              2m44s
kube-system                   kube-scheduler-classy-ub-04d-ptmst-vb7xk                          1/1    Running    0              10m
kube-system                   metrics-server-7887c94f95-cp468                                   1/1    Running    0              6m33s
pinniped-concierge            pinniped-concierge-567d456699-t4hrq                               1/1    Running    0              5m42s
pinniped-concierge            pinniped-concierge-567d456699-vw7s5                               1/1    Running    0              5m42s
pinniped-concierge            pinniped-concierge-kube-cert-agent-7bc867dbc4-btspm               1/1    Running    0              5m15s
pinniped-supervisor           pinniped-post-deploy-job-vhb4w                                    0/1    Completed  0              5m42s
secretgen-controller          secretgen-controller-55dbddbf84-4qtxb                             1/1    Running    0              6m27s
tkg-system                    kapp-controller-66bdbb9b94-6fqfv                                  2/2    Running    0              7m17s
tkg-system                    tanzu-capabilities-controller-manager-7969cd64dd-7kgpg            1/1    Running    0              5m58s
vmware-system-auth            guest-cluster-auth-svc-99hrd                                      1/1    Running    0              5m55s
vmware-system-cloud-provider  guest-cluster-cloud-provider-7f9cffcc4b-jkh9p                     1/1    Running    0              6m43s
vmware-system-csi             vsphere-csi-controller-9cf8944c9-j5f6c                            6/6    Running    0              6m42s
vmware-system-csi             vsphere-csi-node-5dm42                                            3/3    Running    0              2m35s
vmware-system-csi             vsphere-csi-node-fmczs                                            3/3    Running    3 (5m43s ago)  6m42s
vmware-system-csi             vsphere-csi-node-m4qjk                                            3/3    Running    0              2m45s
Now we have a declarative way of specifying a bespoke K8s cluster deployment. The *Configs and ClusterBootstraps each exist in their own right as objects in the vSphere Namespace on the Supervisor cluster, and can be queried as shown below. Note the different CNIs. These configurations can be reused for future ClusterClass deployments to create additional bespoke clusters.
$ kubectl get clusterbootstrap -n cormac-ns
NAME           CNI                                                       CSI                                                              CPI                                                            KAPP                                                               RESOLVED_TKR
classy-ph-01b  antrea.tanzu.vmware.com.1.5.3+tkg.2-zshippable            vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable  vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable  kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable  v1.23.8---vmware.2-tkg.2-zshippable
classy-ub-04d  calico.tanzu.vmware.com.3.22.1+vmware.1-tkg.2-zshippable  vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable  vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable  kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable  v1.23.8---vmware.2-tkg.2-zshippable
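The fully qualified package names in this listing are what the wildcard refName values expanded to. To make that expansion concrete, here is a small Python sketch that uses glob-style matching from the standard library against the package names seen in this post. It is an illustration of the idea only: the real webhook's matching logic is not documented here, and the function resolve_ref_name and its exactly-one-match rule are assumptions.

```python
from fnmatch import fnmatchcase

# Package names as they appear in the deployment output in this post.
TKR_PACKAGES = [
    "antrea.tanzu.vmware.com.1.5.3+tkg.2-zshippable",
    "calico.tanzu.vmware.com.3.22.1+vmware.1-tkg.2-zshippable",
    "kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable",
    "vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable",
    "vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable",
]


def resolve_ref_name(ref_name, packages):
    """Expand a wildcard refName to the single package name it matches."""
    matches = [p for p in packages if fnmatchcase(p, ref_name)]
    if len(matches) != 1:
        # Assumed rule: an ambiguous or unmatched pattern is treated as an error.
        raise ValueError(
            f"{ref_name!r} matched {len(matches)} packages, expected exactly 1"
        )
    return matches[0]


print(resolve_ref_name("calico*", TKR_PACKAGES))
# calico.tanzu.vmware.com.3.22.1+vmware.1-tkg.2-zshippable
print(resolve_ref_name("kapp-controller*", TKR_PACKAGES))
# kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable
```

A pattern such as vsphere-* would match two packages in this list, which is why the sketch rejects anything other than a single match.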
Hopefully this has given you an idea about the new ClusterClass declarative mechanism. If you are interested, the sample manifests that I have used here are available on this repository. As I learn more about the different customizations, I’ll add more manifests to the repo.
Note that even though ClusterClass is now available as part of TKG 2.0, VMware continues to support the original TanzuKubernetesCluster manifest. An updated API, v1alpha3, is introduced to provide support for new features such as multi-AZ. I wrote about multi-AZ support in this earlier post.