vSphere with Tanzu – new TKG 2.0 ClusterClass Preview

One of the key features of the TKG 2.0 on vSphere 8 announcement at VMware Explore 2022 is the consolidation of the Tanzu Kubernetes offerings into a single unified Kubernetes runtime. This can be considered the second edition of VMware Tanzu Kubernetes Grid. It still comes in two flavors: one is a VM-based standalone management cluster, while the other is Supervisor-based, integrated into vSphere with Tanzu. The important point is that both flavors now share the same APIs for cluster provisioning, the same tooling for extension management, and the same model for release distribution. In this post, we are going to look at the new declarative Kubernetes cluster configuration called ClusterClass.

ClusterClass is aligned with ClusterAPI, and the intent is to keep it up to date with features, patching, lifecycle, and so on. Through ClusterClass, the lifecycle management of multiple Kubernetes clusters becomes a simpler, more declarative process. This is achieved through a re-usable templating system.

The ClusterClass configuration can also be customized for individual customer scenarios using collections of cluster and machine templates. Customers can adjust and patch these templates to meet their specific needs. For deployments using the vSphere with Tanzu Supervisor, the ClusterClass is called tanzukubernetescluster. Platform operators can now leverage a set of Cluster and Machine templates to create many Kubernetes clusters of a similar shape, defining the shape of a cluster once and reusing it many times.
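
To see the ClusterClass available in a vSphere Namespace, it can be queried directly with kubectl. This is a quick sketch, assuming you are logged in to the Supervisor and that the ClusterClass API is exposed to namespace users (it is in the builds I am testing with); cormac-ns is my vSphere Namespace:

$ kubectl get clusterclass -n cormac-ns
$ kubectl describe clusterclass tanzukubernetescluster -n cormac-ns

The describe output should show the variables that the class exposes (vmClass, storageClass, and so on), which is useful when building the Cluster manifests later in this post.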

One other feature to highlight is that the tanzu CLI can now also be used for provisioning workload clusters in vSphere with Tanzu alongside the kubectl option. Let’s have a look at a few ClusterClass manifests to see this in action.

Note: I am using non-GA versions of software to build this post. The manifests used here should be accurate but may change at the vSphere 8.0 launch and in subsequent releases.
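
Before trying any of this, it is also worth confirming that the tanzu CLI build you have includes the cluster plugin. A quick check (output omitted here as it will vary between builds):

$ tanzu version
$ tanzu plugin list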

This first manifest creates a TKG cluster with a single control plane and 2 worker nodes. All nodes are deployed using a Photon OS image (default) with the cluster using K8s version 1.23.8. It is also using a VM Class of guaranteed-small for all nodes, and a common Storage Class which utilizes the vSAN Default Storage Policy.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: classy-ph-01b
  namespace: cormac-ns
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.16.0.0/12"]
    pods:
      cidrBlocks: ["192.12.0.0/16"]
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 2
    variables:
      - name: vmClass
        value: guaranteed-small
      - name: storageClass
        value: vsan-default-storage-policy

Note the use of variables as well. These variables apply at both the controlPlane and workers scopes. Later we will see how to override them at different scopes. To deploy this cluster manifest, it is now possible to use the tanzu CLI as mentioned, once the appropriate authorization has been obtained. This environment has been configured to federate Pinniped with an external Identity Provider, as described in this earlier post. After the cluster has been deployed, we can use the tanzu CLI to list it, retrieve the kubeconfig, switch to that context, and examine the nodes. Note that the nodes are all running Photon OS.

$ tanzu login
? Select a server pinniped-sv (https://192.168.62.17)
✔ successfully logged in to management cluster using the kubeconfig pinniped-sv
Checking for required plugins...
All required plugins are already installed and up-to-date

$ tanzu cluster create -f classy1b-single-az-photon-network.yaml -y
Validating configuration...
waiting for cluster to be initialized...
[zero or multiple KCP objects found for the given cluster, 0 classy-ph-01b cormac-ns, no MachineDeployment objects found for the given cluster]
[cluster control plane is still being initialized: WaitingForControlPlane, cluster infrastructure is still being provisioned: WaitingForControlPlane]
cluster control plane is still being initialized: VMProvisionStarted @ Machine/classy-ph-01b-ccbh9-x9vng
cluster control plane is still being initialized: WaitingForNetworkAddress @ Machine/classy-ph-01b-ccbh9-x9vng
cluster control plane is still being initialized: WaitingForNetworkAddress @ Machine/classy-ph-01b-ccbh9-x9vng
waiting for cluster nodes to be available...
waiting for addons core packages installation...

Workload cluster 'classy-ph-01b' created

$ tanzu cluster list -n cormac-ns
  NAME           NAMESPACE  STATUS  CONTROLPLANE  WORKERS  KUBERNETES        ROLES  PLAN  TKR
  classy-ph-01b  cormac-ns  running  1/1          2/2      v1.23.8+vmware.2  <none>        v1.23.8---vmware.2-tkg.2-zshippable

$ tanzu cluster kubeconfig get classy-ph-01b -n cormac-ns
ℹ You can now access the cluster by running 'kubectl config use-context tanzu-cli-classy-ph-01b@classy-ph-01b'

$ kubectl config use-context tanzu-cli-classy-ph-01b@classy-ph-01b
Switched to context "tanzu-cli-classy-ph-01b@classy-ph-01b".

$ kubectl get nodes -o wide
NAME                                              STATUS  ROLES                  AGE    VERSION           INTERNAL-IP    EXTERNAL-IP  OS-IMAGE                KERNEL-VERSION       CONTAINER-RUNTIME
classy-ph-01b-ccbh9-x9vng                         Ready    control-plane,master  13m    v1.23.8+vmware.2  192.168.38.13  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ph-01b-node-pool-1-9mwhb-68cf8bb464-5kr4p  Ready    <none>                5m43s  v1.23.8+vmware.2  192.168.38.15  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ph-01b-node-pool-1-9mwhb-68cf8bb464-dcdw5  Ready    <none>                5m57s  v1.23.8+vmware.2  192.168.38.14  <none>       VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
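
As an aside, since the Cluster manifest is just a standard Kubernetes object, the same deployment can be done with kubectl against the Supervisor, as mentioned earlier. A minimal sketch, assuming you are already logged in via the vSphere plugin for kubectl and have a context named after the cormac-ns namespace:

$ kubectl config use-context cormac-ns
$ kubectl apply -f classy1b-single-az-photon-network.yaml
$ kubectl get cluster classy-ph-01b -n cormac-ns

The tanzu CLI adds the validation and waiting steps seen above, but the end result should be the same Cluster object.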

Choosing a different OS for the K8s nodes

Let’s say that we would now like to deploy a TKG cluster which uses Ubuntu instead of Photon OS. This can be done by adding the following annotations to the controlPlane and workers scopes. Thus, you can choose to deploy different OS distributions for the controlPlane nodes and the worker nodes.

    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
    workers:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
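
The os-name value needs to correspond to an OS image that ships with the Tanzu Kubernetes release (TKr) being used. A quick way to check which releases are available on the Supervisor is to query the TKr objects. A sketch, assuming the kubectl context is pointed at the Supervisor; tkr is the short name for tanzukubernetesreleases in my environment:

$ kubectl get tanzukubernetesreleases
$ kubectl get tkr v1.23.8---vmware.2-tkg.2-zshippable -o yaml

The spec of the TKr should list the OS image references (Photon and Ubuntu) associated with that release.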

Choosing different VM Classes for the K8s nodes

Further configuration granularity is achievable. Perhaps you wish to deploy nodes with varying VM classes. For example, you may want some worker nodes to have different resources compared to other worker nodes. In this case, the workers section of the manifest could look something like this:

    workers:
      machineDeployments:
        - class: node-pool
          metadata:
            annotations:
              run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
          name: node-pool-ubuntu
          replicas: 1
          variables:
            overrides:
            - name: vmClass
              value: guaranteed-medium
        - class: node-pool
          name: node-pool-photon
          replicas: 1
          variables:
            overrides:
            - name: vmClass
              value: best-effort-medium

I mentioned earlier that it was possible to override values at different scopes. Here, there is still a global vmClass variable in the manifest (not shown) which is used by the controlPlane, but now the node pools have their own scoped overrides. The node pool with the Ubuntu worker has a VM Class of guaranteed-medium, whilst the Photon worker is given a best-effort-medium VM Class. After deployment, the cluster should look something like this, with a mix of OS distributions in the cluster.

$ kubectl get nodes -o wide
NAME                                                    STATUS  ROLES                  AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION        CONTAINER-RUNTIME
classy-ub-07g-782xh-bkttx                              Ready    control-plane,master  15m    v1.23.8+vmware.2  192.168.38.16  <none>        VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ub-07g-node-pool-photon-26bhm-7b5c975c86-52d9k  Ready    <none>                6m22s  v1.23.8+vmware.2  192.168.38.18  <none>        VMware Photon OS/Linux  4.19.247-14.ph3-esx  containerd://1.6.6
classy-ub-07g-node-pool-ubuntu-b6j87-8555f878b9-fkrdl  Ready    <none>                6m42s  v1.23.8+vmware.2  192.168.38.17  <none>        Ubuntu 20.04.5 LTS      5.4.0-125-generic    containerd://1.6.6
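
One thing to watch for with overrides is that every VM Class referenced in the manifest, whether globally or per node pool, must be associated with the vSphere Namespace, otherwise the corresponding machines will not provision. A quick way to check, assuming the kubectl context is pointed at the Supervisor (resource names may differ slightly between releases):

$ kubectl get virtualmachineclassbindings -n cormac-ns
$ kubectl get virtualmachineclasses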

Cluster Customization – Replacing a Package

The following example shows how to customize a cluster at deployment time. It replaces the default Antrea CNI with the third-party Calico CNI. Here, a ClusterBootstrap is created to add the kapp-controller (Carvel package management) and Calico CNI configurations to the cluster bootstrap. In essence, what we are looking to achieve here is to swap out the default Antrea CNI package for the non-default Calico CNI package. Note that we do not need to spell out the name of the resource in its entirety in the bootstrap. There is a webhook which will fill out any additional information when it comes across a wildcard. For example, adding calico* as the refName will cause the webhook to find the full version of the package from the Tanzu Kubernetes release (TKr) specified in the tkg.tanzu.vmware.com/add-missing-fields-from-tkr annotation. Note the common names used across each of the *Configs, as well as the ClusterBootstrap and Cluster objects. My understanding is that the Cluster and ClusterBootstrap names should be common, but it should not be necessary to have identical names for the *Configs. However, using a common name like this is an easy way to track which clusters are using which configurations. [Update] Further testing has shown that the Cluster, ClusterBootstrap and KappControllerConfig need to share the same name, but the CalicoConfig name can be different.

apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: CalicoConfig
metadata:
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  calico:
    config:
      vethMTU: 0
---
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: KappControllerConfig
metadata:
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  namespace: tkg-system
---
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: ClusterBootstrap
metadata:
  annotations:
    tkg.tanzu.vmware.com/add-missing-fields-from-tkr: v1.23.8---vmware.2-tkg.2-zshippable
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  cni:
    refName: calico*
    valuesFrom:
      providerRef:
        apiGroup: cni.tanzu.vmware.com
        kind: CalicoConfig
        name: classy-ub-04d
  kapp:
    refName: kapp-controller*
    valuesFrom:
      providerRef:
        apiGroup: run.tanzu.vmware.com
        kind: KappControllerConfig
        name: classy-ub-04d
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: classy-ub-04d
  namespace: cormac-ns
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.32.0.0/12"]
    pods:
      cidrBlocks: ["192.42.0.0/16"]
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
      replicas: 1
    workers:
      machineDeployments:
        - class: node-pool
          metadata:
            annotations:
              run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
          name: node-pool-4
          replicas: 2
    variables:
      - name: vmClass
        value: guaranteed-small
      - name: storageClass
        value: vsan-default-storage-policy
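
The full package names that the calico* and kapp-controller* wildcards resolve to appear in the deployment output below, but they can also be listed up front from the Supervisor. A hedged example; in my environment the TKG packages are published in the vmware-system-tkg namespace and in the vSphere Namespace itself, but the location could differ in other setups:

$ kubectl get packages -A | grep -E 'calico|kapp-controller'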

This deployment should result in a cluster which is using the Calico CNI rather than the Antrea CNI. This is observable during the deployment.

waiting for resources type *v1beta1.MachineList to be up and running
waiting for addons core packages installation...
getting ClusterBootstrap object for cluster: classy-ub-04d
waiting for resource classy-ub-04d of type *v1alpha3.ClusterBootstrap to be up and running
getting package:kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable in namespace:cormac-ns
getting package:calico.tanzu.vmware.com.3.22.1+vmware.1-tkg.2-zshippable in namespace:vmware-system-tkg
getting package:vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable in namespace:vmware-system-tkg
getting package:vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable in namespace:vmware-system-tkg
waiting for package: 'classy-ub-04d-kapp-controller'
waiting for resource classy-ub-04d-kapp-controller of type *v1alpha1.PackageInstall to be up and running
successfully reconciled package: 'classy-ub-04d-kapp-controller' in namespace: 'cormac-ns'
waiting for package: 'classy-ub-04d-calico'
waiting for package: 'classy-ub-04d-vsphere-pv-csi'
waiting for package: 'classy-ub-04d-vsphere-cpi'
waiting for resource classy-ub-04d-calico of type *v1alpha1.PackageInstall to be up and running
waiting for resource classy-ub-04d-vsphere-cpi of type *v1alpha1.PackageInstall to be up and running
waiting for resource classy-ub-04d-vsphere-pv-csi of type *v1alpha1.PackageInstall to be up and running
successfully reconciled package: 'classy-ub-04d-vsphere-pv-csi' in namespace: 'vmware-system-tkg'
successfully reconciled package: 'classy-ub-04d-calico' in namespace: 'vmware-system-tkg'
successfully reconciled package: 'classy-ub-04d-vsphere-cpi' in namespace: 'vmware-system-tkg'

Workload cluster 'classy-ub-04d' created

And if the pods running on the cluster are checked, we can see the Calico CNI pods.

$ kubectl get pods -A
NAMESPACE                      NAME                                                               READY   STATUS      RESTARTS        AGE
kube-system                    calico-kube-controllers-59dd58c5c7-nhnnh                           1/1     Running     0               6m35s
kube-system                    calico-node-5zr2l                                                  1/1     Running     0               6m35s
kube-system                    calico-node-n98n9                                                  1/1     Running     0               2m44s
kube-system                    calico-node-thmsb                                                  1/1     Running     0               2m35s
kube-system                    coredns-7d8f74b498-bfjbk                                           1/1     Running     0               5m32s
kube-system                    coredns-7d8f74b498-dblwg                                           1/1     Running     0               10m
kube-system                    docker-registry-classy-ub-04d-node-pool-6-d9ngf-58c764844c-bccn8   1/1     Running     0               2m31s
kube-system                    docker-registry-classy-ub-04d-node-pool-6-d9ngf-58c764844c-pm9pw   1/1     Running     0               2m44s
kube-system                    docker-registry-classy-ub-04d-ptmst-vb7xk                          1/1     Running     0               10m
kube-system                    etcd-classy-ub-04d-ptmst-vb7xk                                     1/1     Running     0               10m
kube-system                    kube-apiserver-classy-ub-04d-ptmst-vb7xk                           1/1     Running     0               10m
kube-system                    kube-controller-manager-classy-ub-04d-ptmst-vb7xk                  1/1     Running     0               10m
kube-system                    kube-proxy-bprz5                                                   1/1     Running     0               10m
kube-system                    kube-proxy-vnpx8                                                   1/1     Running     0               2m35s
kube-system                    kube-proxy-zjrzl                                                   1/1     Running     0               2m44s
kube-system                    kube-scheduler-classy-ub-04d-ptmst-vb7xk                           1/1     Running     0               10m
kube-system                    metrics-server-7887c94f95-cp468                                    1/1     Running     0               6m33s
pinniped-concierge             pinniped-concierge-567d456699-t4hrq                                1/1     Running     0               5m42s
pinniped-concierge             pinniped-concierge-567d456699-vw7s5                                1/1     Running     0               5m42s
pinniped-concierge             pinniped-concierge-kube-cert-agent-7bc867dbc4-btspm                1/1     Running     0               5m15s
pinniped-supervisor            pinniped-post-deploy-job-vhb4w                                     0/1     Completed   0               5m42s
secretgen-controller           secretgen-controller-55dbddbf84-4qtxb                              1/1     Running     0               6m27s
tkg-system                     kapp-controller-66bdbb9b94-6fqfv                                   2/2     Running     0               7m17s
tkg-system                     tanzu-capabilities-controller-manager-7969cd64dd-7kgpg             1/1     Running     0               5m58s
vmware-system-auth             guest-cluster-auth-svc-99hrd                                       1/1     Running     0               5m55s
vmware-system-cloud-provider   guest-cluster-cloud-provider-7f9cffcc4b-jkh9p                      1/1     Running     0               6m43s
vmware-system-csi              vsphere-csi-controller-9cf8944c9-j5f6c                             6/6     Running     0               6m42s
vmware-system-csi              vsphere-csi-node-5dm42                                             3/3     Running     0               2m35s
vmware-system-csi              vsphere-csi-node-fmczs                                             3/3     Running     3 (5m43s ago)   6m42s
vmware-system-csi              vsphere-csi-node-m4qjk                                             3/3     Running     0               2m45s

Now we have a declarative way of specifying a bespoke K8s cluster deployment. The *Configs and ClusterBootstrap exist in their own right as objects in the Supervisor cluster vSphere Namespace, and can be queried as shown below. Note the different CNIs. These configurations can be reused for future ClusterClass deployments to create additional bespoke clusters.

$ kubectl get clusterbootstrap -n cormac-ns
NAME                  CNI                                                        CSI                                                              CPI                                                            KAPP                                                                RESOLVED_TKR
classy-ph-01b        antrea.tanzu.vmware.com.1.5.3+tkg.2-zshippable            vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable  vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable  kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable  v1.23.8---vmware.2-tkg.2-zshippable
classy-ub-04d        calico.tanzu.vmware.com.3.22.1+vmware.1-tkg.2-zshippable  vsphere-pv-csi.tanzu.vmware.com.2.6.0+vmware.1-tkg.1-zshippable  vsphere-cpi.tanzu.vmware.com.1.23.1+vmware.1-tkg.2-zshippable  kapp-controller.tanzu.vmware.com.0.38.4+vmware.1-tkg.2-zshippable  v1.23.8---vmware.2-tkg.2-zshippable
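
The provider configuration objects referenced by the ClusterBootstrap can be queried in the same way. A small sketch, using the same cormac-ns namespace; the columns displayed will depend on the CRD versions in your build:

$ kubectl get calicoconfig -n cormac-ns
$ kubectl get kappcontrollerconfig -n cormac-ns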

Hopefully this has given you an idea of the new ClusterClass declarative mechanism. If you are interested, the sample manifests that I have used here are available in this repository. As I learn more about the different customizations, I’ll add more manifests to the repo.

Note that even though ClusterClass is now available as part of TKG 2.0, VMware continues to support the original TanzuKubernetesCluster manifest. An updated API, v1alpha3, is introduced to provide support for new features such as multi-AZ. I wrote about multi-AZ support in this earlier post.
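
For comparison, a minimal TanzuKubernetesCluster manifest using the v1alpha3 API might look something like the following. This is a sketch rather than a manifest tested in this environment; the cluster name is illustrative, and the vmClass, storageClass and TKr values simply reuse those from the ClusterClass examples above:

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-v1alpha3-01       # illustrative name
  namespace: cormac-ns
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: guaranteed-small
      storageClass: vsan-default-storage-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
    - name: node-pool-1
      replicas: 2
      vmClass: guaranteed-small
      storageClass: vsan-default-storage-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable

Both APIs can co-exist in the same vSphere Namespace, so existing TanzuKubernetesCluster based automation continues to work while the ClusterClass approach is evaluated.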
