Tanzu Kubernetes considerations with the new VM Class in vSphere with Tanzu

I recently posted about a new feature in vSphere with Tanzu called VM Service, which became available with vSphere 7.0U2a. In a nutshell, this new service allows developers to provision not just Tanzu Kubernetes Clusters and PodVMs in their respective namespaces, but also native virtual machines. The VM Service introduces a new feature for developers called VirtualMachineClassBindings, and also changes the behaviour of an existing feature, VirtualMachineClass.

A VirtualMachineClass describes the resource sizing available for virtual machines. It specifies how much CPU and memory to allocate to a VM, and whether those resources are guaranteed (reserved) or “best-effort”, meaning they are not guaranteed when there is resource contention on the system. A number of default classes are available to developers, and it is also possible to build bespoke classes to meet specific requirements. VirtualMachineClass was already available in previous versions of vSphere with Tanzu, since it describes the size of the virtual machines that make up the control plane and worker nodes of Tanzu Kubernetes clusters.
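To make the guaranteed versus best-effort distinction concrete, here is a minimal Python sketch that models a class as a size plus a reservation policy. The `VMClass` type and its `reservation()` method are hypothetical names for illustration, not part of any VMware API. The sizes are taken from the `kubectl get virtualmachineclasses` output later in this post, and the model assumes that a guaranteed class reserves its full CPU and memory allocation while a best-effort class reserves nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMClass:
    """Hypothetical model of a VirtualMachineClass: a size plus a reservation policy."""

    name: str
    cpus: int
    memory_gi: int

    @property
    def guaranteed(self) -> bool:
        # Assumption: the policy is encoded in the name prefix, as in the default classes.
        return self.name.startswith("guaranteed-")

    def reservation(self) -> tuple[int, int]:
        """Return (reserved vCPUs, reserved memory in GiB).

        Guaranteed classes reserve their full allocation; best-effort classes
        reserve nothing, so they may be starved under resource contention.
        """
        if self.guaranteed:
            return (self.cpus, self.memory_gi)
        return (0, 0)


# (vCPUs, memory in GiB) per size, as reported by `kubectl get virtualmachineclasses`.
SIZES = {
    "xsmall": (2, 2),
    "small": (2, 4),
    "medium": (2, 8),
    "large": (4, 16),
    "xlarge": (4, 32),
    "2xlarge": (8, 64),
    "4xlarge": (16, 128),
    "8xlarge": (32, 128),
}

# The 16 default classes: every size in both a best-effort and a guaranteed flavour.
DEFAULT_CLASSES = {
    f"{policy}-{size}": VMClass(f"{policy}-{size}", cpus, mem)
    for policy in ("best-effort", "guaranteed")
    for size, (cpus, mem) in SIZES.items()
}

if __name__ == "__main__":
    for name in ("guaranteed-medium", "best-effort-medium"):
        print(name, DEFAULT_CLASSES[name].reservation())
```

Running it prints `guaranteed-medium (2, 8)` followed by `best-effort-medium (0, 0)`: the two classes have the same size, but only the guaranteed one holds its resources back from other workloads.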

VirtualMachineClassBindings is a new feature that describes which VirtualMachineClass(es) have been assigned to a particular namespace. In the past, all namespaces had access to all VirtualMachineClasses. Now a vSphere administrator or platform administrator can control the sizes of virtual machines that a developer can create. A VirtualMachineClass can only be used to create virtual machines once it has been ‘bound’ to a namespace. This includes the virtual machines that back the control plane and worker nodes of a Tanzu Kubernetes workload cluster provisioned in vSphere with Tanzu. I will focus on this point later in this post, as it is a subtle change from how things worked previously.
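The binding rule can be sketched as a simple membership check. This is a hypothetical Python model of the behaviour described above, not VMware's implementation: `bind()` stands in for an administrator assigning a class to a namespace in the vSphere Client, and `can_use()` answers whether that class may size a VM in a given namespace.

```python
from typing import Dict, Set

# Hypothetical model: namespace name -> set of VirtualMachineClass names bound to it.
Bindings = Dict[str, Set[str]]


def bind(bindings: Bindings, namespace: str, vm_class: str) -> None:
    """Simulate an administrator binding a class to a namespace (e.g. in the vSphere Client)."""
    bindings.setdefault(namespace, set()).add(vm_class)


def can_use(bindings: Bindings, namespace: str, vm_class: str) -> bool:
    """A class can size a VM (or a TKC node) in a namespace only if it is bound there."""
    return vm_class in bindings.get(namespace, set())


if __name__ == "__main__":
    bindings: Bindings = {}
    print(can_use(bindings, "new-namespace", "guaranteed-small"))  # False: nothing bound yet
    bind(bindings, "new-namespace", "guaranteed-small")
    print(can_use(bindings, "new-namespace", "guaranteed-small"))  # True
    print(can_use(bindings, "other-namespace", "guaranteed-small"))  # False: bound elsewhere only
```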

Let’s begin by looking at a vSphere with Tanzu environment in which no namespace has been created yet.

$ kubectl config get-contexts
CURRENT  NAME                                      CLUSTER            AUTHINFO                                        NAMESPACE
*        10.202.112.152                            10.202.112.152    wcp:10.202.112.152:administrator@vsphere.local


$ kubectl get virtualmachineclasses
No resources found


$ kubectl get virtualmachineclassbindings
No resources found

So there is nothing configured to begin with: neither classes nor bindings are present. Let’s now go ahead and create a new namespace called “new-namespace”, but without adding any VirtualMachineClasses to it just yet.

$ kubectl config get-contexts
CURRENT  NAME                                      CLUSTER            AUTHINFO                                        NAMESPACE
         10.202.112.152                           10.202.112.152    wcp:10.202.112.152:administrator@vsphere.local
*        new-namespace                            10.202.112.152    wcp:10.202.112.152:administrator@vsphere.local    new-namespace


$ kubectl get virtualmachineclasses
No resources found


$ kubectl get virtualmachineclassbindings
No resources found in new-namespace namespace.

Since no classes have been assigned to the namespace, there are no bindings either. Let’s change that: via the vSphere Client, I assign all 16 existing VirtualMachineClasses to the namespace “new-namespace”.

The 16 VirtualMachineClassBindings are now visible from the namespace context in the CLI. Note that as soon as a class has been bound to a namespace, it also becomes visible from other namespaces. This will become more obvious when we create another namespace shortly.

$ kubectl get virtualmachineclasses
NAME                   CPU   MEMORY   AGE
best-effort-2xlarge     8    64Gi     2m36s
best-effort-4xlarge    16    128Gi    2m30s
best-effort-8xlarge    32    128Gi    2m35s
best-effort-large       4    16Gi      114s
best-effort-medium      2    8Gi      2m15s
best-effort-small       2    4Gi      2m36s
best-effort-xlarge      4    32Gi     2m36s
best-effort-xsmall      2    2Gi      2m36s
guaranteed-2xlarge      8    64Gi     2m36s
guaranteed-4xlarge     16    128Gi    2m33s
guaranteed-8xlarge     32    128Gi    2m25s
guaranteed-large        4    16Gi     2m36s
guaranteed-medium       2    8Gi      2m35s
guaranteed-small        2    4Gi      2m34s
guaranteed-xlarge       4    32Gi       73s


$ kubectl get virtualmachineclassbindings
NAME                  VIRTUALMACHINECLASS   AGE
best-effort-2xlarge   best-effort-2xlarge   2m48s
best-effort-4xlarge   best-effort-4xlarge   96s
best-effort-8xlarge   best-effort-8xlarge   2m58s
best-effort-large     best-effort-large     96s
best-effort-medium    best-effort-medium    96s
best-effort-small     best-effort-small     2m59s
best-effort-xlarge    best-effort-xlarge    2m59s
best-effort-xsmall    best-effort-xsmall    2m38s
guaranteed-2xlarge    guaranteed-2xlarge    2m48s
guaranteed-4xlarge    guaranteed-4xlarge    96s
guaranteed-8xlarge    guaranteed-8xlarge    96s
guaranteed-large      guaranteed-large      2m38s
guaranteed-medium     guaranteed-medium     96s
guaranteed-small      guaranteed-small      2m18s

You can run kubectl describe against any of the classes to see the CPU and memory resources associated with it.
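The CPU and MEMORY columns of `kubectl get virtualmachineclasses` already carry the basic sizing, so a quick script can turn that table into data. Below is a minimal Python sketch, assuming the column order shown in this post (NAME, CPU, MEMORY, AGE) and whitespace-separated values; the function name `parse_vm_classes` is hypothetical. For anything beyond a quick check, requesting `-o json` from kubectl and parsing that is more robust than scraping the text table.

```python
def parse_vm_classes(table: str) -> dict[str, tuple[int, str]]:
    """Parse `kubectl get virtualmachineclasses` text output into {name: (cpus, memory)}.

    Assumes the column order shown in this post (NAME, CPU, MEMORY, AGE)
    and whitespace-separated values.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    if not lines or lines[0].startswith("No resources found"):
        return {}
    header = lines[0].split()
    if header[:3] != ["NAME", "CPU", "MEMORY"]:
        raise ValueError(f"unexpected header: {header}")
    classes = {}
    for line in lines[1:]:
        name, cpu, memory, *_age = line.split()
        classes[name] = (int(cpu), memory)
    return classes


# A few rows copied from the output above.
SAMPLE = """
NAME                   CPU   MEMORY   AGE
best-effort-medium      2    8Gi      2m15s
guaranteed-large        4    16Gi     2m36s
guaranteed-xlarge       4    32Gi       73s
"""

if __name__ == "__main__":
    for name, (cpu, mem) in parse_vm_classes(SAMPLE).items():
        print(f"{name}: {cpu} vCPU, {mem}")
```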

Now, you are probably asking why VirtualMachineClassBindings were introduced at all. The difference becomes clear when we create a second namespace. In the new namespace (cormac-new-ns), we have visibility into VirtualMachineClasses that have been bound to other namespaces, but we cannot use any of them unless it has also been bound to our namespace, and therefore appears in the VirtualMachineClassBindings output. In other words, once a VirtualMachineClass has been bound to at least one namespace, it becomes visible in other namespaces, but it is only usable where it is bound. So, as highlighted below, while I can see all 16 VirtualMachineClasses in the “cormac-new-ns” namespace context, because they are already bound to “new-namespace”, I cannot use them here, as none of them are bound to “cormac-new-ns”.

$ kubectl config get-contexts
CURRENT  NAME                                      CLUSTER            AUTHINFO                                        NAMESPACE
*        10.202.112.152                            10.202.112.152    wcp:10.202.112.152:administrator@vsphere.local
          cormac-new-ns                            10.202.112.152    wcp:10.202.112.152:administrator@vsphere.local  cormac-new-ns
          new-namespace                            10.202.112.152    wcp:10.202.112.152:administrator@vsphere.local  new-namespace


$ kubectl config use-context cormac-new-ns
Switched to context "cormac-new-ns".


$ kubectl get virtualmachineclasses
NAME                  CPU    MEMORY    AGE
best-effort-2xlarge     8     64Gi     6m9s
best-effort-4xlarge    16    128Gi     6m3s
best-effort-8xlarge    32    128Gi     6m8s
best-effort-large       4     16Gi     5m27s
best-effort-medium      2      8Gi     5m48s
best-effort-small       2      4Gi     6m9s
best-effort-xlarge      4     32Gi     6m9s
best-effort-xsmall      2      2Gi     6m9s
guaranteed-2xlarge      8     64Gi     6m9s
guaranteed-4xlarge     16    128Gi     6m6s
guaranteed-8xlarge     32    128Gi     5m58s
guaranteed-large        4     16Gi     6m9s
guaranteed-medium       2      8Gi     6m8s
guaranteed-small        2      4Gi     6m7s
guaranteed-xlarge       4     32Gi     4m46s
guaranteed-xsmall       2      2Gi     3m24s


$ kubectl get virtualmachineclassbindings
No resources found in cormac-new-ns namespace.

However, if I now go to the vSphere Client and assign a VirtualMachineClass to this namespace, the binding shows up:

$ kubectl get virtualmachineclassbindings
NAME              VIRTUALMACHINECLASS   AGE
guaranteed-large  guaranteed-large      10s
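The visible-versus-usable behaviour walked through above can be captured in a few lines. This Python sketch is a hypothetical model based only on the output shown in this post, not on documented internals: a class is listed by `kubectl get virtualmachineclasses` in every namespace once it is bound anywhere, but is usable only in namespaces where it is bound. The bindings for “new-namespace” are trimmed to three classes for brevity.

```python
from typing import Dict, Set

# Hypothetical model of the behaviour shown above: namespace -> bound class names.
Bindings = Dict[str, Set[str]]


def visible_classes(bindings: Bindings) -> Set[str]:
    """Classes listed by `kubectl get virtualmachineclasses`: bound to ANY namespace."""
    return set().union(*bindings.values())


def usable_classes(bindings: Bindings, namespace: str) -> Set[str]:
    """Classes that can actually be used: bound to THIS namespace."""
    return set(bindings.get(namespace, set()))


def visible_but_unusable(bindings: Bindings, namespace: str) -> Set[str]:
    """Classes a developer can see in a namespace but cannot use there."""
    return visible_classes(bindings) - usable_classes(bindings, namespace)


if __name__ == "__main__":
    bindings: Bindings = {
        "new-namespace": {"best-effort-medium", "guaranteed-large", "guaranteed-medium"},
        "cormac-new-ns": {"guaranteed-large"},
    }
    print(sorted(visible_but_unusable(bindings, "cormac-new-ns")))
    # ['best-effort-medium', 'guaranteed-medium']
```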

Considerations for Tanzu Kubernetes in vSphere with Tanzu

Now let’s say I want to create a Tanzu Kubernetes workload cluster in my “cormac-new-ns” namespace. In previous versions of vSphere with Tanzu, I didn’t have to worry about VirtualMachineClasses or VirtualMachineClassBindings. I simply created my TanzuKubernetesCluster manifest and applied it. As long as the image was available in the content library, I was good to go. Here is an example of such a manifest.

$ cat tkgs-cluster.1.20.2-nobindingvmclass.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-1-20-2
spec:
  topology:
    controlPlane:
      count: 3
      class: guaranteed-medium
      storageClass: vsan-default-storage-policy
    workers:
      count: 5
      class: best-effort-medium
      storageClass: vsan-default-storage-policy
  distribution:
    version: v1.20.2

Note that in this example, the classes referenced by spec.topology.controlPlane.class (guaranteed-medium) and spec.topology.workers.class (best-effort-medium) have not been bound to this namespace. So even though they appear in the VirtualMachineClass output, they do not appear in the VirtualMachineClassBindings output. Thus, if we apply this manifest, the cluster creation fails as follows:

$ kubectl get TanzuKubernetesCluster
NAME                CONTROL PLANE  WORKER   DISTRIBUTION                    AGE    PHASE    TKR COMPATIBLE  UPDATES AVAILABLE
tkg-cluster-1-20-2  3              5        v1.20.2+vmware.1-tkg.2.3e10706  4m10s  failed   True


$ kubectl describe TanzuKubernetesCluster
Name:        tkg-cluster-1-20-2
Namespace:    cormac-new-ns
Labels:      run.tanzu.vmware.com/tkr=v1.20.2---vmware.1-tkg.2.3e10706
Annotations:  <none>
API Version:  run.tanzu.vmware.com/v1alpha1
Kind:        TanzuKubernetesCluster
Metadata:
  Creation Timestamp:  2021-06-15T12:09:13Z
  Finalizers:
    tanzukubernetescluster.run.tanzu.vmware.com
.
.
.
  Conditions:
    Last Transition Time:  2021-06-15T12:09:32Z
    Message:              1 of 2 completed
    Reason:                VirtualMachineClassBindingNotFound @ Machine/tkg-cluster-1-20-2-control-plane-lqwnh
    Severity:              Error
    Status:                False
    Type:                  ControlPlaneReady
    Last Transition Time:  2021-06-15T12:09:26Z
    Message:              0/3 Control Plane Node(s) healthy. 0/5 Worker Node(s) healthy
    Reason:                WaitingForNodesHealthy
    Severity:              Info
    Status:                False
    Type:                  NodesHealthy
    Last Transition Time:  2021-06-14T08:33:41Z
    Status:                True
    Type:                  TanzuKubernetesReleaseCompatible
    Last Transition Time:  2021-06-14T08:33:41Z
    Reason:                NoUpdates
    Status:                False
    Type:                  UpdatesAvailable
  Node Status:
    tkg-cluster-1-20-2-control-plane-lqwnh:            pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-2pbrd:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-4pf87:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-8pzdb:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-fzgpz:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-tzkth:  pending
  Phase:                                              failed
  Vm Status:
    tkg-cluster-1-20-2-control-plane-lqwnh:            pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-2pbrd:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-4pf87:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-8pzdb:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-fzgpz:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-tzkth:  pending
Events:
  Type    Reason        Age        From                                                                                             Message
  ----    ------        ----      ----                                                                                              -------
  Normal  PhaseChanged  <invalid>  vmware-system-tkg/vmware-system-tkg-controller-manager/tanzukubernetescluster-status-controller  cluster changes from creating phase to failed phase
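
The key line in the describe output is the VirtualMachineClassBindingNotFound reason on the ControlPlaneReady condition. As a hedged sketch of how to surface that automatically, the Python below scans a TanzuKubernetesCluster object fetched with `kubectl get tanzukubernetescluster <name> -n <namespace> -o json`. It assumes the JSON keys under `status.conditions` match the fields `kubectl describe` prints (type, status, severity, reason, message); the helper names are hypothetical, and the sample object is hand-reconstructed from the describe output above rather than captured JSON.

```python
import json
import subprocess


def load_tkc(name: str, namespace: str) -> dict:
    """Fetch a TanzuKubernetesCluster as JSON via kubectl (requires a logged-in context)."""
    out = subprocess.run(
        ["kubectl", "get", "tanzukubernetescluster", name, "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def error_conditions(tkc: dict) -> list[dict]:
    """Conditions that are False with severity Error.

    Assumption: the keys under status.conditions match the fields that
    `kubectl describe` prints (Type, Status, Severity, Reason, Message).
    """
    conditions = tkc.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") == "False" and c.get("severity") == "Error"]


def missing_binding_reasons(tkc: dict) -> list[str]:
    """Reasons that point at a VirtualMachineClass not bound to the namespace."""
    return [
        c["reason"]
        for c in error_conditions(tkc)
        if c.get("reason", "").startswith("VirtualMachineClassBindingNotFound")
    ]


# Hand-reconstructed from the describe output above, not captured JSON.
SAMPLE = json.loads("""
{"status": {"phase": "failed", "conditions": [
  {"type": "ControlPlaneReady", "status": "False", "severity": "Error",
   "reason": "VirtualMachineClassBindingNotFound @ Machine/tkg-cluster-1-20-2-control-plane-lqwnh",
   "message": "1 of 2 completed"},
  {"type": "NodesHealthy", "status": "False", "severity": "Info",
   "reason": "WaitingForNodesHealthy"}
]}}
""")

if __name__ == "__main__":
    for reason in missing_binding_reasons(SAMPLE):
        print(reason)
```

Against a live cluster, `missing_binding_reasons(load_tkc("tkg-cluster-1-20-2", "cormac-new-ns"))` would be the equivalent call, assuming the condition fields are serialized as described.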

To successfully deploy this manifest, we need to bind the two missing VirtualMachineClasses, guaranteed-medium and best-effort-medium, to the namespace.
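Rather than discovering missing bindings from a failed cluster, you could check for them before applying the manifest. The following is a minimal Python sketch of such a preflight check, not a VMware tool. The function names are hypothetical, the manifest is embedded as a dict to stay within the standard library (loading the YAML file directly would need a parser such as PyYAML), and the bound classes are parsed from the text output of `kubectl get virtualmachineclassbindings`, assuming the NAME / VIRTUALMACHINECLASS / AGE columns shown in this post.

```python
def required_classes(manifest: dict) -> set[str]:
    """VirtualMachineClasses referenced by a v1alpha1 TanzuKubernetesCluster manifest."""
    topology = manifest["spec"]["topology"]
    return {topology["controlPlane"]["class"], topology["workers"]["class"]}


def bound_from_table(table: str) -> set[str]:
    """Class names from `kubectl get virtualmachineclassbindings` text output.

    Assumes the NAME / VIRTUALMACHINECLASS / AGE columns shown in this post.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    if not lines or lines[0].startswith("No resources found"):
        return set()
    return {line.split()[1] for line in lines[1:]}


def missing_bindings(manifest: dict, bound: set[str]) -> set[str]:
    """Classes the cluster needs that are not bound to the namespace."""
    return required_classes(manifest) - bound


# The manifest from this post, as a dict (in practice, load the YAML file).
MANIFEST = {
    "apiVersion": "run.tanzu.vmware.com/v1alpha1",
    "kind": "TanzuKubernetesCluster",
    "metadata": {"name": "tkg-cluster-1-20-2"},
    "spec": {
        "topology": {
            "controlPlane": {"count": 3, "class": "guaranteed-medium",
                             "storageClass": "vsan-default-storage-policy"},
            "workers": {"count": 5, "class": "best-effort-medium",
                        "storageClass": "vsan-default-storage-policy"},
        },
        "distribution": {"version": "v1.20.2"},
    },
}

# Bindings in cormac-new-ns before the two missing classes were bound.
BINDINGS_BEFORE = """
NAME              VIRTUALMACHINECLASS   AGE
guaranteed-large  guaranteed-large      10s
"""

if __name__ == "__main__":
    missing = missing_bindings(MANIFEST, bound_from_table(BINDINGS_BEFORE))
    print(sorted(missing))  # ['best-effort-medium', 'guaranteed-medium']
```

Run against the bindings present in cormac-new-ns at this point (only guaranteed-large), it reports the same two classes that the failed cluster was waiting on.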

Once they have been bound via the vSphere Client, querying the bindings should show the two bindings needed by the Tanzu Kubernetes cluster, alongside the existing guaranteed-large binding.

$ kubectl get virtualmachineclassbindings
NAME                 VIRTUALMACHINECLASS   AGE
best-effort-medium   best-effort-medium    5m30s
guaranteed-large     guaranteed-large      16m
guaranteed-medium    guaranteed-medium     2m31s

All going well, if we attempt to create the cluster again, it should now deploy successfully.

$ kubectl get TanzuKubernetesCluster
NAME                CONTROL PLANE  WORKER   DISTRIBUTION                    AGE  PHASE    TKR COMPATIBLE  UPDATES AVAILABLE
tkg-cluster-1-20-2  3              5        v1.20.2+vmware.1-tkg.2.3e10706  56m  running  True

To conclude, keep in mind that with the introduction of the VM Service in vSphere with Tanzu (vSphere 7.0U2a), VirtualMachineClasses need to be bound to a namespace before they can be used. This differs from earlier versions of vSphere with Tanzu, where all namespaces had access to all classes.

3 Replies to “Tanzu Kubernetes considerations with the new VM Class in vSphere with Tanzu”

  1. Any thoughts on whether 3 “small” VMs (best-effort in test, guaranteed in prod) would generally suffice for control plane nodes? In an environment where scaling is limited, I hate to dedicate too many resources to the control plane if it can be avoided.

    1. I’m afraid the answer is: it depends. It will depend on how many objects you deploy at the cluster level, such as the number of TKG clusters, PodVMs and VMs deployed via the VM Service, and also how often you are deploying. Unfortunately, I’m not aware of any documentation that helps with that sizing decision.

      1. Thanks for the response, Cormac – I’ve learned a lot from your blog posts. But yeah, I’m all too familiar with the “it depends” answer – I hate how often my own answers start that way! Agreed that it’s unfortunate there aren’t any documents to help with those sizing decisions. Perhaps some basic guidance for ballpark expectations from “kubectl top nodes” output would even be nice.
