Tanzu Kubernetes considerations with the new VM Class in vSphere with Tanzu
I recently posted about a new feature in vSphere with Tanzu called the VM Service, which became available with vSphere 7.0U2a. In a nutshell, this new service allows developers to provision not just Tanzu Kubernetes clusters and PodVMs in their respective namespaces, but native virtual machines as well. The VM Service exposes a new object to developers called VirtualMachineClassBindings, and it has also introduced some new behaviour around an existing object, VirtualMachineClass.
A VirtualMachineClass describes the available resource sizing for virtual machines. It defines how much CPU and memory to allocate to a VM, and whether those resources are guaranteed (reserved) or “best-effort”, meaning they are not guaranteed when there is resource contention on the system. There are a number of existing classes available to a developer, but bespoke classes can also be built to meet any need. VirtualMachineClass was already available in previous versions of vSphere with Tanzu, since it describes the size of the virtual machines that back the control plane and worker nodes of a Tanzu Kubernetes cluster.
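To make the shape of a class a little more concrete, here is a rough sketch of what a bespoke class could look like as a Kubernetes object. The API group, field names and class name below are assumptions on my part (based on the vm-operator v1alpha1 API) rather than something copied from this environment, and in practice a vSphere administrator would typically create and manage custom classes through the vSphere Client.

# Hypothetical example of a bespoke VM class - field names are an assumption
# based on the vm-operator v1alpha1 API, not copied from this setup.
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineClass
metadata:
  name: custom-guaranteed-tiny    # illustrative name only
spec:
  hardware:
    cpus: 2          # vCPUs allocated to VMs of this class
    memory: 2Gi      # memory allocated to VMs of this class
  policies:
    resources:
      requests:      # reserving the full allocation is what makes a class "guaranteed";
        cpu: 2       # a "best-effort" class would omit these reservations
        memory: 2Gi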
VirtualMachineClassBindings is a new feature that describes which VirtualMachineClass(es) have been assigned to a particular namespace. In the past, all namespaces had access to all VirtualMachineClasses. Now a vSphere administrator or platform administrator can control the size of the virtual machines that a developer can create. Only when a VirtualMachineClass has been ‘bound’ to a namespace can it be used for the creation of virtual machines. This includes the virtual machines which back the control plane and worker nodes of a Tanzu Kubernetes workload cluster provisioned in vSphere with Tanzu. I will focus on this point later in the post, as it is a subtle change in behaviour compared to how things worked previously.
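A quick way to see the difference from a developer’s point of view is to compare the two objects in a namespace context (the namespace name below is just a placeholder); we will see both commands in action throughout the rest of this post.

# Everything that is visible, including classes bound only to other namespaces
kubectl get virtualmachineclasses

# Only the classes that are actually usable in this particular namespace
kubectl get virtualmachineclassbindings -n <namespace>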
Let’s begin by taking a look at a vSphere with Tanzu environment which has not yet had a namespace created.
$ kubectl config get-contexts
CURRENT   NAME             CLUSTER          AUTHINFO                                         NAMESPACE
*         10.202.112.152   10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local

$ kubectl get virtualmachineclasses
No resources found

$ kubectl get virtualmachineclassbindings
No resources found
So nothing is configured to begin with; neither classes nor bindings are present. Let’s now go ahead and create a new namespace called “new-namespace”, but not add any VirtualMachineClasses to it just yet.
$ kubectl config get-contexts
CURRENT   NAME             CLUSTER          AUTHINFO                                         NAMESPACE
          10.202.112.152   10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local
*         new-namespace    10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local   new-namespace

$ kubectl get virtualmachineclasses
No resources found

$ kubectl get virtualmachineclassbindings
No resources found in new-namespace namespace.
Since there are no classes assigned, bindings are also empty. I can now change that and, via the vSphere Client, assign all 16 existing VirtualMachineClasses to the namespace “new-namespace”.
The 16 VirtualMachineClassBindings are now visible from the namespace context in the CLI. Note that as soon as a class has been bound to any namespace, it also becomes visible across other namespaces. This will become more obvious when we build another namespace shortly.
$ kubectl get virtualmachineclasses
NAME                  CPU   MEMORY   AGE
best-effort-2xlarge   8     64Gi     2m36s
best-effort-4xlarge   16    128Gi    2m30s
best-effort-8xlarge   32    128Gi    2m35s
best-effort-large     4     16Gi     114s
best-effort-medium    2     8Gi      2m15s
best-effort-small     2     4Gi      2m36s
best-effort-xlarge    4     32Gi     2m36s
best-effort-xsmall    2     2Gi      2m36s
guaranteed-2xlarge    8     64Gi     2m36s
guaranteed-4xlarge    16    128Gi    2m33s
guaranteed-8xlarge    32    128Gi    2m25s
guaranteed-large      4     16Gi     2m36s
guaranteed-medium     2     8Gi      2m35s
guaranteed-small      2     4Gi      2m34s
guaranteed-xlarge     4     32Gi     73s

$ kubectl get virtualmachineclassbindings
NAME                  VIRTUALMACHINECLASS   AGE
best-effort-2xlarge   best-effort-2xlarge   2m48s
best-effort-4xlarge   best-effort-4xlarge   96s
best-effort-8xlarge   best-effort-8xlarge   2m58s
best-effort-large     best-effort-large     96s
best-effort-medium    best-effort-medium    96s
best-effort-small     best-effort-small     2m59s
best-effort-xlarge    best-effort-xlarge    2m59s
best-effort-xsmall    best-effort-xsmall    2m38s
guaranteed-2xlarge    guaranteed-2xlarge    2m48s
guaranteed-4xlarge    guaranteed-4xlarge    96s
guaranteed-8xlarge    guaranteed-8xlarge    96s
guaranteed-large      guaranteed-large      2m38s
guaranteed-medium     guaranteed-medium     96s
guaranteed-small      guaranteed-small      2m18s
It is possible to run a kubectl describe against each of the classes to see exactly how much CPU and memory is associated with it.
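For example, something like the following would show the full specification of one of the classes listed above (output omitted here for brevity):

$ kubectl describe virtualmachineclass guaranteed-medium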
Now, you are probably asking why we have introduced VirtualMachineClassBindings at all. The difference becomes clear when we create a second namespace. What you will notice is that in the new namespace (cormac-new-ns), we have visibility into VirtualMachineClasses that have been assigned to other namespaces, but unless a class has been specifically bound to our namespace, and is therefore visible via VirtualMachineClassBindings, we cannot use it. In other words, once a virtual machine class has been bound at least once, it becomes visible in every namespace. So, as highlighted below, while I am able to see all 16 VirtualMachineClasses because they are already bound to a namespace (new-namespace), I cannot use them in the “cormac-new-ns” namespace context as they are not bound here.
$ kubectl config get-contexts
CURRENT   NAME             CLUSTER          AUTHINFO                                         NAMESPACE
*         10.202.112.152   10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local
          cormac-new-ns    10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local   cormac-new-ns
          new-namespace    10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local   new-namespace

$ kubectl config use-context cormac-new-ns
Switched to context "cormac-new-ns".

$ kubectl get virtualmachineclasses
NAME                  CPU   MEMORY   AGE
best-effort-2xlarge   8     64Gi     6m9s
best-effort-4xlarge   16    128Gi    6m3s
best-effort-8xlarge   32    128Gi    6m8s
best-effort-large     4     16Gi     5m27s
best-effort-medium    2     8Gi      5m48s
best-effort-small     2     4Gi      6m9s
best-effort-xlarge    4     32Gi     6m9s
best-effort-xsmall    2     2Gi      6m9s
guaranteed-2xlarge    8     64Gi     6m9s
guaranteed-4xlarge    16    128Gi    6m6s
guaranteed-8xlarge    32    128Gi    5m58s
guaranteed-large      4     16Gi     6m9s
guaranteed-medium     2     8Gi      6m8s
guaranteed-small      2     4Gi      6m7s
guaranteed-xlarge     4     32Gi     4m46s
guaranteed-xsmall     2     2Gi      3m24s

$ kubectl get virtualmachineclassbindings
No resources found in cormac-new-ns namespace.
However, if I now go to the vSphere Client and assign a VirtualMachineClass to this namespace, the binding will show up.
$ kubectl get virtualmachineclassbindings
NAME               VIRTUALMACHINECLASS   AGE
guaranteed-large   guaranteed-large      10s
Considerations for Tanzu Kubernetes in vSphere with Tanzu
Now let’s say I want to create a Tanzu Kubernetes workload cluster in my “cormac-new-ns” namespace. In previous versions of vSphere with Tanzu, I didn’t have to worry about VirtualMachineClasses or VirtualMachineClassBindings. I simply created my TanzuKubernetesCluster manifest and applied it. So long as the image was available in the content library, I was good to go. Here is an example of such a manifest.
$ cat tkgs-cluster.1.20.2-nobindingvmclass.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-1-20-2
spec:
  topology:
    controlPlane:
      count: 3
      class: guaranteed-medium
      storageClass: vsan-default-storage-policy
    workers:
      count: 5
      class: best-effort-medium
      storageClass: vsan-default-storage-policy
  distribution:
    version: v1.20.2
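For completeness, the apply step isn’t shown in the output below, but assuming the cormac-new-ns context is still current it would simply be:

$ kubectl apply -f tkgs-cluster.1.20.2-nobindingvmclass.yaml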
Note that in this example, the classes referenced by spec.topology.controlPlane.class (guaranteed-medium) and spec.topology.workers.class (best-effort-medium) have not been bound to this namespace. So even though they appear in the VirtualMachineClass output, they do not appear in the VirtualMachineClassBindings output. Thus, if we attempt to use them by applying this manifest, the cluster creation fails as follows:
$ kubectl get TanzuKubernetesCluster
NAME                 CONTROL PLANE   WORKER   DISTRIBUTION                     AGE     PHASE    TKR COMPATIBLE   UPDATES AVAILABLE
tkg-cluster-1-20-2   3               5        v1.20.2+vmware.1-tkg.2.3e10706   4m10s   failed   True

$ kubectl describe TanzuKubernetesCluster
Name:         tkg-cluster-1-20-2
Namespace:    cormac-new-ns
Labels:       run.tanzu.vmware.com/tkr=v1.20.2---vmware.1-tkg.2.3e10706
Annotations:  <none>
API Version:  run.tanzu.vmware.com/v1alpha1
Kind:         TanzuKubernetesCluster
Metadata:
  Creation Timestamp:  2021-06-15T12:09:13Z
  Finalizers:
    tanzukubernetescluster.run.tanzu.vmware.com
. . .
  Conditions:
    Last Transition Time:  2021-06-15T12:09:32Z
    Message:               1 of 2 completed
    Reason:                VirtualMachineClassBindingNotFound @ Machine/tkg-cluster-1-20-2-control-plane-lqwnh
    Severity:              Error
    Status:                False
    Type:                  ControlPlaneReady
    Last Transition Time:  2021-06-15T12:09:26Z
    Message:               0/3 Control Plane Node(s) healthy. 0/5 Worker Node(s) healthy
    Reason:                WaitingForNodesHealthy
    Severity:              Info
    Status:                False
    Type:                  NodesHealthy
    Last Transition Time:  2021-06-14T08:33:41Z
    Status:                True
    Type:                  TanzuKubernetesReleaseCompatible
    Last Transition Time:  2021-06-14T08:33:41Z
    Reason:                NoUpdates
    Status:                False
    Type:                  UpdatesAvailable
  Node Status:
    tkg-cluster-1-20-2-control-plane-lqwnh:            pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-2pbrd:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-4pf87:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-8pzdb:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-fzgpz:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-tzkth:  pending
  Phase:  failed
  Vm Status:
    tkg-cluster-1-20-2-control-plane-lqwnh:            pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-2pbrd:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-4pf87:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-8pzdb:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-fzgpz:  pending
    tkg-cluster-1-20-2-workers-wlppj-778dff98c-tzkth:  pending
Events:
  Type    Reason        Age        From                                                                                              Message
  ----    ------        ----       ----                                                                                              -------
  Normal  PhaseChanged  <invalid>  vmware-system-tkg/vmware-system-tkg-controller-manager/tanzukubernetescluster-status-controller   cluster changes from creating phase to failed phase
To successfully deploy this manifest, we need to bind the two missing VirtualMachineClasses (guaranteed-medium and best-effort-medium) to the namespace via the vSphere Client.
Now if we query the bindings, we should see the two bindings needed by the Tanzu Kubernetes Cluster.
$ kubectl get virtualmachineclassbindings
NAME                 VIRTUALMACHINECLASS   AGE
best-effort-medium   best-effort-medium    5m30s
guaranteed-large     guaranteed-large      16m
guaranteed-medium    guaranteed-medium     2m31s
With the bindings in place, the cluster deployment proceeds and eventually reaches the running phase.

$ kubectl get TanzuKubernetesCluster
NAME                 CONTROL PLANE   WORKER   DISTRIBUTION                     AGE   PHASE     TKR COMPATIBLE   UPDATES AVAILABLE
tkg-cluster-1-20-2   3               5        v1.20.2+vmware.1-tkg.2.3e10706   56m   running   True
To conclude, keep in mind that with the introduction of the VM Service in vSphere with Tanzu (vSphere 7.0U2a), VirtualMachineClasses need to be bound to a namespace before they can be used, which is different to how things worked in earlier versions of vSphere with Tanzu, where all namespaces had access to all classes.
Any thoughts on whether 3 “small” VMs (best-effort in test, guaranteed in prod) would generally suffice for control plane nodes? In an environment where scaling is limited, I hate to dedicate too many resources to the control plane if it can be avoided.
I’m afraid the answer is: it depends. It will depend on how many objects you deploy at the cluster level, such as the number of TKG clusters, PodVMs and VMs deployed via the VM Service, and also on how often you are deploying. I’m not aware of any documentation that helps with that sizing decision, unfortunately.
Thanks for the response, Cormac – I’ve learned a lot from your blog posts. But yeah, I’m all too familiar with the “it depends” answer – I hate how often my own answers start that way! Agreed that it’s unfortunate there aren’t any documents to help with those sizing decisions. Perhaps some basic guidance for ballpark expectations from “kubectl top nodes” output would even be nice.