Disclaimer: “To be clear, this post is based on a pre-GA version of VMware Cloud Foundation 4.0. While the assumption is that not much should change between the time of writing and when the product becomes generally available, I want readers to be aware that feature behaviour and the user interface could still change before then.”
Validate Workload Domain
The first step in the process is to validate that the workload domain can support vSphere with Kubernetes. The SDDC Manager now has a new section called Solutions. Currently, there is a single Solution called Kubernetes – Workload Management. To begin with, there are obviously no workload management solutions created:
Click on the Deploy link to get started. The first window displayed lists the prerequisites – you need the correct license edition, a configured NSX-T based workload domain with an NSX-T Edge cluster, and various subnets, two of which need to be routed (for Ingress and Egress).
Once you have selected all of the above, we can begin the workload management deployment. In the cluster section, select the workload domain. If there are compatibility issues, the cluster will not be displayed in the compatible section; instead it will be displayed in the incompatible section, along with the reason why the cluster is incompatible. In the screenshot below, my WLD has not been licensed appropriately, and so is not compatible with vSphere with Kubernetes.
Once the incompatible reasons have been addressed (e.g. correct license edition applied to the VI WLD), the WLD will appear in the compatible section and you can proceed to the validation step.
And now the full validation takes place where we ensure that credentials, resources and networking all exist and are available for the rollout of vSphere with Kubernetes.
Once the requirements for vSphere with Kubernetes have been validated, we can connect to our vSphere environment to complete the vSphere with Kubernetes setup. Click the Complete in vSphere button:
vSphere with Kubernetes deployment
After clicking on the ‘Complete in vSphere‘ button, the vSphere client is launched and we are placed at the starting point for enabling Workload Management. There are 5 steps in total. The first step is to select a vSphere cluster, since there could be multiple clusters in the same VI Workload Domain. In this example, there is only one, so we select that.
After selecting a cluster, the next step is to choose a control plane size in Cluster Settings. In other words, this determines the resources allocated to the Supervisor Cluster control plane virtual machines when they are deployed.
If some of this terminology such as Supervisor, Native Pods, Spherelet, etc. is new to you, you can check back on this Project Pacific post for an overview. Project Pacific was the original name for vSphere with Kubernetes.
For availability purposes, 3 x control plane nodes are provisioned; these are called the Supervisor Control Plane VMs. The worker nodes, in the case of vSphere with Kubernetes, are the ESXi hosts in the cluster.
The next step is to provide network details for the control plane nodes, as well as the various IP ranges for the workload networks, such as Pods and Services. The Pod and Service CIDRs do not need to be routable, but the Ingress and Egress ranges certainly do. The prerequisites state that a minimum of a /27 CIDR is required for both Ingress and Egress. This is a contiguous range of 32 IP addresses. Note that in my lab testing, I was able to deploy vSphere with Kubernetes with a /28, which is only 16 IP addresses, of which 14 are usable. On deployment, 12 SNAT rules were created immediately, so a /28 could only be used for the most basic of deployments. Each new namespace also requires a SNAT rule, so you would be limited to 2 namespaces at most. To do something useful with vSphere with Kubernetes you will definitely need the /27 CIDR, at least for the Egress range. [Update] Depending on your network configuration, you may also need to add a route between the Egress network and your DNS server so that your Pods can resolve requests to pull from Google, Docker, or whichever external repositories you wish to use.
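To make those ranges a little more concrete, here is an illustrative layout. These are placeholder subnets of my own choosing rather than values taken from my environment, and the Pod and Service ranges are simply the sort of defaults the wizard pre-populates, so treat everything below as an example only:

# Illustrative example only - substitute ranges that suit your own environment
# Pod CIDR       10.244.0.0/21      internal to the cluster, does not need to be routed
# Service CIDR   10.96.0.0/24       internal to the cluster, does not need to be routed
# Ingress CIDR   192.168.100.0/27   routed - consumed by NSX-T Load Balancer VIPs
# Egress CIDR    192.168.101.0/27   routed - consumed by NSX-T SNAT rules (12 used at deployment, plus 1 per namespace)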
The next step is to select storage policies for the various disks required for vSphere with Kubernetes. Basically what you are doing in this step is picking a storage policy for the Control Plane Node disks, Ephemeral (temporary) disks, and the Image Cache.
When you click on Select Storage, the list of available storage policies is displayed. Simply pick the one that you wish to use for the particular storage type.
Finally, review the selection and finish. This will start the deployment of vSphere with Kubernetes.
Progress can be monitored from the vSphere client, in particular the Recent Tasks pane.
A lot of actions now take place. These include, but are not limited to:
- The deployment of 3 x Supervisor Control Plane VMs
- The creation of a set of SNAT rules (Egress) in NSX-T for a whole array of K8s services
- The creation of a Load Balancer (Ingress) in NSX-T for the K8s control plane
- The installation of the Spherelet on the ESXi hosts so that they behave as Kubernetes worker nodes (a quick check for this is shown below)
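If you want to spot-check that last item, the Spherelet lands on the hosts as a VIB, so something along these lines from an ESXi shell should confirm it is there. This is just a sanity check of my own, and esxi-01 is a placeholder hostname:

[root@esxi-01:~] esxcli software vib list | grep -i spherelet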
From NSX-T we can see the 12 x SNAT rules:
As mentioned earlier, as soon as you begin to build a namespace, a new SNAT rule is created. Thus it is important to get the Egress CIDR set correctly at deployment or you can very quickly run out of addresses.
I mentioned that there is also a Load Balancer created – this is used for the multi-node Control Plane API Server. Rather than connecting to a single control plane node, we can connect to the LB virtual IP address. This means that if there is an issue with any of the control plane virtual machines, it won’t be noticeable. Connections to the API server will simply be redirected to a different control plane node at the back-end. Here is the Ingress/Load Balancer IP address used in my environment for the Control Plane:
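This Load Balancer address is also what you point kubectl at once the control plane is up. A minimal sketch, assuming the kubectl vsphere plugin has already been downloaded and installed, and substituting your own Load Balancer IP and SSO credentials:

$ kubectl vsphere login --server=<Load Balancer IP> --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify
$ kubectl get nodes
# should list the 3 x Supervisor Control Plane VMs plus the ESXi hosts acting as worker nodes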
You can monitor all the vSphere with Kubernetes deployment activity from a log file called wcpsvc.log on the vCenter server. I normally leave a tail command running to see the detailed activity:
root@vcsa-04 [ ~ ]# cd /var/log/vmware/wcp
root@vcsa-04 [ /var/log/vmware/wcp ]# ls -l
total 18532
-rw------- 1 root root   23102 Mar 20 09:00 gcm-telemetry
-rw-r--r-- 1 root root 9445965 Mar 20 09:02 nsxd.log
-rw------- 1 root root     111 Mar 19 11:05 stdstream.log-0.stderr
-rw------- 1 root root      42 Mar 19 11:05 stdstream.log-0.stdout
-rw------- 1 root root    1865 Mar 19 04:06 stdstream.log-1.stderr
-rw------- 1 root root      42 Mar 18 16:06 stdstream.log-1.stdout
-rw------- 1 root root    3719 Mar 20 08:25 stdstream.log.stderr
-rw------- 1 root root      42 Mar 19 12:20 stdstream.log.stdout
-rw-r--r-- 1 root root  778392 Mar 19 12:09 wcpsvc-2020-03-19T12-09-03.220.log.gz
-rw-r--r-- 1 root root 8687263 Mar 20 09:03 wcpsvc.log
root@vcsa-04 [ /var/log/vmware/wcp ]# tail -f wcpsvc.log
.
.
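If the rollout appears to stall, a crude but effective filter on the same log (just standard grep, nothing vSphere specific) can help surface the problem:

root@vcsa-04 [ /var/log/vmware/wcp ]# grep -i error wcpsvc.log | tail -20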
And if everything goes according to plan, you should see the deployment complete, both in the vSphere client and in SDDC Manager > Solutions > Workload Management:
If we navigate to Workload Management > Namespaces in the vSphere client, we can see a message stating that Workload Management has been successfully deployed and that we can now get started by creating namespaces and other Kubernetes objects on the Supervisor Cluster. These can be your typical Kubernetes objects such as Pods, Deployments and StatefulSets, or other things such as complete deployments of Tanzu Kubernetes Grid (TKG) guest Kubernetes clusters.
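As a quick taster, assume a namespace called demo-ns (a hypothetical name) has been created in the vSphere UI, and that your user has been granted edit permissions and a storage policy on it. Deploying a standard Kubernetes object into it is then no different to any other cluster:

$ kubectl config use-context demo-ns
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: demo-ns
spec:
  containers:
  - name: nginx
    image: nginx
EOF
$ kubectl get pods -n demo-ns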
However, a closer look at those objects is a job for another day. Hopefully this post has shown how much automation has been built into VMware Cloud Foundation 4.0 to configure, validate and deploy the infrastructure that facilitates an easy deployment of vSphere with Kubernetes.
To learn more about VMware Cloud Foundation 4, check out the complete VCF 4 Announcement here.