Since the release of VMware Cloud Foundation (VCF) 4.0 over 1 month ago, I have been asked one question repeatedly – when can I run vSphere with Kubernetes (formerly known as Project Pacific) on a VCF 4.0 Consolidated Architecture? In other words, when can I deploy vSphere with Kubernetes on the Management Domain rather than building a separate VI Workload Domain to run it. The main reason for this request is because this reduces the number of ESXi hosts required to run vSphere with Kubernetes from 7 down to 4. So I am delighted to announce that we now have full support for running vSphere with Kubernetes on the Management Domain of VCF 4.0 in what we term a Consolidated Architecture.
Before we get into the “how-to”, I want to point out the different definitions when it comes to the term consolidated. The term consolidated is used to refer to running VM workloads on the Management Domain, but the term consolidated architecture is also used to refer to running a workload on the Management Domain. I want to highlight that we have always supported the former; in other words you could always run VMs on the Management Domain alongside the management VMs. This post is referring to the latter, and the ability to select the Management Domain for a vSphere with Kubernetes workload. With that clarification, let’s get on with the steps on how to deploy vSphere with Kubernetes on the Management Domain of VCF 4.0.
Deploy VCF 4.0 as normal via Cloud Builder
The initial deployment is exactly the same as posted here previously. As before, I am not including the option to deploy any AVN (Application Virtual Networks) during bringup. I will create my NSX-T Edges as a separate step later on.
Once the deployment has completed, launch SDDC Manager. When you login, you will observe a single workload domain, the Management Domain, consisting of 4 ESXi hosts. This will also be running vCenter 7.0 and has vSAN 7.0 deployed automatically. The Management Domain requires vSAN.
With VCF 4.0, NSX-T 3.0 is also deployed automatically. If you login to the NSX-T Manager, you can see the Transport Nodes and Zones for the ESXi hosts. However, since I skipped the AVN (Application Virtual Networks) during bringup, we have not yet deployed any NSX-T Edge nodes. Thus, there are no Tier0 or Tier1 Logical Routers or any other network services defined.
Add an NSX-T Edge to the Management Domain
vSphere with Kubernetes requires NSX-T to provide networking services. NSX-T provides Tier0 and Tier1 Logical Routers. NSX-T also provides SNAT egresses and Load Balancer ingresses to the Supervisor Cluster and Guest Cluster. Each namespace created in vSphere with Kubernetes gets it’s own Tier-1 Logical Router as well as any required SNAT and Load Balancer IP addresses. This functionality is provided by an NSX-T Edge, and in VCF 4.0, the provisioning of NSX-T Edges on workload domains, including the Management Domain, is fully automated.
To deploy an NSX-T Edge, simply right click on the Management Domain in SDDC Manager. From the drop-down list, select Add Edge Cluster. Full details on how to populate the wizard for an Edge cluster can be found in this earlier post.
The one item that is important to specify is that the NSX-T Edge use-case is for “Workload Management“, which basically means that this NSX-T Edge is being used for vSphere with Kubernetes. This automatically sets the Edge format factor to Large and the Tier0 Service HA to Active-Active. This step also adds the WCPReady tag to the Edge Cluster in the NSX-T Manager, which we will discuss later.
One other thing to mention is that in my lab, I do not have access to an upstream router. This means I cannot peer it to my Tier0 Logical Router through BGP. I am therefore selecting Static as my Tier0 Routing Type. This means a lot of additional manual steps when it comes to routing later on. The other option is EBGP, which requires peering the Tier0 Logical Router to your physical upstream router which in turn enables automatic route learning. EBGP would be a much more common setting in production environments, but unfortunately it is not possible in my lab. This is where Workload Management is specified in the Add Edge Cluster wizard.
After the details of both NSX-T Edge nodes have been added, complete the wizard, ensure validation passes and deploy the NSX-T Edge cluster. Now we will have to return to the NSX-T Managers to allow vSphere with Kubernetes to be deployed on the Management Domain, but first lets look at what happens if you try to deploy vSphere with Kubernetes on the Management Domain without following the additional steps.
Cluster is not compatible for Workload Management
In VCF 4.0 Consolidated Architecture, SDDC Manager does not allow you to select the vSphere cluster for validation when enabling Workload Management (deploying vSphere with Kubernetes). To proceed with the vSphere with Kubernetes deployment, you must use the vSphere client. However, from the vSphere client, if you now try to enable Workload Management, the vSphere cluster on the Management Domain does not appear as compatible.
This is because ‘trust’ has not been established on the NSX-T Manager for the Management Domain vCenter Server. Let’s do that now.
Enable Trust and Add Tags in NSX-T Manager
From the NSX-T Manager, navigate to System view. Under Configuration > Fabric, select Compute Managers. This will list the Management Domain vCenter Server. Click on the Edit link to open the vCenter Server / Compute Manager properties. Set the Enable Trust to Yes and click Save.
Now, there may be an additional step required. This depends on whether you deployed Application Virtual Networks (AVN) during the bringup operation, or if you used SDDC Manager to create the Edge Cluster, like I did here. If you deployed AVN during bringup, you will need this step. Also, if you did not select “Workload Management” as the use-case when deploying the NSX-T Edge Cluster, you will also need to do this step.
In the NSX-T Manager, navigate once again to the System view. Under Configuration > Fabric, select Nodes. Next, select Edge Clusters and click on your Edge Cluster name. Now examine the Tags. There should be two; VCF and WCPReady as shown below. If WCPReady is not present, you will need to add it using the Manage > Add steps.
You should not need to do this step if you deploy the Edge Cluster via SDDC Manager and choose “Workload Management” as the use-case.
This should now make the vSphere cluster on the Management Domain appear as Compatible in the vSphere UI when enabling Workload Management.
You can now proceed with the rest of the Workload Management deployment to roll-out vSphere with Kubernetes. The complete steps on how to do this can be found in this previous post. If everything deploys successfully, the Kubernetes API server should receive a Load Balancer IP address from your Ingress range, and appear something like this:
And if you selected static routes as the Tier0 routing type, you should now be able to connect the the Control Plane LB and download the necessary kubectl tools for vSphere with Kubernetes.
Note that the steps to establish trust between NSX-T and the Management Domain vCenter Server will be automated in a future release.
If, like me, you have gone with the static routes approach as your Tier0 Routing Type when you deployed the NSX-T Edges, you will now need to add a static route to your Tier0. This is to enable the control plane to pull container images from external repositories. You will also need to add some additional SNAT rules for the Tier0 if you do not have access to the physical networking infrastructure to make changes. It sucks not having access to your upstream router where this could be automated via EBGP, but at least there is a workaround. I’ve detailed my static routes and SNAT setup in this post.
There is another caveat if you have decided to use EBGP as your Tier0 Routing Type when you deployed the NSX-T Edges. The issue manifests itself as being unable to connect to the control plane API server Load Balancer IP address to download the tools. This is because the BGP Route Advertisements have been inadvertently blocked. To resolve this issue, you will now need to modify the Tier0 Route Advertisement configuration and create a new Custom Route Map. In the Tier0 Routing section of NSX-T, you need to do 3 additional steps:
- Create a new IP Prefix List which permits any network.
- Create a new Custom Route Map which matches the new IP Prefix created in step 1 and permits any network.
- Edit the default Route Map to use the new Custom Route Map created in step 2.
This now means that all routes will be advertised. This issue will also be addressed in an upcoming release. It is also specific to EBGP Routing Type. It is not relevant if static routes is chosen as the Routing Type.
At this point, you have successfully deployed vSphere with Kubernetes / Workload Management on the 4 node Management Domain of VCF 4.0. You can now proceed to use this environment just as you would use Workload Management deployed on a separate VI Workload Domain. You can use this environment for creating namespaces, deploying PodVMs in the Supervisor cluster and creating guest Tanzu Kubernetes Grid (TKG) clusters.
Click this link for further details on how to deploy TKG clusters with vSphere with Kubernetes.
For more information on VCF 4.0 Consolidated Architecture, and where to get a detailed white paper on how to deploy vSphere with Kubernetes on VCF 4.0 Consolidated Architecture, check out this blog post from my colleague Kyle Gleed. As Kyle states in his post, we hope this additional qualification of Cloud Foundation 4.0 Consolidated Architecture will make it easier for you to get started with running vSphere with Kubernetes.