Enabling vSphere with Tanzu using HA-Proxy

In earlier posts, we looked at the differences between the original “VCF with Tanzu” offering and the new vSphere with Tanzu offering from VMware. One of the major differences is the use of HA-Proxy to provide a load balancing service, and we covered the deployment steps for the HA-Proxy in detail in a follow-up post. In this post, we are now ready to deploy vSphere with Tanzu, also known as enabling Workload Management.

Prerequisites Revisited

The prerequisites were covered in detail in the “Getting started” post, and you won’t have been able to successfully deploy the HA-Proxy without following them. There are two prerequisites which are required when enabling workload management, so let’s revisit those. First, make sure that the appropriate storage policy for the Supervisor control plane VMs has been created and, second, ensure that a Content Library with the TKG images subscription URL is in place. Navigate to Workload Management in the vSphere Client UI and click on Get Started, as shown below:
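If you would rather sanity-check these two prerequisites from the command line, here is a minimal sketch using the govc CLI. It assumes govc is installed and pointed at your vCenter (via the GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD environment variables); the library name Kubernetes is from my setup.

    # confirm the storage policy for the Supervisor control plane exists
    govc storage.policy.ls

    # confirm the TKG Content Library exists, and check its subscription details
    govc library.ls
    govc library.info Kubernetes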

The first requirement is to select a networking stack. Whilst you can continue to use NSX-T with vSphere with Tanzu, we are going to go with the vCenter Server Network, meaning we will be using a vSphere Distributed Switch (VDS). Remember, however, as pointed out in previous posts, that use of the vCenter Server Network (VDS + HA-Proxy) precludes you from using the PodVM service.

Next, select the cluster on which you wish to install vSphere with Tanzu. I have only one cluster in my environment, so that is the only option available.

Now you need to select a control plane size. I don’t have any sizing advice to offer at this point; my guess is that guidance will come in the official documentation. I chose Small, as I don’t plan to do much other than deploy a simple TKG cluster or two. Resource details are shown against each size.

Next, select a storage policy for the control plane disks. Since I use vSAN for the underlying datastore, I am simply selecting the vSAN default policy from the drop-down list of policies:

The next step is to configure the Load Balancer on the Frontend network. This is where we tell vSphere with Tanzu about the HA-Proxy. Note that the Name should be very simple (don’t use any special characters). Type is obviously HA Proxy. The Data path API address is the IP address of the HA-Proxy on the management network plus the Dataplane API management port (default 5556), so in my setup this was 10.27.51.134:5556. User name and password are those provided when we provisioned the HA-Proxy previously. The IP address Ranges for Virtual Servers is the range of Load Balancer IP addresses we provided when configuring the HA-Proxy – 192.50.0.176/29 – which provides 8 load balancer IP addresses ranging from 192.50.0.176-192.50.0.183. Note that you must provide the range, and not a CIDR, in this case. Lastly, we need the Server Certificate Authority. This can be found by SSH’ing as root to the HA-Proxy appliance and copying the contents of the /etc/haproxy/ca.crt file here. Note that I inadvertently used /etc/haproxy/server.crt as well, and this seemed to work too. However, I’ve been informed that it is preferable to use the ca.crt, since that is the CA that was used to sign the actual certificate (server.crt) that the Dataplane API endpoint on HA-Proxy will serve.
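For convenience, here is how the certificate can be grabbed, along with an optional sanity check that the Dataplane API endpoint is responding. The address and port are from my setup; the username and password placeholders are whatever you supplied when deploying the appliance, and /v2/info is a standard HAProxy Dataplane API endpoint.

    # copy the CA certificate off the HA-Proxy appliance
    ssh root@10.27.51.134 cat /etc/haproxy/ca.crt

    # optional: confirm the Dataplane API answers on port 5556
    curl -k -u '<username>:<password>' https://10.27.51.134:5556/v2/info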

Now we set up the Management network. These are the IP addresses that will be used by the Supervisor control plane VMs. You will need to provide a starting IP address, and you should allow for a minimum of 4 consecutive addresses; a 3-node Supervisor control plane requires at least this many. However, it would be useful to make sure there are even more IP addresses available in the range for the purpose of patching, upgrades, etc. Official documentation should provide guidance on this. The rest of the fields here, such as NTP, DNS and Gateway, are self-explanatory.
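One simple precaution is to verify that the candidate management range is actually unused before committing to it. A rough sketch, using a hypothetical starting address of 10.27.51.140 and checking 5 consecutive addresses:

    # no output expected if the candidate range is free
    for i in $(seq 140 144); do
      ping -c 1 -W 1 10.27.51.$i > /dev/null && echo "10.27.51.$i is in use"
    done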

The final network that needs to be configured is the Workload network. This network is used by both the Supervisor control plane nodes and the TKG “guest” cluster nodes. You will notice that the Supervisor control plane nodes get a second network interface plumbed up, connecting them to the portgroup of the workload network. On completion of the setup, the Supervisor control plane VMs should have network interfaces on both the management network and the workload network.

The IP address range for Services can be left at the default, but you will need to click the ADD button to add the workload network. Select the portgroup for the workload network, provide gateway, subnet mask and a range of IP addresses that can be used for the network. I provided a range of 16 free IP addresses.
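To make the ADD dialog concrete, this is the shape of what gets filled in. All of the values below are illustrative placeholders rather than my actual workload network settings:

    Port Group:   <workload network portgroup>
    Gateway:      192.50.0.1
    Subnet Mask:  255.255.255.0
    IP Ranges:    192.50.0.192-192.50.0.207   (16 free addresses, entered as a range)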

There is the option to create additional workload networks, but I am only creating one. Once saved, the workload network should look something like this.

Next, select the Content Library that holds the TKG images. This should have already been created, as it was called out in the prerequisites. In my setup, I called the Content Library Kubernetes. This needs to be synchronized to the TKG image subscription URL – https://wp-content.vmware.com/v2/latest/lib.json. This Content Library will automatically be available in the vSphere with Tanzu Namespaces that we will create later, once vSphere with Tanzu is up and running. If you are setting up vSphere with Tanzu in an air-gapped/dark site, there is a documented procedure on how to set up the Content Library in an air-gapped environment.
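Incidentally, the subscribed library can also be created from the command line. A sketch with govc, assuming the library name Kubernetes and assuming the backing datastore is called vsanDatastore:

    # create a subscribed Content Library for the TKG images
    govc library.create -sub https://wp-content.vmware.com/v2/latest/lib.json \
      -ds vsanDatastore Kubernetes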

Finally, everything is in place to start enabling Workload Management / vSphere with Tanzu. Click the Finish button.

You should now observe that the cluster starts to configure:

A lot of configuration steps now start to take place, such as deploying the Supervisor cluster control plane VMs and plumbing them up onto both the management network and the workload network. The control plane API server should also get a load balancer IP address allocated from the configured range of IP addresses on the frontend network. If you want to trace the log output for a deployment, SSH onto the vCenter server, navigate to /var/log/vmware/wcp and run a tail -f wcpsvc.log. Note that this generates a lot of logging, but it might be useful in identifying the root cause of a failure. If the deployment completes successfully, you should see the Control Plane IP address configured with one of the addresses from the Load Balancer / frontend IP range. In fact, in my case, it is the first IP address in that range – 192.50.0.176.
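For reference, the exact commands on the vCenter Server appliance are:

    # SSH to the vCenter Server appliance as root, then:
    cd /var/log/vmware/wcp
    tail -f wcpsvc.log

    # the log is very chatty; a rough (not exhaustive) filter for failures:
    tail -f wcpsvc.log | grep -i error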

And now you should be able to connect to the Control Plane IP address and see the Kubernetes CLI Tools landing page.
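This can also be verified from a shell. The -k flag is needed because the endpoint serves a certificate the client does not yet trust; the plugin path in the second command is the one the landing page links pointed at in my setup:

    # fetch the Kubernetes CLI Tools landing page from the frontend VIP
    curl -k https://192.50.0.176

    # download the kubectl + vSphere plugin bundle for Linux
    curl -k -O https://192.50.0.176/wcp/plugin/linux-amd64/vsphere-plugin.zip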

Success! We have deployed vSphere with Tanzu with a HA-Proxy, which has been able to provide a load balancer IP address to our Supervisor cluster control plane. From a networking perspective, this is how my setup looks at the moment, now that the Supervisor control plane (virtual machines SV1-3) has been deployed.

At this point, we have looked at the prerequisites for getting started with vSphere with Tanzu. We have also looked at the deployment of HA-Proxy. In this post, we have covered how to deploy/configure Workload Management / vSphere with Tanzu. In my final post, I will cover the remaining tasks, including how to create a Namespace, how to log in to vSphere with Tanzu and how to deploy a TKG cluster. Stay tuned!

21 Replies to “Enabling vSphere with Tanzu using HA-Proxy”

  1. Hey Cormac,

    Hope you are doing great.

    I’ve been following this amazing blog.

    Somehow I am not able to open the Control Plane Node IP Address web console.

    It tries to bring up the web console, but it times out.

    The IP is reachable, and the Supervisors with IPs from the same range show their web console fine.

    Don’t know if you have an idea about it.

  2. Cormac,

    Regarding image “2.-select-a-network-stack.png”, if you already had NSX-T implemented in vCenter, would NSX-T be an available option here? If not, what makes it available as an option?

    Additional question: does the “Enterprise Plus with Kubernetes” license or the “add-on for Kubernetes” license enable both options (NSX-T and vCenter Server Network), assuming you have additional licenses for NSX-T? Just checking my understanding that NSX-T and vCenter Server Network are both options included in vSphere with Tanzu.

    1. Hi Richard,

      Yes – if you have NSX-T, you can absolutely use that as the underlying network platform for vSphere with Tanzu. You do not need to deploy a HA-Proxy in that case, and you can also leverage the PodVM feature and the Harbor Image Registry.

      I’m not 100% clear on the licensing, but I believe there is a new license with vSphere 7.0U1 to allow you to enable vSphere with Tanzu. This should be in the docs, or if not, please reach out to your closest VMware rep for further details.

  3. Hello Cormac,
    Thanks a lot for your blog post! Quick question before starting the deployment. If I choose vCenter Server Network, will I be able to go back to NSX-T? At which level do you enable the vCenter Server Network? vSphere cluster, vCenter, or even Linked vCenters?

    Thanks

    1. Hi Tristan – VDS is enabled at the vSphere cluster level, same as vSphere with Tanzu. I don’t believe you can move between the HA-Proxy and NSX-T network providers though (as per my other response). I think it is one or the other.

  4. Hello,
    Thanks for this tutorial. I tested this in a nested lab and I ended up with an error when enabling workload management:
    download remote files: HTTP communication could not be completed with status 404.

    My HA-Proxy and my Supervisor are on the same network as my ESXi hosts and vCenter.
    I connected to the HA-Proxy and everything about the connection seems fine.

    What I saw is that I get this error, but it ends up deploying the Supervisor without an IP; after a while it destroys the Supervisor, deploys another one, and that never ends.

    Did you see this error?

    What are those remote files it tries to download?

    Regards

    1. Hi Anthony – no, I did not see this error, but I know the networking is a bit tricky with the HA-Proxy.

      The “HTTP communication could not be completed with status 404” errors are transient and can be ignored.

      I think there must be another reason for the Supervisors to be deployed without IP addresses. I’m not sure what that would be, but I would speak to GSS and see if they can help.

    2. Hi! First, thanks for the excellent tutorials! I have the same problem: 404 and then it never ends (also nested). Did you find the reason? Thanks a lot

      1. 404 errors in the task bar can be ignored, Paul. What happens with the configuring step? Does it just seem to go on forever? Do the SV VMs get deployed, deleted and then redeployed?

        It is hard for me to answer questions about what happens in a nested setup, as I have not tested it, I’m afraid.

          1. Hi, thanks for your answer! Now it works. Nested is not the problem. The problem might come from different networks. My first test (which failed) had three different networks, and importantly, the MGMT network was on a different Layer 3 network than the ESXi hosts. I then did the same lab as you (ESXi and MGMT on the same network, Frontend and Workload on the same (different) VLAN) and now it works! I have to do further tests to understand exactly why it did not work before … The important question would be: can the MGMT network (for Tanzu) be on a different network (routable) than the ESXi hosts, or should/must it be the same? Thank you again for the very good tutorials!

            1. In my setup, my management network is also on a different network (VLAN) to my workload and frontend networks. However, I was able to successfully deploy when the management and workload/frontend were routable to each other, and again when they were not.

            However in all cases, the management network was the same one shared with vCenter server and the ESXi hosts management interface.

  5. Hi Cormac, I have tried to follow your procedure to deploy vSphere with Tanzu in my home lab. During the cluster selection step, I found my cluster in an incompatible state with the following incompatibility reason: Cluster domain-c2001 is a personality-manager managed cluster. It currently does not support vSphere namespaces.

    Any ideas what could cause this issue? I have tried to find anything related in the knowledge base, with no success.
    What does it mean – a personality-manager managed cluster?

    Thanks in advance

        1. Hi Alexander

          I had the same error as you, and I had a single image config on the cluster. I created another cluster without the single image setup, and that cluster was compatible, so I think the single image config is blocking the Tanzu setup for that cluster.

          BR
          Ingvar

  6. Hello Cormac, I tried this in my nested lab and it is failing with an error while enabling workload mgmt.
    Supervisor control plane failed: No connectivity to API Master: connectivity Get https://10.155.124.67:6443/healthz?timeout=5s: dial tcp 10.155.124.67:6443: connect: connection refused, config status ERROR.

    From VC, the connection to the API master is refused:
    curl -v telnet://10.155.124.67:6443
    * Rebuilt URL to: telnet://10.155.124.67:6443/
    * Trying 10.155.124.67…
    * TCP_NODELAY set
    * connect to 10.155.124.67 port 6443 failed: Connection refused
    * Failed to connect to 10.155.124.67 port 6443: Connection refused
    * Closing connection 0
    curl: (7) Failed to connect to 10.155.124.67 port 6443: Connection refused

    And one more thing I noticed is the Server Certificate Authority file: as per https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-8D7D292B-43E9-4CB8-9E20-E4039B80BF9B.html, it is advised to use /etc/haproxy/ca.crt. Should we use server.crt or ca.crt for this?

    1. I really don’t know what the issue could be, Sushil, but thanks for the note about the ca.crt. It seems both will work, but the advice should be to use the ca.crt, which is used to sign the server.crt.

      I’ve updated the post to reflect the official documentation – thanks for catching.

    1. Is there a route from the workload domain to the load balancer/frontend network? Firewall issues? Promiscuous mode not set? These are just some guesses. It is impossible for me to troubleshoot issues in the comments of this post.
