First steps with the NSX Advanced Load Balancer (NSX ALB)

Cormac

3 years ago

As part of the vSphere 7.0 Update 2 (U2) launch, VMware now provides another Load Balancer option for vSphere with Tanzu. This new Load Balancer is built on Avi Networks technology (previously known as Avi Vantage) and is a production-ready alternative for your vSphere with Tanzu deployments. Now called the NSX Advanced Load Balancer, or NSX ALB for short, it provides Virtual IP addresses (VIPs) for the Supervisor Control Plane API server, the TKG (guest) cluster API servers and any Kubernetes applications that require a Service of type LoadBalancer. In this post, I will go through a step-by-step deployment of the new NSX ALB.

One question you probably have is why you would, or should, use the NSX ALB instead of the HA-Proxy. For one, the NSX ALB is a VMware engineered product, so it will have gone through all of the rigorous testing and qualification that one would expect. You now have a complete VMware stack for your vSphere with Tanzu deployment, with no third-party components in the mix. Secondly, it offers a far better user experience than the HA-Proxy, and is much more intuitive to configure and monitor, as you will see in this post. Lastly, the NSX ALB UI gives full visibility into which components of vSphere with Tanzu are using the Load Balancing service, something we did not have with the HA-Proxy appliance.

The NSX ALB is available as an OVA. In this deployment, I am using version 20.1.4-2p1. The NSX ALB can be downloaded from this portal. The only information required at deployment time is (a) a static IP address, (b) a subnet mask, (c) a default gateway and (d) an SSH login authentication key from the user/system that wishes to securely log in to the appliance after deployment. One thing to note is that the NSX ALB requires considerably more resources than the HA-Proxy appliance: the HA-Proxy requires 2 CPUs, 4GB memory and a 20GB disk, whereas the NSX ALB requires 8 CPUs, 24GB memory and a 128GB disk. Note also that the NSX ALB provisions additional appliances, known as Service Engines (SEs), to run the Virtual Services; we will come back to these later in the post. The Service Engines are more lightweight, with each SE requiring 1 CPU, 2GB memory and a 15GB disk. So plan your resource management accordingly.
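To make the resource difference concrete, here is a small Python sketch that totals the footprint of the controller plus a given number of Service Engines, using the per-appliance sizes quoted above. It assumes a single controller, as in this post, and ignores any vSphere overhead; the `Footprint` class and the function names are my own illustration, not anything from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """CPU, memory and disk consumed by one or more appliances."""
    cpus: int
    memory_gb: int
    disk_gb: int

    def __add__(self, other: "Footprint") -> "Footprint":
        return Footprint(self.cpus + other.cpus,
                         self.memory_gb + other.memory_gb,
                         self.disk_gb + other.disk_gb)

    def __mul__(self, n: int) -> "Footprint":
        return Footprint(self.cpus * n, self.memory_gb * n, self.disk_gb * n)


# Per-appliance sizes quoted in this post (NSX ALB 20.1.4-2p1).
CONTROLLER = Footprint(cpus=8, memory_gb=24, disk_gb=128)
SERVICE_ENGINE = Footprint(cpus=1, memory_gb=2, disk_gb=15)
HA_PROXY = Footprint(cpus=2, memory_gb=4, disk_gb=20)


def nsx_alb_footprint(service_engines: int) -> Footprint:
    """Total for one controller plus the given number of Service Engines."""
    if service_engines < 0:
        raise ValueError("service_engines must be non-negative")
    return CONTROLLER + SERVICE_ENGINE * service_engines


if __name__ == "__main__":
    # Two SEs is the default count that appears later in this post.
    print(nsx_alb_footprint(2))
```

With two Service Engines this works out at 10 CPUs, 28GB memory and 158GB disk, compared with 2 CPUs, 4GB and 20GB for the HA-Proxy.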

Once the NSX ALB is deployed, give it a few minutes to initialize and you should be able to connect a browser to the configured IP address of the appliance to access the management portal.

Step 1 – Network Configuration Planning

Before deploying the appliance, you should ideally make a note of (a) the network topology you wish to deploy and (b) the ranges of IP addresses that you need to set aside for your vSphere with Tanzu deployment. Just like with the HA-Proxy, you can use different segments of the same network for both the FrontEnd/Load Balancer/Virtual IP (VIP) network and the Workload network. Alternatively, you can place the VIPs and workload nodes on two different networks. To recap, the workload network is used by the vSphere with Tanzu Supervisor cluster nodes and the TKG (guest) cluster nodes, whilst the VIP network provides the addresses for the API servers of the different clusters (and for Kubernetes Services of type LoadBalancer).

In my setup, I have a shared network for the VIPs and the workload network. You can visualize the two setups as looking similar to the following (note that these are simplified views, since I am not making a distinction between the NSX Advanced Load Balancer appliance and the Service Engines). In the diagram below, the NSX ALB is in a deployment with separate Frontend/VIP and Workload networks. Note that there must be a route between the VIP and Workload networks when this type of deployment is implemented.

Here is the other deployment method, which uses a range of IP addresses on the same network to meet the requirements of the VIP addresses and the Workload IP addresses. This is the configuration that I will use in my setup:

These are the requirements that you need to consider before you begin:

1 static IP address for the NSX ALB on the Management Network, mentioned earlier
A range of X static IP addresses for the Service Engines on the Management Network
A range of 5 static IP addresses for the Supervisor Control Plane nodes on the Management Network
A range of Y Load Balancer VIPs on the FrontEnd/VIP network
A range of Z IP addresses for the Supervisor Control Plane nodes and the TKG (guest) cluster nodes on the Workload network

The ranges X, Y and Z above are determined by the administrator and will most likely depend on the size and number of guest clusters and load balancer service applications that are deployed. With this information captured, we can now start the configuration.
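Once you have picked values for X, Y and Z, it is worth checking that none of the planned ranges collide before typing them into the UI. The sketch below does that with Python's standard `ipaddress` module. All of the addresses in `PLAN` are hypothetical placeholders, not the ones from my lab, and the range names are just labels.

```python
import ipaddress
from itertools import combinations


def ip_range(start: str, end: str) -> list:
    """Expand an inclusive start/end pair into a list of IPv4Address objects."""
    first, last = ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)
    if last < first:
        raise ValueError(f"{start}-{end}: end precedes start")
    return [ipaddress.IPv4Address(i) for i in range(int(first), int(last) + 1)]


def find_overlaps(plan: dict) -> list:
    """Return (name_a, name_b) for every pair of planned ranges that share an address."""
    sets = {name: set(ip_range(*bounds)) for name, bounds in plan.items()}
    return [(a, b) for a, b in combinations(sets, 2) if sets[a] & sets[b]]


# Hypothetical plan; every address here is a placeholder.
PLAN = {
    "alb-controller":  ("192.168.30.10", "192.168.30.10"),  # 1 address
    "service-engines": ("192.168.30.20", "192.168.30.27"),  # X = 8
    "supervisor-mgmt": ("192.168.30.30", "192.168.30.34"),  # 5 addresses
    "vips":            ("192.168.40.64", "192.168.40.71"),  # Y = 8
    "workload":        ("192.168.40.72", "192.168.40.95"),  # Z = 24
}

if __name__ == "__main__":
    print(find_overlaps(PLAN))  # an empty list means no collisions
```

If, say, the workload range were started two addresses too early, `find_overlaps` would report the `("vips", "workload")` pair.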

Step 2 – NSX Advanced Load Balancer Configuration

The very first step is to create an administrator account. Provide a password for admin and an optional email address, then click on Create Account. Note that the AVI version is also displayed here.

Next, provide some DNS entries and a backup passphrase.

Scrolling down on the same window, you will need to provide an NTP server. You can choose to keep the default NTP servers (from us.pool.ntp.org) or provide your own. The UI allows you to delete the default ones.

Next, decide if you want to set up SMTP notifications. Set it to None if not.

The next step is to select the Orchestrator Integration. Select VMware:

Provide your vCenter credentials next. Leave Permissions set to Write (default) and SDN Integration to None (default). This creates a new cloud configuration in the NSX ALB called Default-Cloud. This is the only cloud configuration supported by vSphere with Tanzu and the NSX ALB. We will see this configuration in more detail shortly.

When you successfully connect to your vCenter server, select your Data Center from the drop-down list. I am using Static IP Address Management, and I am not setting up any static routes. Note that you may need to implement static routes if you decide to go with separate VIP and workload networks, as mentioned previously. This is because the Service Engines are only plumbed up on the VIP network, so they will need a static route telling them how to reach the Workload network. This is why there needs to be a route between the VIP and Workload networks. However, since my setup uses segments of one flat network for VIPs and Workloads, I do not need to add anything here.
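The static route decision above boils down to a simple rule, sketched here in Python under one simplifying assumption: the Service Engines only have an interface on the VIP network, so a workload network that is not the same subnet has to be reached via a next hop on the VIP network. The dictionary returned by `static_route()` is a hypothetical representation, not the field names used by the NSX ALB UI or API.

```python
import ipaddress


def needs_static_route(vip_cidr: str, workload_cidr: str) -> bool:
    """True when the VIP and workload addresses sit on different IP networks.

    Assumption: Service Engines only have an interface on the VIP network,
    so any other workload subnet must be reached through a next hop.
    """
    vip = ipaddress.ip_network(vip_cidr, strict=False)
    workload = ipaddress.ip_network(workload_cidr, strict=False)
    return vip != workload


def static_route(vip_cidr: str, workload_cidr: str, next_hop: str) -> dict:
    """Describe the route an SE would need (hypothetical field names)."""
    vip = ipaddress.ip_network(vip_cidr, strict=False)
    hop = ipaddress.ip_address(next_hop)
    if hop not in vip:
        # The next hop must be directly reachable from the SE's VIP interface.
        raise ValueError(f"next hop {hop} is not on the VIP network {vip}")
    workload = ipaddress.ip_network(workload_cidr, strict=False)
    return {"destination": str(workload), "next_hop": str(hop)}


if __name__ == "__main__":
    # Shared flat network, as in this post: no route required.
    print(needs_static_route("192.168.40.0/24", "192.168.40.0/24"))
    # Separate networks: a route via a gateway on the VIP network is needed.
    print(static_route("192.168.40.0/24", "192.168.50.0/24", "192.168.40.1"))
```

All addresses are placeholders. `strict=False` lets you pass a host address with a prefix length (for example the start of a range) and still get the enclosing network.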

This next window is where we define the IP address pool for the Service Engines. Select the Management network from the drop-down, noting that the network selected must be able to reach the NSX ALB management address. I put both the NSX ALB and SEs on the same distributed portgroup, VL530-DPortGroup. After that, it is simply a matter of populating the subnet in CIDR format, a hyphen-separated address range for the Service Engines and the gateway information. I have allocated 8 static IP addresses to the Service Engines in my setup. You may decide you need a different address range in your production environment.
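The Service Engine pool is entered as a hyphen-separated range. Here is a minimal Python sketch, assuming IPv4 and placeholder addresses, that expands such a range, confirms both ends are usable host addresses in the subnet, and makes sure the gateway has not accidentally been included. The function name `parse_pool` is illustrative only.

```python
import ipaddress


def parse_pool(cidr: str, pool: str, gateway: str) -> list:
    """Expand a range such as '192.168.30.20-192.168.30.27' and sanity-check it."""
    net = ipaddress.ip_network(cidr, strict=False)
    start_s, end_s = (part.strip() for part in pool.split("-"))
    start, end = ipaddress.ip_address(start_s), ipaddress.ip_address(end_s)
    if end < start:
        raise ValueError("pool end precedes pool start")
    # Both ends must be inside the subnet and not the network/broadcast address.
    for addr in (start, end):
        if addr not in net or addr in (net.network_address, net.broadcast_address):
            raise ValueError(f"{addr} is not a usable host address in {net}")
    addresses = [ipaddress.ip_address(i) for i in range(int(start), int(end) + 1)]
    if ipaddress.ip_address(gateway) in addresses:
        raise ValueError(f"gateway {gateway} falls inside the pool")
    return addresses


if __name__ == "__main__":
    se_pool = parse_pool("192.168.30.0/24", "192.168.30.20-192.168.30.27", "192.168.30.1")
    print(len(se_pool), se_pool[0], se_pool[-1])
```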

The final step is the Tenant Settings. Set this to No. vSphere with Tanzu and NSX ALB do not support multiple tenants in this version.

And at this point, the initial setup is complete. You will now be placed into the NSX Advanced Load Balancer portal, something like what is shown below:

Step 3 – Additional NSX Advanced Load Balancer Setup

There are a number of additional tasks that now need to be implemented in the Load Balancer before we can enable Workload Management for vSphere with Tanzu. These tasks can be summarized as follows:

Enable Basic Authentication
Create a Self-Signed SSL/TLS Certificate, used when Workload Management is enabled
Install the License
Configure the Service Engine Group
Verify the Service Engine Network
Configure the VIP Network
Create a new IPAM Profile for assigning VIPs to Virtual Services
Add new IPAM Profile to Default-Cloud
Export the Self-Signed SSL/TLS Certificate

Let’s take each of these tasks and show how to implement them.

Step 3.1 – Enable Basic Authentication

In the top left-hand corner of the NSX ALB portal, there are 3 parallel lines. Click on this to see the drop-down menu, which includes Applications, Operations, Templates, Infrastructure and Administration. Select Administration. Across the top of the browser, you should see items such as Accounts, Settings, Controller, etc. Select Settings. This should give a sub-menu of items such as Authentication/Authorization, Access Settings, DNS/NTP and so on. Select Access Settings. On the right-hand side, there should be a pencil icon; click on it to edit the System Access Settings. Finally, tick the “Allow Basic Authentication” checkbox, as shown below. Change from:

To:
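Basic authentication means a client can present a username and password in an HTTP `Authorization` header (RFC 7617) rather than only logging in through a session. The post does not say which component relies on this; my assumption is that it is the vSphere with Tanzu integration talking to the controller. The sketch below only shows how such a header is built, using placeholder credentials.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header as described in RFC 7617."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(basic_auth_header("admin", "secret"))
```

Because Base64 is an encoding rather than encryption, a header like this should only ever travel over HTTPS, which is one more reason to get the certificate in the next step right.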

Step 3.2 – Create a Self-Signed SSL/TLS Certificate

Staying in the System Access Settings, we will now create a new self-signed certificate. You could of course import your own signed certs, but in my case, I am going to go with a self-signed one for convenience. This certificate will need to be provided when setting up Workload Management in vSphere later on.

Under the SSL/TLS Certificate section, delete any certificates that already exist from the installation. In my setup, these were called System-Default-Portal-Cert and System-Default-Portal-Cert-EC256.

After deleting those original SSL/TLS Certificates, the System Access Settings should now look as follows:

Next, click on the drop-down for the SSL/TLS Certificate and select the option to Create Certificate.

Provide a Name, ensure that the Type is set to Self Signed, give it a Common Name, which should be the FQDN of the NSX ALB, and finally provide it with a Subject Alternative Name (SAN) set to the IP address of the NSX ALB. The Algorithm, Key Size and Days Until Expiry can be left at the defaults. More details about certificate management can be found in the official docs. Once the self-signed cert has been created, you can Save it, and then Save the updated System Access Settings. After changing the self-signed certificate, you will need to refresh your browser.

Step 3.3 – Install the License

The Licensing section is also found in the Administration > Settings section. You can install it via a key, or upload it as a file.

Once applied, the new license should be displayed in the list of licenses on this screen.

Step 3.4 – Configure the Service Engine Group

OK – most of the housekeeping has now been done. It is time to move on to the rest of the configuration settings. From the main ‘three-bar’ icon in the top left-hand corner, select Infrastructure from the list. This will change the list of menu items across the top of the window. In this list, you should see Service Engine Group, which should display the Service Engine Group called Default-Group. Click on the pencil icon on the right-hand side to edit it. There are two configuration screens, Basic Settings and Advanced. In the Advanced screen, you can select vSphere objects for Service Engine placement, such as Cluster, Host and Data Store. The only changes I made here were to select my cluster and to set the Data Store to be shared, as follows:

Everything else I left at the defaults. However, it is possible to make much more advanced configuration changes here, such as controlling the number of Service Engines that can be deployed for availability purposes (HA Mode), the number of Virtual Services that can run on an SE, and to what scale they can grow. The official AVI documentation has a lot of information about HA and Placement; these considerations are outside the scope of this blog post.
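As a rough planning aid, the number of Service Engines can be estimated from the number of Virtual Services and how many you allow per SE. This Python sketch assumes a simple packing model with a floor of two SEs (the default count mentioned later in this post). `max_vs_per_se` is a value you choose, not a documented product default, and real HA modes may place a Virtual Service on more than one SE, so treat the result as a lower bound.

```python
import math

# Per-SE sizing quoted earlier in this post.
SE_CPUS, SE_MEMORY_GB, SE_DISK_GB = 1, 2, 15


def service_engines_needed(virtual_services: int, max_vs_per_se: int,
                           min_ses: int = 2) -> int:
    """Simple packing estimate: enough SEs to hold every VS, never below min_ses."""
    if virtual_services < 0 or max_vs_per_se < 1:
        raise ValueError("need virtual_services >= 0 and max_vs_per_se >= 1")
    return max(min_ses, math.ceil(virtual_services / max_vs_per_se))


def se_resources(count: int) -> tuple:
    """(CPUs, memory GB, disk GB) consumed by `count` Service Engines."""
    return count * SE_CPUS, count * SE_MEMORY_GB, count * SE_DISK_GB


if __name__ == "__main__":
    # Hypothetical: 2 cluster API servers + 20 LoadBalancer services, 10 VS per SE.
    n = service_engines_needed(22, max_vs_per_se=10)
    print(n, se_resources(n))
```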

Step 3.5 – Verify the Service Engine Network

From the Infrastructure view, select Networks. Here you should see all the networks from your vSphere environment. This is what it looks like in my environment, where VL530-DPortGroup is my Management network and VL-547-DPortGroup will be used for my combined VIP/workload network.

As part of my initial deployment, I chose a management network and a range of IP addresses for the Service Engines. These were configured on the Management Network, and we can see above that there is a configured subnet with 8 IP addresses available for the Service Engines. To examine it more closely, you can click on the + sign associated with the Management Network. This should display something like this:

The network has been configured with a Static IP Pool. To look closer, click on the pencil icon to the right of the network. This will show the network settings. To see the Address Pool settings, click on the pencil icon to the right of the IP Address Pool. It should then show something like the following:

Everything looks configured. Great! Click Cancel, and Cancel again. The IP address range for the Service Engines on the Management Network has been correctly defined.

Step 3.6 – Configure the VIP Network

Staying in the Networks view, we are now going to go through a similar process, but this time for the Load Balancer IP addresses / Virtual IP addresses (VIPs). These are the IP addresses used by the various Kubernetes control planes and Kubernetes applications that require a Load Balancer Service. From the list of networks, select the one where the VIPs will reside. Click the pencil icon once again to edit the settings.

Now click on the + ADD Subnet button and add the network subnet in CIDR format. Next, click on the pencil icon in that view to + Add Static IP Address Pool.

Now add the range of VIP addresses that you wish to assign to control planes and load balancer services. In my environment, I am keeping this quite small as I have only allocated a range of 8 VIPs. Again, this is a tiny range of VIPs and you may want to consider a far larger range of VIPs for your production environment.

Note the CIDR format used for the IP subnet. The range of addresses must be a subset of the network CIDR. I have only been granted a segment of this network for my own personal use, so I will need to take great care when enabling Workload Management in vSphere later on, and ensure that I use the same subnet mask for the workload network address range. This is so that objects on the VIP network and workload network can communicate successfully. Click Save, and Save again. Now the IP address range for the VIPs on the FrontEnd/VIP Network has been defined. The network view should now look something like the following, with both the Service Engine and VIP networks showing configured subnets.

Step 3.7 – Create a new IPAM Profile for assigning VIPs to Virtual Services

IPAM will be used to assign VIPs to Virtual Services, such as the Kubernetes control planes and applications mentioned previously. Click on the menu in the top left-hand corner, and from the drop-down list, select Templates. Across the top, ensure that Profiles is selected, and then select the sub-item IPAM/DNS Profiles. Click on the blue Create button on the right and select IPAM Profile. Set the name of the profile to Default-IPAM, and leave the type set to Avi Vantage IPAM. Next, click on the + Add Usable Network. Here, set the Cloud for Usable Network to Default-Cloud and set the Usable Network to the port group used for the VIPs earlier, in my case VLAN574-DPortGroup. My setup looks something like this:

Click on Save to save the new IPAM Profile.

Step 3.8 – Add the new IPAM Profile to Default-Cloud

From the main menu, select Infrastructure and then the Clouds option. This should reveal the Default-Cloud, which is the only cloud option supported for vSphere with Tanzu and the NSX ALB.

Once again, click on the pencil icon to edit the cloud configuration. In the Infrastructure window, down at the bottom, find the IPAM Profile and, from the drop-down, select the Default-IPAM created earlier. This now means that any virtual services that are requested and spun up will get their IP addresses from the Default-IPAM VIP range. Save the cloud configuration.
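Conceptually, the IPAM profile hands each new Virtual Service the next free address from the static VIP range. The toy allocator below illustrates that behaviour in Python; it is not how the NSX ALB implements IPAM, and the class name, addresses and virtual service names are all hypothetical.

```python
import ipaddress


class VipPool:
    """Toy first-free allocator over a static VIP range (illustration only)."""

    def __init__(self, start: str, end: str):
        first = int(ipaddress.IPv4Address(start))
        last = int(ipaddress.IPv4Address(end))
        self._free = [ipaddress.IPv4Address(i) for i in range(first, last + 1)]
        self._allocated = {}

    def allocate(self, virtual_service: str) -> ipaddress.IPv4Address:
        """Give a virtual service the lowest free VIP; repeat calls return the same VIP."""
        if virtual_service in self._allocated:
            return self._allocated[virtual_service]
        if not self._free:
            raise RuntimeError("VIP range exhausted")
        vip = self._free.pop(0)
        self._allocated[virtual_service] = vip
        return vip

    def release(self, virtual_service: str) -> None:
        """Return a virtual service's VIP to the pool, keeping the free list ordered."""
        vip = self._allocated.pop(virtual_service)
        self._free.append(vip)
        self._free.sort()


if __name__ == "__main__":
    pool = VipPool("192.168.40.64", "192.168.40.71")  # 8 VIPs, like the post
    print(pool.allocate("supervisor-control-plane"))
    print(pool.allocate("tkg-cluster-control-plane"))
```

The exhaustion error is the practical point: with only 8 VIPs, as in my lab, the ninth LoadBalancer request has nowhere to go, which is why a larger range is worth considering in production.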

Step 3.9 – Export the Self-Signed Certificate

The certificate can be exported from Templates > Security > SSL/TLS Certificates. The icon in the right-most field exports the certificate and allows you to copy it to your clipboard. That completes the setup. We can now proceed with enabling vSphere with Tanzu Workload Management.
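Before pasting the exported certificate into the Workload Management wizard, note that copying through the clipboard can pick up stray whitespace or get truncated. This Python sketch, using only the standard library, checks the PEM BEGIN/END framing and computes a SHA-256 fingerprint, so you can compare what you pasted with what the controller actually serves. `ssl.get_server_certificate()` does not validate the chain unless you pass `ca_certs`, which suits a self-signed certificate; the helper names are my own.

```python
import hashlib
import ssl


def normalise_pem(pasted: str) -> str:
    """Drop leading/trailing whitespace picked up while copying from a browser."""
    return pasted.strip() + "\n"


def pem_fingerprint(pasted: str) -> str:
    """Check PEM framing and return a SHA-256 fingerprint of the DER bytes.

    ssl.PEM_cert_to_DER_cert raises ValueError if the BEGIN/END markers are
    missing; it does not parse the certificate contents themselves.
    """
    der = ssl.PEM_cert_to_DER_cert(normalise_pem(pasted))
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(host: str, port: int = 443) -> str:
    """Fingerprint of the certificate the controller presents (needs network access)."""
    # No ca_certs argument, so the self-signed certificate is fetched unverified.
    return pem_fingerprint(ssl.get_server_certificate((host, port)))
```

For example, compare `pem_fingerprint(text_from_clipboard)` with `served_fingerprint("alb.example.com")`, where the hostname is a placeholder for your controller's FQDN; matching values mean you are pasting the certificate the controller is actually presenting.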

Step 4 – Enable Workload Management

I’m not going to go through the full deployment of Workload Management step by step. However, a few steps differ from previous posts and are worth mentioning. The main difference is that the Load Balancer Network now uses Avi as the Load Balancer type. Note that there is no load balancer IP address range to enter; this is now provided by the NSX ALB. Lastly, the Certificate requested in the Load Balancer Network configuration is the self-signed one we created in Step 3.2 and exported in Step 3.9.

The Management Network configuration above asks for the starting IP address of the range of 5 IP addresses used by the Supervisor Control Plane nodes.
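Since the wizard only asks for the starting address, my understanding is that the remaining Supervisor management addresses follow on consecutively from it. The Python sketch below lists the five addresses that would be consumed and checks they all fit inside the management subnet as usable host addresses; the addresses are placeholders.

```python
import ipaddress


def supervisor_mgmt_addresses(start_ip: str, cidr: str, count: int = 5) -> list:
    """List the consecutive management addresses consumed from start_ip onwards."""
    net = ipaddress.ip_network(cidr, strict=False)
    first = ipaddress.ip_address(start_ip)
    addresses = [first + offset for offset in range(count)]
    for addr in addresses:
        # Every address must be a usable host in the management subnet.
        if addr not in net or addr in (net.network_address, net.broadcast_address):
            raise ValueError(f"{addr} is not a usable host address in {net}")
    return [str(addr) for addr in addresses]


if __name__ == "__main__":
    print(supervisor_mgmt_addresses("192.168.30.30", "192.168.30.0/24"))
```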

The Workload Network defines the range of IP addresses used by the Supervisor nodes and TKG cluster nodes. Note that the subnet matches exactly what we used for the VIP network in the NSX ALB, although the range is different. This is because, as mentioned a few times now, they are sharing the same network. Nodes on the VIP and workload ranges still need to be able to communicate with each other, but because they are on the same subnet, I did not have to configure any static routes. Static routes would be required if the VIP and workload networks were separate.

Let's assume that the Workload Management deployment was successful. This will in turn have triggered the creation of two (by default) Avi Service Engines, which should be visible in the vSphere inventory. The Supervisor Control Plane will have requested, and been allocated, an IP address from the VIP range, which it has:

We can visualize the following network configuration based on the Supervisor cluster being successfully deployed. The Supervisor nodes are on both the Management network and the Workload network, and also have a Load Balancer IP address for their API server on the FrontEnd/VIP network. The SV nodes get addresses on the Management Network, starting at the IP address defined in Workload Management earlier. The nodes also get IP addresses on the Workload Network, from the range that was also defined in Workload Management earlier. Finally, the control plane for the Supervisor cluster is allocated a VIP from the range that was configured in the NSX ALB previously.

One thing to note is that even though the Supervisor control plane was allocated a VIP, it may not be possible to log in to the cluster immediately. You will need to wait for the Virtual Service/Service Engine to finish initializing. The status/health can be checked from the NSX ALB portal.
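Rather than repeatedly retrying a login, a small poll loop can tell you when the VIP starts accepting connections. This Python sketch only checks TCP reachability, not the health reported in the NSX ALB portal, and the host and port you pass in are assumptions to adapt to your environment.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 600.0,
                  interval_s: float = 10.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True once the port accepts a connection, False on timeout.
    A successful connect only proves reachability, not application health.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:  # refused, unreachable or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("192.168.40.64", 443)` would wait for a Supervisor VIP; both the address and the port are placeholders here, so substitute the VIP the Supervisor was actually allocated and the port you intend to use.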

To complete my testing, I also deployed a TKG guest cluster in a vSphere with Tanzu namespace, and its control plane API server is also allocated a VIP address:

We can visualize the TKG deployment as follows, where the nodes are allocated addresses from the workload network range, and the API server is allocated an address from the VIP range.

But the really nice thing about the new NSX ALB over the previous HA-Proxy is that we now have full visibility into the health and status of the appliance, and who is consuming the VIP addresses. For example, here is a view of the Infrastructure > Service Engines that are providing the VIPs.

And here is a view of who is consuming the Virtual Services, via the Applications > Virtual Services view; in this case, one is the Supervisor Control Plane API server, and the other is the TKG Control Plane API server.

In conclusion, the new NSX Advanced Load Balancer is far superior to the HA-Proxy in many ways. The user experience of the deployment has improved significantly, and even though the configuration requires a few additional steps, it is not too complicated to set up. The visibility it provides into the health and usage of the virtual services is going to be extremely beneficial for day-2 operations, and should provide some great insights for administrators who are responsible for provisioning and managing Kubernetes distributions running on vSphere. As you can probably tell, there are many more configuration options available in the NSX ALB. For further details, check out the AVI Vantage installation guide (AVI Vantage was the original name).

One final piece of information: the NSX ALB only provides a Load Balancer service. It does not provide any network overlay functionality. This means that there is still no support for PodVMs in the Supervisor cluster at the time of writing. If you wish to use PodVMs, as well as Supervisor services such as the vSAN Data Persistence platform, you will require a full NSX-T deployment.