TKG v1.3 and the NSX Advanced Load Balancer
In my most recent post, we took a look at how Cluster API is utilized in TKG. Note that this post refers to the Tanzu Kubernetes Grid (TKG) multi-cloud version, sometimes referred to as TKGm. I will use that naming convention for the multi-cloud TKG throughout this post, to differentiate it from the other TKG products in the Tanzu portfolio. In this post, we will take a closer look at a new feature in TKG v1.3: it now supports the NSX Advanced Load Balancer, or NSX ALB (formerly known as Avi Vantage), to provide virtual IP addresses for applications that require a load balancer service. I have already documented the steps on how to integrate the NSX ALB with vSphere with Tanzu. While the setup of the NSX Advanced Load Balancer for TKGm is very similar, the workflow is slightly different. There is a new installer workflow in NSX ALB version 20.1.5 compared to version 20.1.4, which was deployed previously. There are also some additional steps in the TKG management cluster installer UI to accommodate the integration with the NSX ALB. We will cover those in this post.
Before we begin, it is important to highlight that TKGm continues to use Kube-VIP to provide a front-end virtual IP address for both the management cluster API server and the workload cluster API server. The NSX ALB, when integrated with TKGm, provides virtual IP addresses for applications that require a load balancer service. Thus, in the configuration file for both the management cluster and the workload cluster, a vSphere control plane endpoint IP address is specified for the respective cluster. These IP addresses are static and must be outside of your DHCP IP address range.
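To make that split of responsibilities concrete, here is a small excerpt of the relevant settings from the management cluster configuration file shown later in this post. The control plane endpoint is handled by Kube-VIP, while load balancer VIPs for applications are allocated by the NSX ALB from its data network:

# Static API server endpoint, advertised by Kube-VIP - must sit outside the DHCP range
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.35.13.240

# Network and range from which the NSX ALB allocates VIPs for LoadBalancer services
AVI_DATA_NETWORK: VL3513-PRIV-DPG
AVI_DATA_NETWORK_CIDR: 10.35.13.0/24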
NSX ALB Deployment
The deployment of the NSX ALB is, for the most part, identical to the deployment steps already outlined in the vSphere with Tanzu blog post. The big difference in version 20.1.5 is that after powering on the appliance and completing the initial configuration of the System Settings, Email/SMTP and Multi-Tenant sections, you can launch directly into the Setup Cloud After step for VMware vCenter/vSphere ESX, which appears at the bottom of the Welcome UI, as shown here:
The remaining steps to configure the NSX ALB are identical to those outlined previously, where details about vCenter and the vSphere environment are added, a certificate is created, a network and address range for both NSX ALB Service Engines and Load Balancing VIPs is chosen, an IPAM profile is created, and so on. Since these steps are already available, we will not repeat them here.
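One item from this setup that is worth keeping to hand is the NSX ALB controller certificate, since the TKG installer asks for it later (it ends up base64-encoded in the AVI_CA_DATA_B64 variable of the cluster configuration). The certificate can simply be copied from the NSX ALB UI, but as a rough sketch, assuming a Linux host with openssl available and a controller answering on 10.35.13.40:443, it could also be captured and encoded like this:

$ openssl s_client -connect 10.35.13.40:443 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > avi-controller.pem
$ base64 -w0 avi-controller.pem     # paste the resulting string into AVI_CA_DATA_B64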
TKG Management Cluster
The deployment of the TKG management cluster has also been covered in detail in the Cluster API blog post. There is one additional piece that is relevant to us, and that is the inclusion of the NSX ALB section. In the TKG management cluster creation UI, this new (optional) NSX ALB section looks as follows, currently only partially populated:
The resulting manifest file for creating a TKG management cluster would then look something like this. This file is created in the $HOME/.tanzu/tkg/clusterconfigs folder on the host where the TKGm UI installer is launched. Note that there is no LDAP or OIDC configured in this setup. Most of the NSX ALB configuration is at the beginning of the manifest.
AVI_CA_DATA_B64: LS0........
AVI_CLOUD_NAME: Default-Cloud
AVI_CONTROLLER: 10.35.13.40
AVI_DATA_NETWORK: VL3513-PRIV-DPG
AVI_DATA_NETWORK_CIDR: 10.35.13.0/24
AVI_ENABLE: "true"
AVI_LABELS: ""
AVI_PASSWORD: <encoded:Vk13YXJlMTIzIQ==>
AVI_SERVICE_ENGINE_GROUP: Default-Group
AVI_USERNAME: admin
CLUSTER_CIDR: 100.96.1.0/11
CLUSTER_NAME: tkgm-lb
CLUSTER_PLAN: dev
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
SERVICE_CIDR: 100.64.1.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "40"
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.35.13.240
VSPHERE_CONTROL_PLANE_MEM_MIB: "8192"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /CH-OCTO-DC
VSPHERE_DATASTORE: /CH-OCTO-DC/datastore/vsanDatastore
VSPHERE_FOLDER: /CH-OCTO-DC/vm/TKG
VSPHERE_NETWORK: VL3513-PRIV-DPG
VSPHERE_PASSWORD: <encoded:QWRtaW4hMjM=>
VSPHERE_RESOURCE_POOL: /CH-OCTO-DC/host/CH-Cluster/Resources
VSPHERE_SERVER: 10.35.13.116
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAA....== cormac
VSPHERE_TLS_THUMBPRINT: 56:C1:C5:47:FF:00:C0:68:97:FF:A5:14:6B:0E:37:65:3C:CF:48:90
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_WORKER_DISK_GIB: "40"
VSPHERE_WORKER_MEM_MIB: "8192"
VSPHERE_WORKER_NUM_CPUS: "2"
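Before kicking off the deployment, a quick sanity check is to confirm that the AVI_CA_DATA_B64 value really does decode back to the controller certificate. This is just a convenience step on a Linux host with openssl installed, and it assumes the configuration file is the one generated by the installer (z2l657j5bm.yaml in my case):

$ grep AVI_CA_DATA_B64 ./z2l657j5bm.yaml | awk '{print $2}' | base64 -d | openssl x509 -noout -subject -dates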
By using the tanzu management-cluster create command, we can roll out the management cluster using the above configuration file. I have included the -v 6 option so that the output is more verbose. Note that once the installer detects vSphere 7, it offers the option of setting up vSphere with Tanzu rather than TKGm. You need to respond appropriately to continue with the TKGm management cluster deployment. This goes through the typical Cluster API deployment model of creating a very small kind (Kubernetes in Docker) cluster, adding the Cluster API provider extensions to the kind cluster, and then using those providers to build a TKG management cluster backed by VMs on vSphere. Once the management cluster is up and running on VMs, the context is switched to this cluster and the kind cluster is removed. Here is the complete output.
$ tanzu management-cluster create --file ./z2l657j5bm.yaml -v 6 CEIP Opt-in status: false Validating the pre-requisites... vSphere 7.0 with Tanzu Detected. You have connected to a vSphere 7.0 with Tanzu environment that includes an integrated Tanzu Kubernetes Grid Service which turns a vSphere cluster into a platform for running Kubernetes workloads in dedicated resource pools. Configuring Tanzu Kubernetes Grid Service is done through the vSphere HTML5 Client. Tanzu Kubernetes Grid Service is the preferred way to consume Tanzu Kubernetes Grid in vSphere 7.0 environments. Alternatively you may deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0. Note: To skip the prompts and directly deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0, you can set the 'DEPLOY_TKG_ON_VSPHERE7' configuration variable to 'true' Do you want to configure vSphere with Tanzu? [y/N]: N Would you like to deploy a non-integrated Tanzu Kubernetes Grid management cluster on vSphere 7.0? [y/N]: y Deploying TKG management cluster on vSphere 7.0 ... Identity Provider not configured. Some authentication features won't work. no os options provided, selecting based on default os options Setting up management cluster... Validating configuration... Using infrastructure provider vsphere:v0.7.7 Generating cluster configuration... Setting up bootstrapper... Fetching configuration for kind node image... kindConfig: kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 kubeadmConfigPatches: - | apiVersion: kubeadm.k8s.io/v1beta2 kind: ClusterConfiguration imageRepository: projects.registry.vmware.com/tkg etcd: local: imageRepository: projects.registry.vmware.com/tkg imageTag: v3.4.13_vmware.7 dns: type: CoreDNS imageRepository: projects.registry.vmware.com/tkg imageTag: v1.7.0_vmware.8 nodes: - role: control-plane extraMounts: - hostPath: /var/run/docker.sock containerPath: /var/run/docker.sock containerdConfigPatches: - |- [plugins."io.containerd.grpc.v1.cri".registry.configs."cormac-tkgm.corinternal.com".tls] insecure_skip_verify = true Creating kind cluster: tkg-kind-c33lkn8r994jb6dpv2p0 Ensuring node image (cormac-tkgm.corinternal.com/library/kind/node:v1.20.5_vmware.1) ... Image: cormac-tkgm.corinternal.com/library/kind/node:v1.20.5_vmware.1 present locally Preparing nodes ... Writing configuration ... Starting control-plane ... Installing CNI ... Installing StorageClass ... Waiting 2m0s for control-plane = Ready ... Ready after 35s Bootstrapper created. Kubeconfig: /home/cormac/.kube-tkg/tmp/config_SxcKk9Ia Installing providers on bootstrapper... Fetching providers Installing cert-manager Version="v0.16.1" Waiting for cert-manager to be available... 
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system" Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system" Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system" Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system" installed Component=="cluster-api" Type=="CoreProvider" Version=="v0.3.14" installed Component=="kubeadm" Type=="BootstrapProvider" Version=="v0.3.14" installed Component=="kubeadm" Type=="ControlPlaneProvider" Version=="v0.3.14" installed Component=="vsphere" Type=="InfrastructureProvider" Version=="v0.7.7" Waiting for provider infrastructure-vsphere Waiting for provider cluster-api Waiting for provider control-plane-kubeadm Waiting for provider bootstrap-kubeadm Waiting for resource capi-kubeadm-control-plane-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capi-kubeadm-control-plane-controller-manager' in namespace 'capi-kubeadm-control-plane-system', retrying Waiting for resource capv-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying Waiting for resource capi-controller-manager of type *v1.Deployment to be up and running Waiting for resource capi-kubeadm-bootstrap-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying pods are not yet running for deployment 'capi-kubeadm-control-plane-controller-manager' in namespace 'capi-kubeadm-control-plane-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying Waiting for resource capi-kubeadm-control-plane-controller-manager of type *v1.Deployment to be up and running Passed waiting on provider control-plane-kubeadm after 10.14984304s Waiting for resource capv-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capv-controller-manager' in namespace 'capi-webhook-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying Waiting for resource capi-kubeadm-bootstrap-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-webhook-system', retrying Passed waiting on provider infrastructure-vsphere after 15.238228291s pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying Passed waiting on provider bootstrap-kubeadm after 15.241068789s Waiting for resource capi-controller-manager of type *v1.Deployment to be up and running Passed waiting on provider cluster-api after 20.256005572s Success waiting on all providers. Start creating management cluster... 
patch cluster object with operation status: { "metadata": { "annotations": { "TKGOperationInfo" : "{\"Operation\":\"Create\",\"OperationStartTimestamp\":\"2021-06-14 13:34:45.170398929 +0000 UTC\",\"OperationTimeout\":1800}", "TKGOperationLastObservedTimestamp" : "2021-06-14 13:34:45.170398929 +0000 UTC" } } } cluster control plane is still being initialized, retrying cluster control plane is still being initialized, retrying cluster control plane is still being initialized, retrying cluster control plane is still being initialized, retrying cluster control plane is still being initialized, retrying Getting secret for cluster Waiting for resource tkgm-lb-kubeconfig of type *v1.Secret to be up and running Saving management cluster kubeconfig into /home/cormac/.kube/config Installing providers on management cluster... Fetching providers Installing cert-manager Version="v0.16.1" Waiting for cert-manager to be available... Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system" Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system" Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system" Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system" installed Component=="cluster-api" Type=="CoreProvider" Version=="v0.3.14" installed Component=="kubeadm" Type=="BootstrapProvider" Version=="v0.3.14" installed Component=="kubeadm" Type=="ControlPlaneProvider" Version=="v0.3.14" installed Component=="vsphere" Type=="InfrastructureProvider" Version=="v0.7.7" Waiting for provider infrastructure-vsphere Waiting for provider control-plane-kubeadm Waiting for provider cluster-api Waiting for provider bootstrap-kubeadm Waiting for resource capi-kubeadm-control-plane-controller-manager of type *v1.Deployment to be up and running Waiting for resource capv-controller-manager of type *v1.Deployment to be up and running Waiting for resource capi-kubeadm-bootstrap-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capi-kubeadm-control-plane-controller-manager' in namespace 'capi-kubeadm-control-plane-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying Waiting for resource capi-controller-manager of type *v1.Deployment to be up and running pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying pods are not yet running for deployment 'capi-kubeadm-control-plane-controller-manager' in namespace 'capi-kubeadm-control-plane-system', retrying pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying pods are not yet running for deployment 'capi-kubeadm-control-plane-controller-manager' in namespace 'capi-kubeadm-control-plane-system', retrying pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 
'capv-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying Waiting for resource capi-kubeadm-control-plane-controller-manager of type *v1.Deployment to be up and running Passed waiting on provider control-plane-kubeadm after 15.104112709s pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying pods are not yet running for deployment 'capi-kubeadm-bootstrap-controller-manager' in namespace 'capi-kubeadm-bootstrap-system', retrying pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying pods are not yet running for deployment 'capv-controller-manager' in namespace 'capv-system', retrying Waiting for resource capi-kubeadm-bootstrap-controller-manager of type *v1.Deployment to be up and running Passed waiting on provider bootstrap-kubeadm after 25.113845112s pods are not yet running for deployment 'capi-controller-manager' in namespace 'capi-system', retrying Waiting for resource capv-controller-manager of type *v1.Deployment to be up and running Passed waiting on provider infrastructure-vsphere after 30.11258043s Waiting for resource capi-controller-manager of type *v1.Deployment to be up and running Passed waiting on provider cluster-api after 30.152691662s Success waiting on all providers. Waiting for the management cluster to get ready for move... Waiting for resource tkgm-lb of type *v1alpha3.Cluster to be up and running Waiting for resources type *v1alpha3.MachineDeploymentList to be up and running Waiting for resources type *v1alpha3.MachineList to be up and running Waiting for addons installation... Waiting for resources type *v1alpha3.ClusterResourceSetList to be up and running Waiting for resource antrea-controller of type *v1.Deployment to be up and running Moving all Cluster API objects from bootstrap cluster to management cluster... Performing move... Discovering Cluster API objects Moving Cluster API objects Clusters=1 Creating objects in the target cluster Deleting objects from the source cluster Waiting for additional components to be up and running... Waiting for resource tanzu-addons-controller-manager of type *v1.Deployment to be up and running Waiting for resource tkr-controller-manager of type *v1.Deployment to be up and running Waiting for resource kapp-controller of type *v1.Deployment to be up and running pods are not yet running for deployment 'tanzu-addons-controller-manager' in namespace 'tkg-system', retrying pods are not yet running for deployment 'tanzu-addons-controller-manager' in namespace 'tkg-system', retrying Context set for management cluster tkgm-lb as 'tkgm-lb-admin@tkgm-lb'. Deleting kind cluster: tkg-kind-c33lkn8r994jb6dpv2p0 Management cluster created! You can now create your first workload cluster by running the following: tanzu cluster create [name] -f [file] Some addons might be getting installed! 
Check their status by running the following:

  kubectl get apps -A

$ kubectl get apps -A
NAMESPACE    NAME                   DESCRIPTION           SINCE-DEPLOY   AGE
tkg-system   antrea                 Reconcile succeeded   2m26s          2m28s
tkg-system   metrics-server         Reconcile succeeded   99s            2m29s
tkg-system   tanzu-addons-manager   Reconcile succeeded   2m28s          5m23s
tkg-system   vsphere-cpi            Reconcile succeeded   102s           2m28s
tkg-system   vsphere-csi            Reconcile succeeded   115s           2m28s
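Before moving on, one small aside on the deployment output above. The interactive vSphere 7.0 prompts can be skipped in scripted or repeat deployments; the installer output itself points at the relevant variable, which can simply be added to the cluster configuration file to go straight to a non-integrated TKGm deployment:

DEPLOY_TKG_ON_VSPHERE7: "true"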
The TKG management cluster is now up and running, and all the add-ons are present, such as Antrea for the networking and vSphere CSI to allow K8s to consume vSphere storage. We can use the tanzu login command to select this management cluster and log into it.
$ tanzu login
? Select a server tkgm-lb ()
✔ successfully logged in to management cluster using the kubeconfig tkgm-lb
Note that the NSX ALB has not been called on to provide a VIP yet. The VIP for the management cluster is provided by kube-vip. We can now proceed with the deployment of a workload cluster.
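Before doing so, and purely as an optional check, the Kube-VIP side of this can be verified directly against the management cluster. The sketch below assumes (as is the case in my TKG v1.3 environment) that Kube-VIP runs as a static pod on the control plane node in the kube-system namespace; the API server address reported by kubectl should be the 10.35.13.240 endpoint from the configuration file, not an address from the NSX ALB VIP range:

$ kubectl config use-context tkgm-lb-admin@tkgm-lb
$ kubectl get pods -n kube-system -o wide | grep kube-vip   # kube-vip static pod(s) on the control plane node(s)
$ kubectl cluster-info                                      # the control plane URL should reference 10.35.13.240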
TKG Workload Cluster
To deploy a workload cluster, we once again create a configuration file. Here is the one which I used in my environment; a sample is provided in $HOME/.tanzu/tkg/clusterconfigs. In this example, the workload cluster is being deployed in an air-gapped environment which has no access to the internet. Thus, all of my TKG images are in a local Harbor registry, referenced by TKG_CUSTOM_IMAGE_REPOSITORY in the configuration file.
$ cat tkgm-workload-lb.yaml
#! -- See https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-tanzu-k8s-clusters-vsphere.html
#
#! ---------------------------------------------------------------------
#! Basic cluster creation configuration
#! ---------------------------------------------------------------------
#
CLUSTER_NAME: tkgm-workload-lb
CLUSTER_PLAN: prod
CNI: antrea

#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------

CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 2
VSPHERE_CONTROL_PLANE_NUM_CPUS: 2
VSPHERE_CONTROL_PLANE_DISK_GIB: 40
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
VSPHERE_WORKER_NUM_CPUS: 2
VSPHERE_WORKER_DISK_GIB: 40
VSPHERE_WORKER_MEM_MIB: 4096

#! ---------------------------------------------------------------------
#! vSphere configuration
#! ---------------------------------------------------------------------

VSPHERE_NETWORK: VL3513-PRIV-DPG
VSPHERE_DATACENTER: CH-OCTO-DC
VSPHERE_RESOURCE_POOL: /CH-OCTO-DC/host/CH-Cluster/Resources
VSPHERE_USERNAME: "administrator@vsphere.local"
VSPHERE_PASSWORD: <encoded:QWRtaW4hMjM=>
VSPHERE_SERVER: 10.35.13.116
VSPHERE_DATASTORE: /CH-OCTO-DC/datastore/vsanDatastore
VSPHERE_FOLDER: /CH-OCTO-DC/vm/TKG
VSPHERE_INSECURE: true
VSPHERE_SSH_AUTHORIZED_KEY: adfgsrhsrh
VSPHERE_TLS_THUMBPRINT: 56:C1:C5:....
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.35.13.151

#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------

TKG_CUSTOM_IMAGE_REPOSITORY: "http://cormac-tkgm.corinternal.com/library"
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: true
ENABLE_DEFAULT_STORAGE_CLASS: true
CLUSTER_CIDR: 100.96.3.0/11
SERVICE_CIDR: 100.64.3.0/13
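Since this is an air-gapped deployment, it is worth confirming that the local Harbor registry is reachable from the host running the tanzu CLI before applying the file. A rough check, on the assumption that Harbor serves the standard Docker registry v2 API under /v2/ (adjust the scheme to match how Harbor is exposed in your environment):

$ curl -ik http://cormac-tkgm.corinternal.com/v2/   # expect an HTTP 200, or 401 if the registry requires authentication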
We can now apply this configuration file to build the workload cluster. When applied using the tanzu command, this requests the creation of a workload cluster with 1 control plane node and 2 worker nodes. Once the cluster is deployed, we retrieve its context, add it to our KUBECONFIG, and then switch to it. I have not added any verbosity to this output. Note that there is a warning about no Pinniped configuration, meaning that there are no OIDC or LDAP Identity Providers configured. You will thus need to use admin privileges to access the workload cluster. OIDC and LDAP identity management with Pinniped and Dex is another key feature in TKGm v1.3.
$ tanzu cluster create --file ./tkgm-workload-lb.yaml
Validating configuration...
Warning: Pinniped configuration not found. Skipping pinniped configuration in workload cluster. Please refer to the documentation to check if you can configure pinniped on workload cluster manually
Creating workload cluster 'tkgm-workload-lb'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Waiting for addons installation...
Workload cluster 'tkgm-workload-lb' created

$ kubectl config get-contexts
CURRENT   NAME                    CLUSTER   AUTHINFO        NAMESPACE
*         tkgm-lb-admin@tkgm-lb   tkgm-lb   tkgm-lb-admin

$ tanzu cluster list
NAME              NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN
tkgm-workload-lb  default    running  1/1           2/2      v1.20.5+vmware.1  <none>  prod

$ tanzu cluster kubeconfig get tkgm-workload-lb
Error: failed to get pinniped-info from management cluster: failed to get pinniped-info from the cluster
...

$ tanzu cluster kubeconfig get tkgm-workload-lb --admin
Credentials of cluster 'tkgm-workload-lb' have been saved
You can now access the cluster by running 'kubectl config use-context tkgm-workload-lb-admin@tkgm-workload-lb'

$ kubectl config get-contexts
CURRENT   NAME                                      CLUSTER            AUTHINFO                 NAMESPACE
*         tkgm-lb-admin@tkgm-lb                     tkgm-lb            tkgm-lb-admin
          tkgm-workload-lb-admin@tkgm-workload-lb   tkgm-workload-lb   tkgm-workload-lb-admin

$ kubectl config use-context tkgm-workload-lb-admin@tkgm-workload-lb
Switched to context "tkgm-workload-lb-admin@tkgm-workload-lb".

$ kubectl get nodes
NAME                                     STATUS   ROLES                  AGE     VERSION
tkgm-workload-lb-control-plane-smxtd     Ready    control-plane,master   4m58s   v1.20.5+vmware.1
tkgm-workload-lb-md-0-7986c58d4b-g92tl   Ready    <none>                 3m29s   v1.20.5+vmware.1
tkgm-workload-lb-md-0-7986c58d4b-zb58j   Ready    <none>                 3m31s   v1.20.5+vmware.1
The workload cluster is now up and running, but once again it has not used the NSX ALB for any VIPs. The VIP (endpoint) was defined in the configuration file, and Kube-VIP was once again used to configure it as the front-end IP address for the workload cluster’s API server. Let’s now proceed with the deployment of an application which requires a service of type LoadBalancer, and then we will see the NSX ALB providing this VIP.
Load Balancer application
To test out the NSX ALB, we will use an Nginx web server app. Here is the manifest. It is a Deployment made up of 3 replicas, along with an associated LoadBalancer Service.
$ cat nginx-from-harbor.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3 # tells deployment to run 3 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: cormac-tkgm.corinternal.com/library/nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: nginx
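Note that the image reference in the Deployment points at the local Harbor registry rather than Docker Hub, in keeping with the air-gapped setup. For completeness, here is a hypothetical sketch of how such an image could be seeded into the library project on Harbor from a machine that does have internet access (the tag and login details are illustrative only):

$ docker pull nginx:latest
$ docker tag nginx:latest cormac-tkgm.corinternal.com/library/nginx:latest
$ docker login cormac-tkgm.corinternal.com
$ docker push cormac-tkgm.corinternal.com/library/nginx:latest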
When we apply this manifest, we should observe a load balancer IP address / virtual IP (VIP) being allocated to the service from our NSX ALB.
$ kubectl apply -f nginx-from-harbor.yaml
deployment.apps/nginx-deployment created
service/nginx-svc created

$ kubectl get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
kubernetes   ClusterIP      100.64.0.1     <none>         443/TCP        72m
nginx-svc    LoadBalancer   100.70.57.45   10.35.13.192   80:30922/TCP   4m26s

$ ping 10.35.13.192
PING 10.35.13.192 (10.35.13.192) 56(84) bytes of data.
64 bytes from 10.35.13.192: icmp_seq=1 ttl=64 time=0.303 ms
64 bytes from 10.35.13.192: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 10.35.13.192: icmp_seq=3 ttl=64 time=0.137 ms
^C
--- 10.35.13.192 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.137/0.196/0.303/0.077 ms
We have successfully received a VIP from the NSX ALB, allocated from the range which I configured earlier. You may need to wait a short time for the service to become active before it responds to a ping request. The final test is to see if we can reach the Nginx web server default landing page on HTTP port 80.
$ curl 10.35.13.192
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Success! Our NSX ALB is now providing VIPs for Load Balancer services in our TKG workload cluster.
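As a side note, had the EXTERNAL-IP of the service remained in a <pending> state instead, the first place to look (before digging into the NSX ALB itself) would be the Kubernetes side of the integration, for example:

$ kubectl describe svc nginx-svc      # events here often hint at why no load balancer IP was allocated
$ kubectl get pods -n avi-system      # the ako-0 pod, discussed below, is what talks to the NSX ALB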
Troubleshooting NSX ALB on TKGm
This integration between the NSX ALB and TKG is provided by a new AKO (Avi Kubernetes Operator) extension. The AKO pod, which runs in the avi-system namespace on the workload cluster, can be queried if there are issues with the VIP being provided. Here is a log snippet from the AKO pod in my setup. I have found these logs very useful for identifying misconfiguration issues on the NSX ALB.
$ kubectl logs ako-0 -n avi-system 2021-06-14T13:58:21.274Z INFO api/api.go:52 Setting route for GET /api/status 2021-06-14T13:58:21.274Z INFO ako-main/main.go:61 AKO is running with version: v1.3.1 2021-06-14T13:58:21.274Z INFO api/api.go:110 Starting API server at :8080 2021-06-14T13:58:21.274Z INFO ako-main/main.go:67 We are running inside kubernetes cluster. Won't use kubeconfig files. 2021-06-14T13:58:21.282Z INFO utils/ingress.go:36 networking.k8s.io/v1/IngressClass not found/enabled on cluster: ingressclasses.networking.k8s.io is forbidden: User "system:serviceaccount:avi-system:ako-sa" cannot list resource "ingressclasses" in API group "networking.k8s.io" at the cluster scope 2021-06-14T13:58:21.282Z INFO utils/utils.go:166 Initializing configmap informer in avi-system 2021-06-14T13:58:21.282Z INFO lib/cni.go:96 Skipped initializing dynamic informers 2021-06-14T13:58:21.472Z INFO utils/avi_rest_utils.go:99 Setting the client version to the current controller version 20.1.5 2021-06-14T13:58:21.492Z INFO cache/avi_ctrl_clients.go:72 Setting the client version to 20.1.5 2021-06-14T13:58:21.492Z INFO cache/avi_ctrl_clients.go:72 Setting the client version to 20.1.5 2021-06-14T13:58:21.560Z INFO cache/controller_obj_cache.go:2641 Setting cloud vType: CLOUD_VCENTER . . . 2021-06-14T15:04:10.521Z INFO rest/dequeue_nodes.go:577 key: admin/default-tkgm-workload-lb--default-nginx-svc, msg: creating/updating Pool cache, method: POST 2021-06-14T15:04:10.521Z INFO rest/avi_obj_pool.go:267 key: admin/default-tkgm-workload-lb--default-nginx-svc, msg: Added Pool cache k {admin default-tkgm-workload-lb--default-nginx-svc--80} val {default-tkgm-workload-lb--default-nginx-svc--80 admin pool-ba01367d-ac34-4262-a0b8-0ffc3caea09f 146553777 {<nil> <nil> <nil> { } 0 } { } 1623683050279030 false false} 2021-06-14T15:04:10.521Z INFO rest/dequeue_nodes.go:577 key: admin/default-tkgm-workload-lb--default-nginx-svc, msg: creating/updating L4PolicySet cache, method: POST 2021-06-14T15:04:10.521Z INFO rest/avi_obj_l4ps.go:191 Modified the VS cache for l4s object. 
The cache now is :{"Name":"default-tkgm-workload-lb--default-nginx-svc","Tenant":"admin","Uuid":"","Vip":"","CloudConfigCksum":"","PGKeyCollection":null,"VSVipKeyCollection":[{"Namespace":"admin","Name":"default-tkgm-workload-lb--default-nginx-svc"}],"PoolKeyCollection":[{"Namespace":"admin","Name":"default-tkgm-workload-lb--default-nginx-svc--443"},{"Namespace":"admin","Name":"default-tkgm-workload-lb--default-nginx-svc--80"}],"DSKeyCollection":null,"HTTPKeyCollection":null,"SSLKeyCertCollection":null,"L4PolicyCollection":[{"Namespace":"admin","Name":"default-tkgm-workload-lb--default-nginx-svc"}],"SNIChildCollection":null,"ParentVSRef":{"Namespace":"","Name":""},"PassthroughParentRef":{"Namespace":"","Name":""},"PassthroughChildRef":{"Namespace":"","Name":""},"ServiceMetadataObj":{"namespace_ingress_name":null,"ingress_name":"","namespace":"","hostnames":null,"namespace_svc_name":null,"crd_status":{"type":"","value":"","status":""},"pool_ratio":0,"passthrough_parent_ref":"","passthrough_child_ref":"","gateway":""},"LastModified":"","InvalidData":false,"VSCacheLock":{}} 2021-06-14T15:04:10.521Z INFO rest/avi_obj_l4ps.go:200 Added L4 Policy Set cache k {admin default-tkgm-workload-lb--default-nginx-svc} val {default-tkgm-workload-lb--default-nginx-svc admin l4policyset-72c5338c-c68e-442e-9771-d5905a48aa1f 3045110946 [default-tkgm-workload-lb--default-nginx-svc--443 default-tkgm-workload-lb--default-nginx-svc--80] 1623683050397000 false} 2021-06-14T15:04:10.521Z INFO rest/dequeue_nodes.go:577 key: admin/default-tkgm-workload-lb--default-nginx-svc, msg: creating/updating VirtualService cache, method: POST 2021-06-14T15:04:10.521Z INFO rest/avi_obj_vs.go:374 key:admin/default-tkgm-workload-lb--default-nginx-svc, msg: Service Metadata: {"namespace_ingress_name":null,"ingress_name":"","namespace":"","hostnames":null,"namespace_svc_name":["default/nginx-svc"],"crd_status":{"type":"","value":"","status":""},"pool_ratio":0,"passthrough_parent_ref":"","passthrough_child_ref":"","gateway":""} 2021-06-14T15:04:10.521Z INFO rest/avi_obj_vs.go:396 key: admin/default-tkgm-workload-lb--default-nginx-svc, msg: updated vsvip to the cache: 10.35.13.192 2021-06-14T15:04:10.521Z WARN status/svc_status.go:37 Service hostname not found for service [default/nginx-svc] status update 2021-06-14T15:04:10.536Z INFO status/svc_status.go:83 key: admin/default-tkgm-workload-lb--default-nginx-svc, msg: Successfully updated the status of serviceLB: default/nginx-svc old: [] new [{IP:10.35.13.192 Hostname:}] . . .
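Beyond the AKO pod logs, the management cluster also carries the configuration that drives AKO in the workload clusters. My understanding is that in TKG v1.3 this is expressed as an AKODeploymentConfig custom resource, which can be listed and inspected from the management cluster context; treat the resource kind and the default object name below as assumptions to be verified in your own environment:

$ kubectl config use-context tkgm-lb-admin@tkgm-lb
$ kubectl get akodeploymentconfig                             # assumed CRD, lists the AKO configuration handed to workload clusters
$ kubectl describe akodeploymentconfig install-ako-for-all    # 'install-ako-for-all' is the default name I would expect here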
And of course, if all has gone as expected, one should be able to observe the Service Engines being deployed in the vSphere UI, and on logging into the NSX ALB management portal, we should see the Virtual Service appear. Here is a view taken from the Applications > Dashboard:
This is the Applications > Virtual Services view.
A really nice view is to return to the Dashboard and, instead of View VS List, select View VS Tree and click on the + sign to expand it. This shows the VS, the Pool, the network, and the IP addresses of the TKG nodes that are associated with the service. In this case, since our Nginx application has 3 replicas, we see 3 different nodes displayed.
Our TKG workload cluster is now integrated with the NSX Advanced Load Balancer, which is successfully providing virtual IP addresses (VIPs) for applications in the cluster that request a load balancer service.
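Finally, when the test is finished, deleting the application frees its VIP and should remove the corresponding Virtual Service from the NSX ALB again:

$ kubectl delete -f nginx-from-harbor.yaml   # removes the Deployment and the LoadBalancer service, releasing the VIP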