Deploying Tanzu Kubernetes “guest” cluster in vSphere with Tanzu
In this final installment of my “vSphere with Tanzu” posts, we are going to look at how to create our very first Tanzu Kubernetes (TKG) guest cluster. In previous posts, we have compared vSphere with Tanzu to VCF with Tanzu, and covered the prerequisites. Then we looked at the steps involved in deploying the HA-Proxy to provide a load balancer service to vSphere with Tanzu. In my most recent post, we looked at the steps involved in enabling workload management. Now that all of that is in place, we are finally able to go ahead and deploy a TKG cluster, providing a VMware engineered, VMware supported, conformant, Kubernetes cluster.
Note: This procedure uses a non-release version of the product, so some of the screenshots may change before GA. However, the deployment steps should remain the same.
Step 1 – Create a Namespace
vSphere with Tanzu continues to support the concept of a namespace. It enables a vSphere administrator to control the resources that are available for a developer or team of developers when they are working on a vSphere with Tanzu deployment. This avoids developers “running wild” and consuming more than their fair share of underlying infrastructure resources and impacting other developers or teams of developers working on the same infrastructure, or indeed, impacting production. To get started, in Workload Management, select Namespaces and then click on “Create Namespace“, as shown below.
When creating a namespace, you will need to select the cluster on which the namespace is being created. Since I only have a single cluster, that is quite straight-forward. You will need to provide the name for the namespace – in my case, I called it cormac-ns. Lastly, and this is something new, you will need to select a workload network. If you remember the previous post on enabling workload management, we had the ability to create multiple workload networks. I only created one at the time, so again this step is easy. Once the optional description has been added, click on the “Create” button.
All going well, the namespace will get created, similar to what is shown below:
Note that the Tanzu Kubernetes window shows that the namespace already has a Content Library associated with it. This was done when we included a Content Library during the deployment of vSphere with Tanzu / Workload Management previously. I am not doing anything with Permissions – I will use my SSO administrator login later, which implicitly has full permissions on the namespace already. I am not going to do anything with Capacity and Usage settings – you would certainly want to review and perhaps tune these settings in a production environment. The only step I need to add is the addition of a Storage Policy to the Storage section. I am going to add the “vSAN Default Storage Policy”. Note that any Storage Policy that is added to the Namespace appears as a Kubernetes Storage Class which can then be used when provisioning Persistent Volumes. We will see this Storage Class later on when we login to vSphere with Tanzu and build the TKG “guest” cluster.
Once the storage class is selected, it will be visible in the Storage section of the namespace landing page in the vSphere UI.
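Since the Storage Policy surfaces in Kubernetes as a Storage Class, it can be referenced directly in a PersistentVolumeClaim once logged into the namespace (or later, into the TKG guest cluster). The claim below is a hypothetical sketch: the name demo-pvc and the 2Gi size are illustrative, but the storageClassName matches the vSAN Default Storage Policy as it appears in Kubernetes.

```yaml
# Hypothetical PVC referencing the Storage Class surfaced from the
# vSphere Storage Policy. Only the storageClassName comes from this
# setup; the claim name and requested size are examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: vsan-default-storage-policy
```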
Everything is now in place to deploy the TKG cluster.
Step 2 – Login to vSphere with Tanzu and deploy a TKG
When we initially deployed vSphere with Tanzu, I showed you the Kubernetes CLI Tools landing page. This is accessible by connecting to the Load Balancer IP address of the Supervisor Control Plane Kubernetes API server. In fact, there is also a link to this URL in the Status window of the Namespace as well. From this page, we can download various tools such as kubectl and kubectl-login to access a namespace and deploy a TKG.
I will use these tools to login to the cormac-ns namespace created earlier, and deploy a TKG – a Tanzu Kubernetes cluster.
2.1 Login to the namespace context
To login, use the kubectl-login command and set the server to the Supervisor Control Plane API Server IP address.
C:\bin>kubectl-vsphere.exe login --insecure-skip-tls-verify --vsphere-username \
administrator@vsphere.local --server=https://192.50.0.176

Password: ********
Logged in successfully.

You have access to the following contexts:
   192.50.0.176
   cormac-ns

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

C:\bin>kubectl config use-context cormac-ns
Switched to context "cormac-ns".
2.2 Verify Control Plane and Storage Class
At this point, I always like to check that the 3 control plane nodes are in a Ready state, and that the storage policy we assigned to the namespace earlier has indeed appeared as a storage class. One other item that is useful to verify is that the TKG virtual machine images are visible, meaning the content library used for storing the images has synchronized successfully. It looks like everything is present and correct.
C:\bin>kubectl get nodes
NAME                               STATUS   ROLES    AGE   VERSION
422425ad85759a5db789ec2120747d13   Ready    master   18m   v1.18.2-6+38ac483e736488
42242c84f294f9c38ecfe419ca601c6e   Ready    master   19m   v1.18.2-6+38ac483e736488
42249f93e32ea81ca8f03efa2465d67f   Ready    master   28m   v1.18.2-6+38ac483e736488

C:\bin>kubectl get sc
NAME                          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vsan-default-storage-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   57s

C:\bin>kubectl get virtualmachineimages
NAME                                                        VERSION                           OSTYPE
ob-15957779-photon-3-k8s-v1.16.8---vmware.1-tkg.3.60d2ffd   v1.16.8+vmware.1-tkg.3.60d2ffd    vmwarePhoton64Guest
ob-16466772-photon-3-k8s-v1.17.7---vmware.1-tkg.1.154236c   v1.17.7+vmware.1-tkg.1.154236c    vmwarePhoton64Guest
ob-16545581-photon-3-k8s-v1.16.12---vmware.1-tkg.1.da7afe7  v1.16.12+vmware.1-tkg.1.da7afe7   vmwarePhoton64Guest
ob-16551547-photon-3-k8s-v1.17.8---vmware.1-tkg.1.5417466   v1.17.8+vmware.1-tkg.1.5417466    vmwarePhoton64Guest
2.3 Create a manifest file for the TKG deployment
Creating TKG “guest” clusters in vSphere with Tanzu is really simple. All one needs is a simple manifest file in YAML detailing the name of the cluster, the number of control plane nodes, the number of worker nodes, the size of the nodes from a resource perspective (class), which storage class to use (storageClass), and which image to use for the nodes (version). Here is an example that I used, specifying a single control plane node, two worker nodes, and image version 1.17.7. This use of a version number is a shorthand way of specifying which Photon OS image to use from the content library listing shown previously. Note that only v1.17.x images contain the Antrea CNI.
C:\bin>type cluster.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-01
spec:
  topology:
    controlPlane:
      count: 1
      class: guaranteed-small
      storageClass: vsan-default-storage-policy
    workers:
      count: 2
      class: guaranteed-small
      storageClass: vsan-default-storage-policy
  distribution:
    version: v1.17.7
Note the indentation. It needs to be just right for the manifest to work. If you are interested in learning more about the resources assigned to the various classes, you can use the following commands to query them. Note the ‘Spec’ details at the bottom of the describe output.
C:\bin>kubectl get virtualmachineclass
NAME                  AGE
best-effort-2xlarge   2d16h
best-effort-4xlarge   2d16h
best-effort-8xlarge   2d16h
best-effort-large     2d16h
best-effort-medium    2d16h
best-effort-small     2d16h
best-effort-xlarge    2d16h
best-effort-xsmall    2d16h
guaranteed-2xlarge    2d16h
guaranteed-4xlarge    2d16h
guaranteed-8xlarge    2d16h
guaranteed-large      2d16h
guaranteed-medium     2d16h
guaranteed-small      2d16h
guaranteed-xlarge     2d16h
guaranteed-xsmall     2d16h

C:\bin>kubectl describe virtualmachineclass guaranteed-small
Name:         guaranteed-small
Namespace:
Labels:       <none>
Annotations:
API Version:  vmoperator.vmware.com/v1alpha1
Kind:         VirtualMachineClass
Metadata:
  Creation Timestamp:  2020-09-25T15:25:56Z
  Generation:          1
  Managed Fields:
    API Version:  vmoperator.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:hardware:
          .:
          f:cpus:
          f:memory:
        f:policies:
          .:
          f:resources:
            .:
            f:requests:
              .:
              f:cpu:
              f:memory:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-09-25T15:25:56Z
  Resource Version:  2939
  Self Link:         /apis/vmoperator.vmware.com/v1alpha1/virtualmachineclasses/guaranteed-small
  UID:               4a547922-0ecf-4899-a8b8-12cc6dbd78e8
Spec:
  Hardware:
    Cpus:    2
    Memory:  4Gi
  Policies:
    Resources:
      Requests:
        Cpu:     2000m
        Memory:  4Gi
Events:  <none>
Note: If you use a home lab with a small or even nested environment, it might be better to use a best-effort-small rather than a guaranteed-small for the guest cluster, as it won’t require as many resources. Simply edit the cluster.yaml appropriately and make the change for the worker and controlPlane class entries.
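For reference, this is roughly what cluster.yaml would look like after that change. This is a sketch only, assuming the same cluster name, image version and storage class as the original example; the only difference is the class entries.

```yaml
# cluster.yaml variant for resource-constrained or nested labs.
# best-effort-* classes place no CPU/memory reservations on the node
# VMs, unlike guaranteed-* classes, so the VMs are easier to power on.
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-01
spec:
  topology:
    controlPlane:
      count: 1
      class: best-effort-small          # was: guaranteed-small
      storageClass: vsan-default-storage-policy
    workers:
      count: 2
      class: best-effort-small          # was: guaranteed-small
      storageClass: vsan-default-storage-policy
  distribution:
    version: v1.17.7
```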
Having the “Requests” values for CPU and Memory match the “Spec.Hardware” entries means that these resources are guaranteed, rather than best effort. Of course, as we have seen above, best-effort virtual machine classes are also available.
2.4 Apply the TKG manifest, and monitor the deployment
At this point, we can go ahead and deploy the TKG cluster by applying the manifest shown earlier. We can then use a variety of commands to monitor the deployment. The describe commands can be very long, so I will only show the output from when the cluster has been deployed, but you can obviously use the describe command repeatedly to monitor the TKG cluster deployment status.
C:\bin>kubectl apply -f cluster.yaml
tanzukubernetescluster.run.tanzu.vmware.com/tkg-cluster-01 created

C:\bin>kubectl get cluster
NAME             PHASE
tkg-cluster-01   Provisioned

C:\bin>kubectl get tanzukubernetescluster
NAME             CONTROL PLANE   WORKER   DISTRIBUTION                     AGE    PHASE
tkg-cluster-01   1               2        v1.17.7+vmware.1-tkg.1.154236c   3m7s   creating
Here is the complete output from a describe command. It contains lots of useful information, such as the use of Antrea as the CNI, node and VM status, and the Cluster API endpoint (taken from our load balancer's frontend network range of IP addresses).
C:\bin>kubectl describe tanzukubernetescluster
Name:         tkg-cluster-01
Namespace:    cormac-ns
Labels:       <none>
Annotations:
API Version:  run.tanzu.vmware.com/v1alpha1
Kind:         TanzuKubernetesCluster
Metadata:
  Creation Timestamp:  2020-09-23T15:41:25Z
  Finalizers:
    tanzukubernetescluster.run.tanzu.vmware.com
  Generation:  1
  Managed Fields:
    API Version:  run.tanzu.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:distribution:
          .:
          f:version:
        f:topology:
          .:
          f:controlPlane:
            .:
            f:class:
            f:count:
            f:storageClass:
          f:workers:
            .:
            f:class:
            f:count:
            f:storageClass:
    Manager:      kubectl
    Operation:    Update
    Time:         2020-09-23T15:41:25Z
    API Version:  run.tanzu.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"tanzukubernetescluster.run.tanzu.vmware.com":
      f:status:
        .:
        f:addons:
          .:
          f:authsvc:
            .:
            f:name:
            f:status:
            f:version:
          f:cloudprovider:
            .:
            f:name:
            f:status:
            f:version:
          f:cni:
            .:
            f:name:
            f:status:
            f:version:
          f:csi:
            .:
            f:name:
            f:status:
            f:version:
          f:dns:
            .:
            f:name:
            f:status:
            f:version:
          f:proxy:
            .:
            f:name:
            f:status:
            f:version:
          f:psp:
            .:
            f:name:
            f:status:
            f:version:
        f:clusterApiStatus:
          .:
          f:apiEndpoints:
          f:phase:
        f:nodeStatus:
          .:
          f:tkg-cluster-01-control-plane-sn85m:
          f:tkg-cluster-01-workers-csjd7-554668c497-vtf68:
          f:tkg-cluster-01-workers-csjd7-554668c497-z8vjt:
        f:phase:
        f:vmStatus:
          .:
          f:tkg-cluster-01-control-plane-sn85m:
          f:tkg-cluster-01-workers-csjd7-554668c497-vtf68:
          f:tkg-cluster-01-workers-csjd7-554668c497-z8vjt:
    Manager:         manager
    Operation:       Update
    Time:            2020-09-23T15:52:45Z
  Resource Version:  22469
  Self Link:         /apis/run.tanzu.vmware.com/v1alpha1/namespaces/cormac-ns/tanzukubernetesclusters/tkg-cluster-01
  UID:               9ede742d-c7e3-4715-ac7e-89d2ed312a16
Spec:
  Distribution:
    Full Version:  v1.17.7+vmware.1-tkg.1.154236c
    Version:       v1.17.7
  Settings:
    Network:
      Cni:
        Name:  antrea
      Pods:
        Cidr Blocks:
          192.168.0.0/16
      Service Domain:  cluster.local
      Services:
        Cidr Blocks:
          10.96.0.0/12
  Topology:
    Control Plane:
      Class:          guaranteed-small
      Count:          1
      Storage Class:  vsan-default-storage-policy
    Workers:
      Class:          guaranteed-small
      Count:          2
      Storage Class:  vsan-default-storage-policy
Status:
  Addons:
    Authsvc:
      Name:     authsvc
      Status:   applied
      Version:  0.1-65-ge3d8be8
    Cloudprovider:
      Name:     vmware-guest-cluster
      Status:   applied
      Version:  0.1-77-g5875817
    Cni:
      Name:     antrea
      Status:   applied
      Version:  v0.7.2_vmware.1
    Csi:
      Name:     pvcsi
      Status:   applied
      Version:  v0.0.1.alpha+vmware.73-4a26ce0
    Dns:
      Name:     CoreDNS
      Status:   applied
      Version:  v1.6.5_vmware.5
    Proxy:
      Name:     kube-proxy
      Status:   applied
      Version:  1.17.7+vmware.1
    Psp:
      Name:     defaultpsp
      Status:   applied
      Version:  v1.17.7+vmware.1-tkg.1.154236c
  Cluster API Status:
    API Endpoints:
      Host:  192.50.0.177
      Port:  6443
    Phase:   Provisioned
  Node Status:
    tkg-cluster-01-control-plane-sn85m:             ready
    tkg-cluster-01-workers-csjd7-554668c497-vtf68:  ready
    tkg-cluster-01-workers-csjd7-554668c497-z8vjt:  ready
  Phase:  running
  Vm Status:
    tkg-cluster-01-control-plane-sn85m:             ready
    tkg-cluster-01-workers-csjd7-554668c497-vtf68:  ready
    tkg-cluster-01-workers-csjd7-554668c497-z8vjt:  ready
Events:  <none>

C:\bin>kubectl get tanzukubernetescluster
NAME             CONTROL PLANE   WORKER   DISTRIBUTION                     AGE   PHASE
tkg-cluster-01   1               2        v1.17.7+vmware.1-tkg.1.154236c   11m   running

C:\bin>kubectl get cluster
NAME             PHASE
tkg-cluster-01   Provisioned
Of interest is the API Endpoints entry, which is another IP address from the range of load balancer IP addresses. From a networking perspective, our deployment now looks something like the following:
One last thing to notice: if we switch back to the vSphere Client UI and examine the namespace, we can now see an update in the Tanzu Kubernetes window showing one cluster deployed. And if you look to the left at the inventory, you can also see the TKG cluster as an inventory item in vSphere:
2.5 Logout, then login to TKG cluster context
All of the above was carried out in the context of a namespace in the vSphere with Tanzu Supervisor cluster. The next step is to logout from the Supervisor context, and login to the TKG guest cluster context. This allows us to direct kubectl commands at the TKG cluster API server, rather than the Kubernetes API server in the Supervisor cluster. There are other ways to achieve this through setting a KUBECONFIG environment variable, but I find it easier to simply logout and login again.
C:\bin>kubectl-vsphere.exe logout
Your KUBECONFIG context has changed.
The current KUBECONFIG context is unset.
To change context, use `kubectl config use-context <workload name>`
Logged out of all vSphere namespaces.

C:\bin>kubectl-vsphere.exe login \
--insecure-skip-tls-verify \
--vsphere-username administrator@vsphere.local \
--server=https://192.50.0.176 \
--tanzu-kubernetes-cluster-namespace cormac-ns \
--tanzu-kubernetes-cluster-name tkg-cluster-01

Password: ********
Logged in successfully.

You have access to the following contexts:
   192.50.0.176
   cormac-ns
   tkg-cluster-01

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`
2.6 Validate TKG cluster context
C:\bin>kubectl get nodes
NAME                                            STATUS   ROLES    AGE     VERSION
tkg-cluster-01-control-plane-sn85m              Ready    master   8m40s   v1.17.7+vmware.1
tkg-cluster-01-workers-csjd7-554668c497-vtf68   Ready    <none>   2m16s   v1.17.7+vmware.1
tkg-cluster-01-workers-csjd7-554668c497-z8vjt   Ready    <none>   2m16s   v1.17.7+vmware.1

C:\bin>kubectl get pods -A
NAMESPACE                      NAME                                                         READY   STATUS    RESTARTS   AGE
kube-system                    antrea-agent-7zphw                                           2/2     Running   0          2m23s
kube-system                    antrea-agent-mvczg                                           2/2     Running   0          2m23s
kube-system                    antrea-agent-t6qgc                                           2/2     Running   0          8m11s
kube-system                    antrea-controller-76c76c7b7c-4cwtm                           1/1     Running   0          8m11s
kube-system                    coredns-6c78df586f-6d77q                                     1/1     Running   0          7m55s
kube-system                    coredns-6c78df586f-c8rtj                                     1/1     Running   0          8m14s
kube-system                    etcd-tkg-cluster-01-control-plane-sn85m                      1/1     Running   0          7m24s
kube-system                    kube-apiserver-tkg-cluster-01-control-plane-sn85m            1/1     Running   0          7m12s
kube-system                    kube-controller-manager-tkg-cluster-01-control-plane-sn85m   1/1     Running   0          7m30s
kube-system                    kube-proxy-frpgn                                             1/1     Running   0          2m23s
kube-system                    kube-proxy-lchkq                                             1/1     Running   0          2m23s
kube-system                    kube-proxy-m2xjn                                             1/1     Running   0          8m14s
kube-system                    kube-scheduler-tkg-cluster-01-control-plane-sn85m            1/1     Running   0          7m27s
vmware-system-auth             guest-cluster-auth-svc-lf2nf                                 1/1     Running   0          8m2s
vmware-system-cloud-provider   guest-cluster-cloud-provider-7788f74548-cfng4                1/1     Running   0          8m13s
vmware-system-csi              vsphere-csi-controller-574cfd4569-kfdz8                      6/6     Running   0          8m12s
vmware-system-csi              vsphere-csi-node-hvzpg                                       3/3     Running   0          8m12s
vmware-system-csi              vsphere-csi-node-t4w8p                                       3/3     Running   0          2m23s
vmware-system-csi              vsphere-csi-node-t6rdw                                       3/3     Running   0          2m23s
Looks good. We have successfully deployed a TKG guest cluster in vSphere with Tanzu.
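As a final smoke test of the guest cluster, you could apply a trivial Deployment while logged into the tkg-cluster-01 context. This is a hypothetical example, not from the original walkthrough: the name nginx-test and the nginx image are illustrative. One caveat to be aware of: TKG guest clusters ship with a default pod security policy (the defaultpsp addon shown earlier), so depending on the user you are logged in as, workloads may need a RoleBinding to a suitable PSP before their pods will start.

```yaml
# Hypothetical smoke-test Deployment for the TKG guest cluster.
# Names and image are illustrative. Note that pod security policy is
# enforced by default in TKG clusters, so pods may stay Pending until
# an appropriate PSP RoleBinding exists.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:1.19
          ports:
            - containerPort: 80
```

A `kubectl apply -f` of a manifest like this, followed by `kubectl get pods`, should show the pod Running if the CNI and networking are healthy.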
Summary
Over the last number of blog posts we have seen the following:
- How vSphere with Tanzu is different to VCF with Tanzu, including network requirements. We also looked at the various prerequisites required to successfully deploy vSphere with Tanzu.
- We looked at how to deploy the HA-Proxy, and how it provides load balancer services for vSphere with Tanzu.
- We saw how to enable workload management, and stand up vSphere with Tanzu.
- And finally, in this post, we saw how to deploy a TKG ‘guest’ cluster in vSphere with Tanzu.
Hopefully this has given you enough information to go and review vSphere with Tanzu in your own vSphere environment. I am always interested in hearing your feedback – what works, what doesn’t work, etc. Feel free to leave a comment or reach out to me on social media.
Happy Tanzu’ing.
Very good tutorial. Kudoz to you Cormac.
It clears up the new Kubernetes with Tanzu feature for me. Can you continue with some information about how to deploy a container directly as a PodVM on the Supervisor cluster, and a little explanation of the difference between deploying workloads on conformant and non-conformant Kubernetes clusters?
Hi Adrie – the process is exactly the same for Kubernetes Pods and PodVMs. You simply need to ensure that the context is set correctly. For PodVMs, the context is a namespace; for Kubernetes Pods, the context is the namespace plus the Guest cluster. There are a number of examples already on this blog site.
Hey Cormac… I’m getting what looks like a HA Admission Control issue trying to deploy TKG.
The host does not have sufficient CPU resources to satisfy the reservation
I’ve tried a few combinations… the hosts are nested but have 2vCPU and 24GB of memory each. (This is a HA-Proxy deployment as well)
Further to that, I can’t stop the deployment attempt… it’s stuck provisioning. How do I stop/kill the creation?
Any tips on both?
Not sure what the admission control issue is Anthony – I’ve not tried a deployment with nested ESXi, only physical.
If things are not responding, a last resort you could use is to reset the service through vCenter Server. Use the following command:
root@vcsa-06 [ ~ ]# cd /usr/lib/vmware-wcp
root@vcsa-06 [ /usr/lib/vmware-wcp ]# vmon-cli -r wcp
Hi Cormac,
I was trying to connect to the Supervisor control plane with the command “kubectl-vsphere.exe login --insecure-skip-tls-verify --vsphere-username \
administrator@vsphere.local --server=https://192.50.0.176” and got the prompt for a password. Then I got a generic error message: time=”2020-10-21T10:48:03Z” level=error msg=”internal server error”
I don’t see this in the wcpsvc log. What log should I be looking at to analyze this error?
BR
Ingvar
Not sure Ingvar – is this definitely the IP address of the control plane API server Load Balancer (provided by the HA-Proxy)?
Has Workload Management been configured correctly? Have you been able to create a Namespace? Can you click on the Kubernetes Tools IP Address and reach the Tools download page?
Yes, I did follow your guide and successfully created a namespace. I can access the webpage and see the tools download just as in your example. I surely could have done something wrong 🙂 but I would like to know the correct log to look at to see what has gone wrong.
Were you able to fix this Ingvar? I have the same problem/generic error message. I can access the tools just fine.
Hi Cormac,
Your article helpful for setup my lab. But i have some question.
From the screenshot at the link below: after I deployed the guestbook application, the Deployments menu under the Compute tab on the namespace does not show anything. Is this correct?
https://ibb.co/hMjtv7M
Yes – that is expected. We only show PodVMs in the vSphere UI (those deployed in the Supervisor Cluster). Native Kubernetes Pods deployed in a TKG Guest Cluster do not appear in the UI. We only show guest cluster and node information.
However, if you select the vSphere Cluster Object in the vSphere inventory, then select the Monitor tab and scroll down to Container Volumes, you should be able to see any PVs created on behalf of the guestbook application.
Hi Cormac,
Thank you very much.
Hi Cormac, I would change the cluster.yaml example to use best-effort so that small labs can power on the VMs; I found out that using guaranteed was my problem in my small lab environment. Also, my first try failed because of Lifecycle Manager. I guess that will be changed, as many customers would be happy to use both Tanzu and Lifecycle Manager.
Thanks for the great post, it really helped me out to test Tanzu!!
Great feedback – thanks Raul. I will add that note about the “best-effort” to the post.
Hi Cormac, I deployed a TKG cluster with a similar cluster.yaml to the one in your article. It just deployed a control plane VM, and now I get the errors shown below, and the deployment runs without creating the workers.
Authsvc:
Last Error Message: unable to create or update authsvc: Get https://192.168.15.178:6443/api/v1/namespaces/vmware-system-auth?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name:
Status: error
Cloudprovider:
Name: vmware-guest-cluster
Status: applied
Version: 0.1-77-g5875817
Cni:
Last Error Message: failed to update owner reference for antrea secret in supervisor cluster: Secret “tkg-cluster-02-antrea” not found
Name:
Status: error
Csi:
Last Error Message: Post https://192.168.15.178:6443/apis/rbac.authorization.k8s.io/v1/namespaces/vmware-system-csi/roles?timeout=10s: context deadline exceeded
Name:
Status: error
Dns:
Last Error Message: unable to reconcile kubeadm ConfigMap’s CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: configmaps “kubeadm-config” not found
Name:
Status: error
Proxy:
Last Error Message: unable to retrieve kube-proxy daemonset from the guest cluster: daemonsets.apps “kube-proxy” not found
Name:
Status: error
BR
Ingvar
Hi Ingvar,
I *think* I may have had this issue when I deployed a 3 NIC HA-Proxy, but I had no route between the FrontEnd network and the Load Balancer network. If you do a ‘kubectl describe tanzukubernetescluster’, can you see all of the AddOns populated? Or are they errors against some of them as well?
Can you successfully route between the FrontEnd and LB networks?
I can ping all IP addresses. The LB network is on the same subnet as the Frontend, so I did not create any special routes there; maybe that is needed, as the LB does have a different mask.
The addons are giving me a lot of errors; only antrea and defaultpsp seem to be applied.
The errors above are from the addons part of the kubectl describe tanzukubernetescluster command.
I don’t have access to 3 different networks to test it properly. Should I set up a new HA-Proxy with 2 NICs to see if that would simplify the deployment?
I am just setting it up now to see the basic functions and to learn about Kubernetes on VMware.
Status:
Addons:
Authsvc:
Last Error Message: unable to create or update authsvc: Get https://192.168.15.178:6443/api/v1/namespaces/vmware-system-auth?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name: authsvc
Status: error
Version: 0.1-65-ge3d8be8
Cloudprovider:
Last Error Message: Put https://192.168.15.178:6443/apis/rbac.authorization.k8s.io/v1/clusterroles/cloud-provider-patch-cluster-role?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name: vmware-guest-cluster
Status: error
Version: 0.1-77-g5875817
Cni:
Name: antrea
Status: applied
Version: v0.7.2_vmware.1
Csi:
Last Error Message: Get https://192.168.15.178:6443/api/v1/namespaces/vmware-system-csi?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name: pvcsi
Status: error
Version: v0.0.1.alpha+vmware.73-4a26ce0
Dns:
Last Error Message: unable to reconcile kubeadm ConfigMap’s CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: configmaps “kubeadm-config” not found
Name:
Status: error
Proxy:
Last Error Message: unable to retrieve kube-proxy daemonset from the guest cluster: daemonsets.apps “kube-proxy” not found
Name:
Status: error
Psp:
Name: defaultpsp
Status: applied
Version: v1.17.7+vmware.1-tkg.1.154236c
Cluster API Status:
API Endpoints:
Host: 192.168.15.178
Port: 6443
Phase: Provisioned
Node Status:
tkg-cluster-02-control-plane-ncbj7: pending
tkg-cluster-02-workers-ckjct-6f97fd7569-lbhwc: pending
tkg-cluster-02-workers-ckjct-6f97fd7569-mv2vb: pending
Phase: creating
Vm Status:
tkg-cluster-02-control-plane-ncbj7: ready
tkg-cluster-02-workers-ckjct-6f97fd7569-lbhwc: pending
tkg-cluster-02-workers-ckjct-6f97fd7569-mv2vb: pending
Events:
Same error. Tried reinstall maybe 10 times now, also tried frontend/workload on different routable network, same errors. I validated from within the control-plane VM (ssh using secret pass), and I can see ping and TCP working fine in any destination.
To simplify things, try a single frontend/workload network rather than separate ones. Just make sure that the CIDR matches an actual range, and that there is no overlap of IP address.
So I think this may be subnet masks, or CIDR settings not mapping to ranges correctly, Ingvar. I have seen this when there is no communication path between the LB assigned to the control plane and the actual IP addresses on the workload network. Use a CIDR calculator like https://www.ipaddressguide.com/cidr to make sure that you have provided proper ranges that start and stop at boundaries and do not overlap anything else, e.g. gateway, HA-Proxy interface, other range. You should be able to get it to work with just 2 ranges; just make sure the IP address ranges, CIDRs, and subnet masks are all good and you should be ok. Also make sure there is nothing like a DHCP server also provisioning IP addresses in those ranges. Another useful tool to have is an IP address scanner (I use the angry one – https://angryip.org/) to make sure that the Proxy correctly plumbs up the Load Balancer range as expected (these become pingable once the proxy is configured). Good luck!
I’ve just come across this issue… I’m almost 100% sure I’ve got the CIDRs right. (This is in my second deployment location… the first one worked ok.)
It deploys the control node but I’m seeing this in the describe tkg cluster
Dns:
Last Error Message: unable to reconcile kubeadm ConfigMap’s CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: configmaps “kubeadm-config” not found
Name:
Status: error
Proxy:
Last Error Message: unable to retrieve kube-proxy daemonset from the guest cluster: daemonsets.apps “kube-proxy” not found
Name:
Status: error
Also seems like it popped up here: https://www.reddit.com/r/vmware/comments/jp97lb/fail_to_install_a_guestcluster_on_70u1_vsphere/gbtonhz/?utm_source=reddit&utm_medium=web2x&context=3
Things to check Anthony. Some you mentioned already:
1. When doing a 3 x NIC deployment with separate frontend and workload networks, ensure there is a route between them. This caught me out, and the above error is what I observed. If it is a 2 NIC deployment where the frontend and backend are on the same segment, then it is something in the configuration that is causing it.
2. If you only have access to a partial subnet range, e.g. within a /16 or /8, for either the load balancer range or the workload range, make sure that you have the subnet mask set accordingly in all places. For example, if the segment you are using for both the workload and the frontend is made up of only the first 64 IP addresses of a range, e.g. 10.0.0.0/26, then specify the /26 subnet when defining the HA-Proxy workload IP address, and the subnet mask for the workload network (step 7 in the vSphere UI) should be 255.255.255.192. Everything needs to match.
3. When setting up the ranges, make sure that you check it against a CIDR calculator
4. Make sure there is no DHCP server also offering up IP address on the subnets / ranges that you choose.
HTH
Hi Cormac, again thank you for all your posts regarding tanzu. I have a (may stupid) question: How do I shutdown safely a tanzu cluster? In an (home) lab sometimes I have to shutdown everything… Thanks a lot! Paul
Hi Cormac,
I tried a lot to deploy guest cluster but failed. Always stuck in creating phase. Control plane is deployed but pending in worker node . Can you please help me out. My mail id is fahimistiaq91@gmail.com or whatsapp +8801847133058
I’m not in a position to do that. Please use GSS to open a support issue, or alternatively use the VMware Communities.
Hi
I have the same problem as @Fahim, stuck in creating phase. Control VM is deployed, Workers do not.
Control VM has IP, but it’s not pingable from temp VM in same dvs and subnet.
35m Warning ReconcileFailure wcpmachine/simple-cluster-control-plane-qlvmw-wnfmd vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-namespace-01/simple-cluster/simple-cluster-control-plane-qlvmw-wnfmd
34m Warning ReconcileFailure wcpmachine/simple-cluster-control-plane-qlvmw-wnfmd vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-namespace-01/simple-cluster/simple-cluster-control-plane-qlvmw-wnfmd
36m Normal CreateVMServiceSuccess virtualmachineservice/simple-cluster-control-plane-service CreateVMService success
81s Normal Reconcile gateway/simple-cluster-control-plane-service Success
36m Normal SuccessfulCreate machineset/simple-cluster-workers-9h754-75f8d97dd8 Created machine “simple-cluster-workers-9h754-75f8d97dd8-nxknp”
36m Normal SuccessfulCreate machineset/simple-cluster-workers-9h754-75f8d97dd8 Created machine “simple-cluster-workers-9h754-75f8d97dd8-wlcdc”
36m Normal SuccessfulCreate machineset/simple-cluster-workers-9h754-75f8d97dd8 Created machine “simple-cluster-workers-9h754-75f8d97dd8-qq7hw”
36m Normal SuccessfulCreate machinedeployment/simple-cluster-workers-9h754 Created MachineSet “simple-cluster-workers-9h754-75f8d97dd8”
36m Warning ReconcileFailure wcpcluster/simple-cluster unexpected error while reconciling control plane endpoint for simple-cluster: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-namespace-01/simple-cluster: failed to get control plane endpoint for Cluster tanzu-namespace-01/simple-cluster: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
Any ideas?
You will need to go back and recheck the networking. It sounds like something is amiss with the CIDR range/subnet masks that were used for the frontend network or the workload network. Ensure that from a temp VM on the workload network you are able to reach the HA-Proxy URL on port 5556, and that you are able to ping the IP addresses on the frontend/load balancer network, before deploying workload management.
Hi Cormac,
Thanks for this wonderfull blog post !!
I have deployed vSphere with Tanzu , utilising vSphere networking . Everything is up and running fine as expected. Tested with deployment of nginx application in Guest cluster . Need your inputs to understand from where do kubernetes fetched registry-images for deployment of application . And how can i connect this vsphere with tanzu cluster , to gitlab so, that i can import images to deploy applications . Unfortunately don’t have NSX-T in place , because of which can’t use embedded harbor registry repository feature .
Please help me with the query ……