Deploying Tanzu Kubernetes “guest” cluster in vSphere with Tanzu
In this final installment of my “vSphere with Tanzu” posts, we are going to look at how to create our very first Tanzu Kubernetes (TKG) guest cluster. In previous posts, we have compared vSphere with Tanzu to VCF with Tanzu, and covered the prerequisites. Then we looked at the steps involved in deploying the HA-Proxy to provide a load balancer service to vSphere with Tanzu. In my most recent post, we looked at the steps involved in enabling workload management. Now that all of that is in place, we are finally able to go ahead and deploy a TKG cluster, providing a VMware engineered, VMware supported, conformant, Kubernetes cluster.
Note: This procedure uses a non-release version of the product, so some of the screenshots may change before GA. However, the deployment steps should remain the same.
Step 1 – Create a Namespace
vSphere with Tanzu continues to support the concept of a namespace. It enables a vSphere administrator to control the resources that are available for a developer or team of developers when they are working on a vSphere with Tanzu deployment. This avoids developers “running wild” and consuming more than their fair share of underlying infrastructure resources and impacting other developers or teams of developers working on the same infrastructure, or indeed, impacting production. To get started, in Workload Management, select Namespaces and then click on “Create Namespace“, as shown below.
When creating a namespace, you will need to select the cluster on which the namespace is being created. Since I only have a single cluster, that is quite straight-forward. You will need to provide the name for the namespace – in my case, I called it cormac-ns. Lastly, and this is something new, you will need to select a workload network. If you remember the previous post on enabling workload management, we had the ability to create multiple workload networks. I only created one at the time, so again this step is easy. Once the optional description has been added, click on the “Create” button.
All going well, the namespace will get created, similar to what is shown below:
Note that the Tanzu Kubernetes window shows that the namespace already has a Content Library associated with it. This was done when we included a Content Library during the deployment of vSphere with Tanzu / Workload Management previously. I am not doing anything with Permissions – I will use my SSO administrator login later, which implicitly has full permissions on the namespace already. I am not going to do anything with Capacity and Usage settings – you would certainly want to review and perhaps tune these settings in a production environment. The only step I need to add is the addition of a Storage Policy to the Storage section. I am going to add the “vSAN Default Storage Policy”. Note that any Storage Policy that is added to the Namespace appears as a Kubernetes Storage Class which can then be used when provisioning Persistent Volumes. We will see this Storage Class later on when we login to vSphere with Tanzu and build the TKG “guest” cluster.
Once the storage class is selected, it will be visible in the Storage section of the namespace landing page in the vSphere UI.
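Since the Storage Policy surfaces in Kubernetes as a Storage Class, it can be referenced directly in a PersistentVolumeClaim once logged into the namespace (or later, into the TKG guest cluster). The claim below is a hypothetical sketch: the name demo-pvc and the 2Gi size are illustrative, but the storageClassName matches the vSAN Default Storage Policy as it appears in Kubernetes.

```yaml
# Hypothetical PVC referencing the Storage Class surfaced from the
# vSphere Storage Policy. Only the storageClassName comes from this
# setup; the claim name and requested size are examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: vsan-default-storage-policy
```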
Everything is now in place to deploy the TKG cluster.
Step 2 – Login to vSphere with Tanzu and deploy a TKG
When we initially deployed vSphere with Tanzu, I showed you the Kubernetes CLI Tools landing page. This is accessible by connecting to the Load Balancer IP address of the Supervisor Control Plane Kubernetes API server. In fact, there is also a link to this URL in the Status window of the Namespace as well. From this page, we can download various tools such as kubectl and kubectl-login to access a namespace and deploy a TKG.
I will use these tools to login to the cormac-ns namespace created earlier, and deploy a TKG – a Tanzu Kubernetes cluster.
2.1 Login to the namespace context
To login, use the kubectl-login command and set the server to the Supervisor Control Plane API Server IP address.
C:\bin>kubectl-vsphere.exe login --insecure-skip-tls-verify --vsphere-username \
administrator@vsphere.local --server=https://192.50.0.176

Password: ********
Logged in successfully.

You have access to the following contexts:
   192.50.0.176
   cormac-ns

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

C:\bin>kubectl config use-context cormac-ns
Switched to context "cormac-ns".
2.2 Verify Control Plane and Storage Class
At this point, I always like to check that the 3 control plane nodes are in a Ready state, and that the storage policy we assigned to the namespace earlier has indeed appeared as a storage class. One other item that is useful to verify is that the TKG virtual machine images are visible, meaning the content library used for storing the images has synchronized successfully. It looks like everything is present and correct.
C:\bin>kubectl get nodes
NAME                               STATUS   ROLES    AGE   VERSION
422425ad85759a5db789ec2120747d13   Ready    master   18m   v1.18.2-6+38ac483e736488
42242c84f294f9c38ecfe419ca601c6e   Ready    master   19m   v1.18.2-6+38ac483e736488
42249f93e32ea81ca8f03efa2465d67f   Ready    master   28m   v1.18.2-6+38ac483e736488

C:\bin>kubectl get sc
NAME                          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vsan-default-storage-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   57s

C:\bin>kubectl get virtualmachineimages
NAME                                                        VERSION                           OSTYPE
ob-15957779-photon-3-k8s-v1.16.8---vmware.1-tkg.3.60d2ffd   v1.16.8+vmware.1-tkg.3.60d2ffd    vmwarePhoton64Guest
ob-16466772-photon-3-k8s-v1.17.7---vmware.1-tkg.1.154236c   v1.17.7+vmware.1-tkg.1.154236c    vmwarePhoton64Guest
ob-16545581-photon-3-k8s-v1.16.12---vmware.1-tkg.1.da7afe7  v1.16.12+vmware.1-tkg.1.da7afe7   vmwarePhoton64Guest
ob-16551547-photon-3-k8s-v1.17.8---vmware.1-tkg.1.5417466   v1.17.8+vmware.1-tkg.1.5417466    vmwarePhoton64Guest
2.3 Create a manifest file for the TKG deployment
Creating TKG “guest” clusters in vSphere with Tanzu is really simple. All one needs is a simple manifest file in YAML detailing the name of the cluster, the number of control plane nodes, the number of worker nodes, the size of the nodes from a resource perspective (class), which storage class to use (storageClass), and which image to use for the nodes (version). Here is an example that I used, specifying a single control plane node, two worker nodes, and image version 1.17.7. This use of a version number is a shorthand way of specifying which Photon OS image to use from the content library listing shown previously. Note that only v1.17.x images contain the Antrea CNI.
C:\bin>type cluster.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-01
spec:
  topology:
    controlPlane:
      count: 1
      class: guaranteed-small
      storageClass: vsan-default-storage-policy
    workers:
      count: 2
      class: guaranteed-small
      storageClass: vsan-default-storage-policy
  distribution:
    version: v1.17.7
Note the indentation. It needs to be just right for the manifest to work. If you are interested in learning more about the resources assigned to the various classes, you can use the following commands to query them. Note the ‘Spec’ details at the bottom of the describe output.
C:\bin>kubectl get virtualmachineclass
NAME                  AGE
best-effort-2xlarge   2d16h
best-effort-4xlarge   2d16h
best-effort-8xlarge   2d16h
best-effort-large     2d16h
best-effort-medium    2d16h
best-effort-small     2d16h
best-effort-xlarge    2d16h
best-effort-xsmall    2d16h
guaranteed-2xlarge    2d16h
guaranteed-4xlarge    2d16h
guaranteed-8xlarge    2d16h
guaranteed-large      2d16h
guaranteed-medium     2d16h
guaranteed-small      2d16h
guaranteed-xlarge     2d16h
guaranteed-xsmall     2d16h

C:\bin>kubectl describe virtualmachineclass guaranteed-small
Name:         guaranteed-small
Namespace:
Labels:       <none>
Annotations:
API Version:  vmoperator.vmware.com/v1alpha1
Kind:         VirtualMachineClass
Metadata:
  Creation Timestamp:  2020-09-25T15:25:56Z
  Generation:          1
  Managed Fields:
    API Version:  vmoperator.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:hardware:
          .:
          f:cpus:
          f:memory:
        f:policies:
          .:
          f:resources:
            .:
            f:requests:
              .:
              f:cpu:
              f:memory:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-09-25T15:25:56Z
  Resource Version:  2939
  Self Link:         /apis/vmoperator.vmware.com/v1alpha1/virtualmachineclasses/guaranteed-small
  UID:               4a547922-0ecf-4899-a8b8-12cc6dbd78e8
Spec:
  Hardware:
    Cpus:    2
    Memory:  4Gi
  Policies:
    Resources:
      Requests:
        Cpu:     2000m
        Memory:  4Gi
Events:  <none>
Note: If you use a home lab with a small or even nested environment, it might be better to use a best-effort-small rather than a guaranteed-small for the guest cluster, as it won’t require as many resources. Simply edit the cluster.yaml appropriately and make the change for the worker and controlPlane class entries.
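For reference, this is roughly what cluster.yaml would look like after that change. This is a sketch only, assuming the same cluster name, image version and storage class as the original example; the only difference is the class entries.

```yaml
# cluster.yaml variant for resource-constrained or nested labs.
# best-effort-* classes place no CPU/memory reservations on the node
# VMs, unlike guaranteed-* classes, so the VMs are easier to power on.
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-01
spec:
  topology:
    controlPlane:
      count: 1
      class: best-effort-small          # was: guaranteed-small
      storageClass: vsan-default-storage-policy
    workers:
      count: 2
      class: best-effort-small          # was: guaranteed-small
      storageClass: vsan-default-storage-policy
  distribution:
    version: v1.17.7
```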
Having the “Requests” values for CPU and Memory match the “Spec.Hardware” entries means that these resources are guaranteed, rather than best effort. Of course, as we have seen above, best-effort virtual machine classes are also available.
2.4 Apply the TKG manifest, and monitor the deployment
At this point, we can go ahead and deploy the TKG cluster by applying the manifest shown earlier. We can then use a variety of commands to monitor the deployment. The describe commands can be very long, so I will only show the output from when the cluster has been deployed, but you can obviously use the describe command repeatedly to monitor the TKG cluster deployment status.
C:\bin>kubectl apply -f cluster.yaml
tanzukubernetescluster.run.tanzu.vmware.com/tkg-cluster-01 created

C:\bin>kubectl get cluster
NAME             PHASE
tkg-cluster-01   Provisioned

C:\bin>kubectl get tanzukubernetescluster
NAME             CONTROL PLANE   WORKER   DISTRIBUTION                     AGE    PHASE
tkg-cluster-01   1               2        v1.17.7+vmware.1-tkg.1.154236c   3m7s   creating
Here is the complete output from a describe command. It contains lots of useful information, such as the use of Antrea as the CNI, node and VM status, and the Cluster API endpoint (taken from our load balancer's frontend network range of IP addresses).
C:\bin>kubectl describe tanzukubernetescluster
Name:         tkg-cluster-01
Namespace:    cormac-ns
Labels:       <none>
Annotations:
API Version:  run.tanzu.vmware.com/v1alpha1
Kind:         TanzuKubernetesCluster
Metadata:
  Creation Timestamp:  2020-09-23T15:41:25Z
  Finalizers:
    tanzukubernetescluster.run.tanzu.vmware.com
  Generation:  1
  Managed Fields:
    API Version:  run.tanzu.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:distribution:
          .:
          f:version:
        f:topology:
          .:
          f:controlPlane:
            .:
            f:class:
            f:count:
            f:storageClass:
          f:workers:
            .:
            f:class:
            f:count:
            f:storageClass:
    Manager:      kubectl
    Operation:    Update
    Time:         2020-09-23T15:41:25Z
    API Version:  run.tanzu.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"tanzukubernetescluster.run.tanzu.vmware.com":
      f:status:
        .:
        f:addons:
          .:
          f:authsvc:
            .:
            f:name:
            f:status:
            f:version:
          f:cloudprovider:
            .:
            f:name:
            f:status:
            f:version:
          f:cni:
            .:
            f:name:
            f:status:
            f:version:
          f:csi:
            .:
            f:name:
            f:status:
            f:version:
          f:dns:
            .:
            f:name:
            f:status:
            f:version:
          f:proxy:
            .:
            f:name:
            f:status:
            f:version:
          f:psp:
            .:
            f:name:
            f:status:
            f:version:
        f:clusterApiStatus:
          .:
          f:apiEndpoints:
          f:phase:
        f:nodeStatus:
          .:
          f:tkg-cluster-01-control-plane-sn85m:
          f:tkg-cluster-01-workers-csjd7-554668c497-vtf68:
          f:tkg-cluster-01-workers-csjd7-554668c497-z8vjt:
        f:phase:
        f:vmStatus:
          .:
          f:tkg-cluster-01-control-plane-sn85m:
          f:tkg-cluster-01-workers-csjd7-554668c497-vtf68:
          f:tkg-cluster-01-workers-csjd7-554668c497-z8vjt:
    Manager:         manager
    Operation:       Update
    Time:            2020-09-23T15:52:45Z
  Resource Version:  22469
  Self Link:         /apis/run.tanzu.vmware.com/v1alpha1/namespaces/cormac-ns/tanzukubernetesclusters/tkg-cluster-01
  UID:               9ede742d-c7e3-4715-ac7e-89d2ed312a16
Spec:
  Distribution:
    Full Version:  v1.17.7+vmware.1-tkg.1.154236c
    Version:       v1.17.7
  Settings:
    Network:
      Cni:
        Name:  antrea
      Pods:
        Cidr Blocks:
          192.168.0.0/16
      Service Domain:  cluster.local
      Services:
        Cidr Blocks:
          10.96.0.0/12
  Topology:
    Control Plane:
      Class:          guaranteed-small
      Count:          1
      Storage Class:  vsan-default-storage-policy
    Workers:
      Class:          guaranteed-small
      Count:          2
      Storage Class:  vsan-default-storage-policy
Status:
  Addons:
    Authsvc:
      Name:     authsvc
      Status:   applied
      Version:  0.1-65-ge3d8be8
    Cloudprovider:
      Name:     vmware-guest-cluster
      Status:   applied
      Version:  0.1-77-g5875817
    Cni:
      Name:     antrea
      Status:   applied
      Version:  v0.7.2_vmware.1
    Csi:
      Name:     pvcsi
      Status:   applied
      Version:  v0.0.1.alpha+vmware.73-4a26ce0
    Dns:
      Name:     CoreDNS
      Status:   applied
      Version:  v1.6.5_vmware.5
    Proxy:
      Name:     kube-proxy
      Status:   applied
      Version:  1.17.7+vmware.1
    Psp:
      Name:     defaultpsp
      Status:   applied
      Version:  v1.17.7+vmware.1-tkg.1.154236c
  Cluster API Status:
    API Endpoints:
      Host:  192.50.0.177
      Port:  6443
    Phase:   Provisioned
  Node Status:
    tkg-cluster-01-control-plane-sn85m:             ready
    tkg-cluster-01-workers-csjd7-554668c497-vtf68:  ready
    tkg-cluster-01-workers-csjd7-554668c497-z8vjt:  ready
  Phase:  running
  Vm Status:
    tkg-cluster-01-control-plane-sn85m:             ready
    tkg-cluster-01-workers-csjd7-554668c497-vtf68:  ready
    tkg-cluster-01-workers-csjd7-554668c497-z8vjt:  ready
Events:  <none>

C:\bin>kubectl get tanzukubernetescluster
NAME             CONTROL PLANE   WORKER   DISTRIBUTION                     AGE   PHASE
tkg-cluster-01   1               2        v1.17.7+vmware.1-tkg.1.154236c   11m   running

C:\bin>kubectl get cluster
NAME             PHASE
tkg-cluster-01   Provisioned
Of interest is the API Endpoints entry, which is another IP address from the range of load balancer IP addresses. From a networking perspective, our deployment now looks something like the following:
One last thing to notice: if we switch back to the vSphere Client UI and examine the namespace, we can now see an update in the Tanzu Kubernetes window showing one cluster deployed. And if you look to the left at the inventory, you can also see the TKG cluster as an inventory item in vSphere:
2.5 Logout, then login to TKG cluster context
All of the above was carried out in the context of a namespace in the vSphere with Tanzu Supervisor cluster. The next step is to logout from the Supervisor context, and login to the TKG guest cluster context. This allows us to direct kubectl commands at the TKG cluster API server, rather than the Kubernetes API server in the Supervisor cluster. There are other ways to achieve this through setting a KUBECONFIG environment variable, but I find it easier to simply logout and login again.
C:\bin>kubectl-vsphere.exe logout
Your KUBECONFIG context has changed.
The current KUBECONFIG context is unset.
To change context, use `kubectl config use-context <workload name>`
Logged out of all vSphere namespaces.

C:\bin>kubectl-vsphere.exe login \
--insecure-skip-tls-verify \
--vsphere-username administrator@vsphere.local \
--server=https://192.50.0.176 \
--tanzu-kubernetes-cluster-namespace cormac-ns \
--tanzu-kubernetes-cluster-name tkg-cluster-01

Password: ********
Logged in successfully.

You have access to the following contexts:
   192.50.0.176
   cormac-ns
   tkg-cluster-01

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`
2.6 Validate TKG cluster context
C:\bin>kubectl get nodes
NAME                                            STATUS   ROLES    AGE     VERSION
tkg-cluster-01-control-plane-sn85m              Ready    master   8m40s   v1.17.7+vmware.1
tkg-cluster-01-workers-csjd7-554668c497-vtf68   Ready    <none>   2m16s   v1.17.7+vmware.1
tkg-cluster-01-workers-csjd7-554668c497-z8vjt   Ready    <none>   2m16s   v1.17.7+vmware.1

C:\bin>kubectl get pods -A
NAMESPACE                      NAME                                                         READY   STATUS    RESTARTS   AGE
kube-system                    antrea-agent-7zphw                                           2/2     Running   0          2m23s
kube-system                    antrea-agent-mvczg                                           2/2     Running   0          2m23s
kube-system                    antrea-agent-t6qgc                                           2/2     Running   0          8m11s
kube-system                    antrea-controller-76c76c7b7c-4cwtm                           1/1     Running   0          8m11s
kube-system                    coredns-6c78df586f-6d77q                                     1/1     Running   0          7m55s
kube-system                    coredns-6c78df586f-c8rtj                                     1/1     Running   0          8m14s
kube-system                    etcd-tkg-cluster-01-control-plane-sn85m                      1/1     Running   0          7m24s
kube-system                    kube-apiserver-tkg-cluster-01-control-plane-sn85m            1/1     Running   0          7m12s
kube-system                    kube-controller-manager-tkg-cluster-01-control-plane-sn85m   1/1     Running   0          7m30s
kube-system                    kube-proxy-frpgn                                             1/1     Running   0          2m23s
kube-system                    kube-proxy-lchkq                                             1/1     Running   0          2m23s
kube-system                    kube-proxy-m2xjn                                             1/1     Running   0          8m14s
kube-system                    kube-scheduler-tkg-cluster-01-control-plane-sn85m            1/1     Running   0          7m27s
vmware-system-auth             guest-cluster-auth-svc-lf2nf                                 1/1     Running   0          8m2s
vmware-system-cloud-provider   guest-cluster-cloud-provider-7788f74548-cfng4                1/1     Running   0          8m13s
vmware-system-csi              vsphere-csi-controller-574cfd4569-kfdz8                      6/6     Running   0          8m12s
vmware-system-csi              vsphere-csi-node-hvzpg                                       3/3     Running   0          8m12s
vmware-system-csi              vsphere-csi-node-t4w8p                                       3/3     Running   0          2m23s
vmware-system-csi              vsphere-csi-node-t6rdw                                       3/3     Running   0          2m23s
Looks good. We have successfully deployed a TKG guest cluster in vSphere with Tanzu.
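As a final smoke test of the guest cluster, you could apply a trivial Deployment while logged into the tkg-cluster-01 context. This is a hypothetical example, not from the original walkthrough: the name nginx-test and the nginx image are illustrative. One caveat to be aware of: TKG guest clusters ship with a default pod security policy (the defaultpsp addon shown earlier), so depending on the user you are logged in as, workloads may need a RoleBinding to a suitable PSP before their pods will start.

```yaml
# Hypothetical smoke-test Deployment for the TKG guest cluster.
# Names and image are illustrative. Note that pod security policy is
# enforced by default in TKG clusters, so pods may stay Pending until
# an appropriate PSP RoleBinding exists.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:1.19
          ports:
            - containerPort: 80
```

A `kubectl apply -f` of a manifest like this, followed by `kubectl get pods`, should show the pod Running if the CNI and networking are healthy.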
Summary
Over the last number of blog posts we have seen the following:
- How vSphere with Tanzu is different to VCF with Tanzu, including network requirements. We also looked at the various prerequisites required to successfully deploy vSphere with Tanzu.
- We looked at how to deploy the HA-Proxy, and how it provides load balancer services for vSphere with Tanzu.
- We saw how to enable workload management, and stand up vSphere with Tanzu.
- And finally, in this post, we saw how to deploy a TKG ‘guest’ cluster in vSphere with Tanzu.
Hopefully this has given you enough information to go and review vSphere with Tanzu in your own vSphere environment. I am always interested in hearing your feedback – what works, what doesn’t work, etc. Feel free to leave a comment or reach out to me on social media.
Happy Tanzu’ing.
Very good tutorial. Kudoz to you Cormac.
It clears up the new Kubernetes with Tanzu feature for me. Can you continue with some information about how to deploy a container directly as a PodVM on the Supervisor cluster, and a little explanation of the difference between deploying workloads on conformant and non-conformant Kubernetes clusters?
Hi Adrie – the process is exactly the same for Kubernetes Pods and PodVMs. You simply need to ensure that the context is set correctly. For PodVMs, the context is a namespace; for Kubernetes Pods, the context is the namespace plus the Guest cluster. There are a number of examples already on this blog site.
Hey Cormac… I’m getting what looks like a HA Admission Control issue trying to deploy TKG.
The host does not have sufficient CPU resources to satisfy the reservation
I’ve tried a few combinations… the hosts are nested but have 2vCPU and 24GB of memory each. (This is a HA-Proxy deployment as well)
Further to that, I can’t stop the deployment attempt… it’s stuck provisioning. How do I stop/kill the creation?
Any tips on both?
Not sure what the admission control issue is Anthony – I’ve not tried a deployment with nested ESXi, only physical.
If things are not responding, a last resort you could use is to reset the service through vCenter Server. Use the following command:
root@vcsa-06 [ ~ ]# cd /usr/lib/vmware-wcp
root@vcsa-06 [ /usr/lib/vmware-wcp ]# vmon-cli -r wcp
Hi Cormac,
I was trying to connect to the Supervisor control plane with the command “kubectl-vsphere.exe login --insecure-skip-tls-verify --vsphere-username \
administrator@vsphere.local --server=https://192.50.0.176” and got the prompt for a password. Then I got a generic error message: time=”2020-10-21T10:48:03Z” level=error msg=”internal server error”
I don’t see this in the wcpsvc log. What log should I be looking at to analyze this error?
BR
Ingvar
Not sure Ingvar – is this definitely the IP address of the control plane API server Load Balancer (provided by the HA-Proxy)?
Has Workload Management been configured correctly? Have you been able to create a Namespace? Can you click on the Kubernetes Tools IP Address and reach the Tools download page?
Yes, I did follow your guide and successfully created a namespace. I can access the webpage and see the tools download just as in your example. I surely could have done something wrong 🙂 but I would like to know the correct log to look at to see what has gone wrong.
Were you able to fix this Ingvar? I have the same problem/generic error message. I can access the tools just fine.
Hi Cormac,
Your article helpful for setup my lab. But i have some question.
From the screenshot at the link below: after I deployed the guestbook application, the Deployments menu under the Compute tab on the namespace does not show anything. Is this correct?
https://ibb.co/hMjtv7M
Yes – that is expected. We only show PodVMs in the vSphere UI (those deployed in the Supervisor Cluster). Native Kubernetes Pods deployed in a TKG Guest Cluster do not appear in the UI. We only show guest cluster and node information.
However, if you select the vSphere Cluster Object in the vSphere inventory, then select the Monitor tab and scroll down to Container Volumes, you should be able to see any PVs created on behalf of the guestbook application.
Hi Cormac,
Thank you very much.
Hi Cormac, I would change the cluster.yaml example to use best-effort so that small labs can power on the VMs; I found out that using guaranteed was my problem in my small lab environment. Also, my first try failed because of Lifecycle Manager. I guess that will be changed, as many customers would be happy to use both Tanzu and Lifecycle Manager.
Thanks for the great post, it really helped me out to test Tanzu!!
Great feedback – thanks Raul. I will add that note about the “best-effort” to the post.
Hi Cormac, I deployed a TKG cluster with a similar cluster.yaml to the one in your article. It just deployed a control plane VM, and now I get the errors shown below, and the deployment runs without creating the workers.
Authsvc:
Last Error Message: unable to create or update authsvc: Get https://192.168.15.178:6443/api/v1/namespaces/vmware-system-auth?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name:
Status: error
Cloudprovider:
Name: vmware-guest-cluster
Status: applied
Version: 0.1-77-g5875817
Cni:
Last Error Message: failed to update owner reference for antrea secret in supervisor cluster: Secret “tkg-cluster-02-antrea” not found
Name:
Status: error
Csi:
Last Error Message: Post https://192.168.15.178:6443/apis/rbac.authorization.k8s.io/v1/namespaces/vmware-system-csi/roles?timeout=10s: context deadline exceeded
Name:
Status: error
Dns:
Last Error Message: unable to reconcile kubeadm ConfigMap’s CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: configmaps “kubeadm-config” not found
Name:
Status: error
Proxy:
Last Error Message: unable to retrieve kube-proxy daemonset from the guest cluster: daemonsets.apps “kube-proxy” not found
Name:
Status: error
BR
Ingvar
Hi Ingvar,
I *think* I may have had this issue when I deployed a 3 NIC HA-Proxy, but I had no route between the FrontEnd network and the Load Balancer network. If you do a ‘kubectl describe tanzukubernetescluster’, can you see all of the AddOns populated? Or are they errors against some of them as well?
Can you successfully route between the FrontEnd and LB networks?
I can ping all IP addresses. The LB network is on the same subnet as the Frontend, so I did not create any special routes there; maybe that is needed, as the LB does have a different mask.
The addons are giving me a lot of errors; only antrea and defaultpsp seem to be applied.
The errors above are from the addons part of the kubectl describe tanzukubernetescluster command.
I don’t have access to 3 different networks to test it properly. Should I set up a new HA-Proxy with 2 NICs to see if that would simplify the deployment?
I am just setting it up now to see the basic functions and to learn about Kubernetes on VMware.
Status:
Addons:
Authsvc:
Last Error Message: unable to create or update authsvc: Get https://192.168.15.178:6443/api/v1/namespaces/vmware-system-auth?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name: authsvc
Status: error
Version: 0.1-65-ge3d8be8
Cloudprovider:
Last Error Message: Put https://192.168.15.178:6443/apis/rbac.authorization.k8s.io/v1/clusterroles/cloud-provider-patch-cluster-role?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name: vmware-guest-cluster
Status: error
Version: 0.1-77-g5875817
Cni:
Name: antrea
Status: applied
Version: v0.7.2_vmware.1
Csi:
Last Error Message: Get https://192.168.15.178:6443/api/v1/namespaces/vmware-system-csi?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Name: pvcsi
Status: error
Version: v0.0.1.alpha+vmware.73-4a26ce0
Dns:
Last Error Message: unable to reconcile kubeadm ConfigMap’s CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: configmaps “kubeadm-config” not found
Name:
Status: error
Proxy:
Last Error Message: unable to retrieve kube-proxy daemonset from the guest cluster: daemonsets.apps “kube-proxy” not found
Name:
Status: error
Psp:
Name: defaultpsp
Status: applied
Version: v1.17.7+vmware.1-tkg.1.154236c
Cluster API Status:
API Endpoints:
Host: 192.168.15.178
Port: 6443
Phase: Provisioned
Node Status:
tkg-cluster-02-control-plane-ncbj7: pending
tkg-cluster-02-workers-ckjct-6f97fd7569-lbhwc: pending
tkg-cluster-02-workers-ckjct-6f97fd7569-mv2vb: pending
Phase: creating
Vm Status:
tkg-cluster-02-control-plane-ncbj7: ready
tkg-cluster-02-workers-ckjct-6f97fd7569-lbhwc: pending
tkg-cluster-02-workers-ckjct-6f97fd7569-mv2vb: pending
Events:
Same error. Tried reinstall maybe 10 times now, also tried frontend/workload on different routable network, same errors. I validated from within the control-plane VM (ssh using secret pass), and I can see ping and TCP working fine in any destination.
To simplify things, try a single frontend/workload network rather than separate ones. Just make sure that the CIDR matches an actual range, and that there is no overlap of IP address.
So I think this may be subnet masks, or CIDR settings not mapping to ranges correctly, Ingvar. I have seen this when there is no communication path between the LB assigned to the control plane and the actual IP addresses on the workload network. Use a CIDR calculator like https://www.ipaddressguide.com/cidr to make sure that you have provided proper ranges that start and stop at boundaries and do not overlap anything else, e.g. gateway, HA-Proxy interface, other range. You should be able to get it to work with just 2 ranges; just make sure the IP address ranges, CIDRs, and subnet masks are all good and you should be ok. Also make sure there is nothing like a DHCP server also provisioning IP addresses in those ranges. Another useful tool to have is an IP address scanner (I use the angry one – https://angryip.org/) to make sure that the Proxy correctly plumbs up the Load Balancer range as expected (these become pingable once the proxy is configured). Good luck!
I’ve just come across this issue… I’m almost 100% sure I’ve got the CIDRs right. (This is in my second deployment location… the first one worked ok.)
It deploys the control node but I’m seeing this in the describe tkg cluster
Dns:
Last Error Message: unable to reconcile kubeadm ConfigMap’s CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: configmaps “kubeadm-config” not found
Name:
Status: error
Proxy:
Last Error Message: unable to retrieve kube-proxy daemonset from the guest cluster: daemonsets.apps “kube-proxy” not found
Name:
Status: error
Also seems like it popped up here: https://www.reddit.com/r/vmware/comments/jp97lb/fail_to_install_a_guestcluster_on_70u1_vsphere/gbtonhz/?utm_source=reddit&utm_medium=web2x&context=3
Things to check Anthony. Some you mentioned already:
1. When doing a 3 x NIC deployment with separate frontend and workload networks, ensure there is a route between them. This caught me out, and the above error is what I observed. If it is a 2 NIC deployment where the frontend and backend are on the same segment, then it is something in the configuration that is causing it.
2. If you only have access to a partial subnet range, e.g. within a /16 or /8, for either the load balancer range or the workload range, make sure that you have the subnet mask set accordingly in all places. For example, if the segment you are using for both the workload and the frontend is made up of only the first 64 IP addresses of a range, e.g. 10.0.0.0/26, then specify the /26 subnet when defining the HA-Proxy workload IP address, and the subnet mask for the workload network (step 7 in the vSphere UI) should be 255.255.255.192. Everything needs to match.
3. When setting up the ranges, make sure that you check it against a CIDR calculator
4. Make sure there is no DHCP server also offering up IP address on the subnets / ranges that you choose.
HTH
Hi Cormac, again thank you for all your posts regarding tanzu. I have a (may stupid) question: How do I shutdown safely a tanzu cluster? In an (home) lab sometimes I have to shutdown everything… Thanks a lot! Paul
Hi Cormac,
I tried a lot to deploy guest cluster but failed. Always stuck in creating phase. Control plane is deployed but pending in worker node . Can you please help me out. My mail id is fahimistiaq91@gmail.com or whatsapp +8801847133058
I’m not in a position to do that. Please use GSS to open a support issue, or alternatively use the VMware Communities.
Hi
I have the same problem as @Fahim, stuck in creating phase. Control VM is deployed, Workers do not.
Control VM has IP, but it’s not pingable from temp VM in same dvs and subnet.
35m Warning ReconcileFailure wcpmachine/simple-cluster-control-plane-qlvmw-wnfmd vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-namespace-01/simple-cluster/simple-cluster-control-plane-qlvmw-wnfmd
34m Warning ReconcileFailure wcpmachine/simple-cluster-control-plane-qlvmw-wnfmd vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-namespace-01/simple-cluster/simple-cluster-control-plane-qlvmw-wnfmd
36m Normal CreateVMServiceSuccess virtualmachineservice/simple-cluster-control-plane-service CreateVMService success
81s Normal Reconcile gateway/simple-cluster-control-plane-service Success
36m Normal SuccessfulCreate machineset/simple-cluster-workers-9h754-75f8d97dd8 Created machine “simple-cluster-workers-9h754-75f8d97dd8-nxknp”
36m Normal SuccessfulCreate machineset/simple-cluster-workers-9h754-75f8d97dd8 Created machine “simple-cluster-workers-9h754-75f8d97dd8-wlcdc”
36m Normal SuccessfulCreate machineset/simple-cluster-workers-9h754-75f8d97dd8 Created machine “simple-cluster-workers-9h754-75f8d97dd8-qq7hw”
36m Normal SuccessfulCreate machinedeployment/simple-cluster-workers-9h754 Created MachineSet “simple-cluster-workers-9h754-75f8d97dd8”
36m Warning ReconcileFailure wcpcluster/simple-cluster unexpected error while reconciling control plane endpoint for simple-cluster: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-namespace-01/simple-cluster: failed to get control plane endpoint for Cluster tanzu-namespace-01/simple-cluster: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
Any ideas?
You will need to go back and recheck the networking. It sounds like something is amiss with the CIDR range/subnet masks that were used for the frontend network or the workload network. Ensure that from a temp VM on the workload network you are able to reach the HA-Proxy URL on port 5556, and that you are able to ping the IP addresses on the frontend/load balancer network, before deploying workload management.
Hi Cormac,
Thanks for this wonderfull blog post !!
I have deployed vSphere with Tanzu , utilising vSphere networking . Everything is up and running fine as expected. Tested with deployment of nginx application in Guest cluster . Need your inputs to understand from where do kubernetes fetched registry-images for deployment of application . And how can i connect this vsphere with tanzu cluster , to gitlab so, that i can import images to deploy applications . Unfortunately don’t have NSX-T in place , because of which can’t use embedded harbor registry repository feature .
Please help me with the query ……