TKG v1.3 Active Directory Integration with Pinniped and Dex
Tanzu Kubernetes Grid v1.3 introduces OIDC and LDAP identity management with Pinniped and Dex. Pinniped allows you to plug external OpenID Connect (OIDC) or LDAP identity providers (IDPs) into Tanzu Kubernetes clusters, which in turn allows you to control access to those clusters. Pinniped uses Dex as the endpoint to connect to your upstream LDAP identity provider, e.g. Microsoft Active Directory. If you are using OpenID Connect (OIDC), Dex is not required. It is also my understanding that Pinniped will eventually integrate directly with LDAP as well, removing the need for Dex, but for the moment both components are required. Since I am already using Microsoft Active Directory in my lab, I decided to give the integration a go and control user access to my Tanzu Kubernetes cluster(s) via Active Directory.
Note once again that this is the standalone or multi-cloud flavour of TKG, as opposed to the Tanzu Kubernetes Clusters provisioned in vSphere with Tanzu. More details about Identity Management in TKG can be found in the official docs here.
Requirements
Here are a few considerations before we begin.
- If deploying TKG from a Linux desktop, you will need a graphical user interface capable of opening a browser. This is because a browser tab is opened so that AD/LDAP credentials can be provided in the Dex endpoint when an AD user first tries to interact with a workload cluster.
- You will need to be able to retrieve the Base 64 root certificate of authority (CA) from your identity provider. I will show how this can be done for a Microsoft Active Directory Certificate Service, but this step will vary for other providers.
- You will need to have a good understanding of LDAP directory attributes, such as OU, CN, DC, etc. The official TKG documentation does not go into details regarding LDAP configuration options, so I strongly recommend referencing these two excellent resources from my colleagues. Chris Little provides some very useful instructions in his blog on the NSX ALB, while Brian Ragazzi’s blog on LDAP settings was also invaluable. Another useful resource is Tom Schwaller’s TKG 1.3 blog.
- You will need to determine whether your LDAP service is also a global catalog server. Secure LDAP communicates over TCP port 636. If the host is also a global catalog server, then communication occurs over TCP port 3269.
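Before starting the deployment, it can be worth confirming that the deployment machine can actually reach the LDAPS port(s) on the domain controller and that a certificate is presented. A minimal check with openssl, assuming dc01.rainpole.com is the LDAP host used later in this post, might look like this:

$ # LDAPS on a regular domain controller
$ openssl s_client -connect dc01.rainpole.com:636 </dev/null
$ # LDAPS against the global catalog, if the host is also a global catalog server
$ openssl s_client -connect dc01.rainpole.com:3269 </dev/null

If the connection succeeds, the server certificate (and the issuing CA) is printed, which is also a handy way to double-check which root CA you need to supply to TKG.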
Retrieving Root CA from Active Directory Certificate Services
As mentioned, I am using Microsoft Active Directory Certificate Services. To retrieve the CA cert, I simply point a browser at the certificate service and log in:
Next, click on the option to “Download a CA certificate”. This will open the following window. Select the Base 64 Encoding method, and then click on the “Download CA certificate” link. With the certificate safely saved, we can proceed to the TKG management cluster deployment.
TKG Management Cluster deployment
There are numerous examples of how to deploy the management cluster, both in this blog and elsewhere, so I'm not going to describe the process in detail. Instead I will focus on the Identity Management section. This is the completed configuration from my management cluster deployment, when using the -u (--ui) option to create the configuration file. Note the presence of the ROOT CA, which I have blanked out in the screenshot. This is the ROOT CA downloaded from the AD Certificate Services in the previous step. As mentioned, the BIND, FILTER and other ATTRIBUTES might need to be modified for your specific needs.
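One note on that ROOT CA field: if you build or edit the management cluster configuration file by hand rather than through the UI, the LDAP_ROOT_CA_DATA_B64 value is simply the base64 encoding of the PEM file downloaded earlier. Assuming the certificate was saved as certnew.cer (adjust the filename to whatever you used), it can be generated on Linux with:

$ base64 -w 0 certnew.cer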
The full management cluster configuration manifest looks similar to the following, once the UI configuration has been completed. The configurations are saved in ~/.tanzu/tkg/clusterconfigs. You can see again the populated LDAP fields, including the Base 64 ROOT CA.
AVI_CA_DATA_B64: ""
AVI_CLOUD_NAME: ""
AVI_CONTROLLER: ""
AVI_DATA_NETWORK: ""
AVI_DATA_NETWORK_CIDR: ""
AVI_ENABLE: "false"
AVI_LABELS: ""
AVI_PASSWORD: ""
AVI_SERVICE_ENGINE_GROUP: ""
AVI_USERNAME: ""
CLUSTER_CIDR: 100.96.13.0/11
CLUSTER_NAME: tkg-ldaps-mgmt
CLUSTER_PLAN: dev
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: ldap
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: cn=Administrator,cn=Users,dc=rainpole,dc=com
LDAP_BIND_PASSWORD: <encoded:VnhSYWlsITIz>
LDAP_GROUP_SEARCH_BASE_DN: dc=rainpole,dc=com
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: dc01.rainpole.com:636
LDAP_ROOT_CA_DATA_B64: LS0tLS1CRUdJ...
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=rainpole,dc=com
LDAP_USER_SEARCH_FILTER: (objectClass=person)
LDAP_USER_SEARCH_NAME_ATTRIBUTE: userPrincipalName
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
SERVICE_CIDR: 100.64.13.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.27.51.237
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /OCTO-Datacenter
VSPHERE_DATASTORE: /OCTO-Datacenter/datastore/vsan-OCTO-Cluster-B
VSPHERE_FOLDER: /OCTO-Datacenter/vm/TKG
VSPHERE_NETWORK: VM-51-DVS-B
VSPHERE_PASSWORD: <encoded:Vk13YXJlMTIzIQ==>
VSPHERE_RESOURCE_POOL: /OCTO-Datacenter/host/OCTO-Cluster-B/Resources
VSPHERE_SERVER: vcsa-06.rainpole.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAA... chogan@chogan-a01.vmware.com
VSPHERE_TLS_THUMBPRINT: FA:A5:8A:...
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"
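Before committing to a full management cluster deployment, I find it useful to sanity-check the LDAP values above from the deployment machine. A quick test with ldapsearch (part of the OpenLDAP client tools, so this assumes they are installed), using the bind DN, user search base and filter from the configuration and the CA file downloaded earlier, would be something like:

$ LDAPTLS_CACERT=./certnew.cer ldapsearch -H ldaps://dc01.rainpole.com:636 \
    -D "cn=Administrator,cn=Users,dc=rainpole,dc=com" -W \
    -b "cn=Users,dc=rainpole,dc=com" \
    "(objectClass=person)" userPrincipalName

If this returns your user entries with their userPrincipalName attributes, the BIND, BASE_DN and FILTER settings are in good shape. If it fails, it is much quicker to iterate here than to redeploy the management cluster.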
The build of the management cluster looks something like this:
$ tanzu management-cluster create --file ./mgmt_cluster.yaml
Validating the pre-requisites...

vSphere 7.0 Environment Detected.

You have connected to a vSphere 7.0 environment which does not have vSphere with Tanzu enabled. vSphere with
Tanzu includes an integrated Tanzu Kubernetes Grid Service which turns a vSphere cluster into a platform for
running Kubernetes workloads in dedicated resource pools. Configuring Tanzu Kubernetes Grid Service is done
through vSphere HTML5 client.

Tanzu Kubernetes Grid Service is the preferred way to consume Tanzu Kubernetes Grid in vSphere 7.0 environments.
Alternatively you may deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0.
Note: To skip the prompts and directly deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0,
you can set the 'DEPLOY_TKG_ON_VSPHERE7' configuration variable to 'true'

Do you want to configure vSphere with Tanzu? [y/N]: N
Would you like to deploy a non-integrated Tanzu Kubernetes Grid management cluster on vSphere 7.0? [y/N]: y
Deploying TKG management cluster on vSphere 7.0 ...

Setting up management cluster...
Validating configuration...
Using infrastructure provider vsphere:v0.7.7
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /home/cormac/.kube-tkg/tmp/config_SR91Ri9a
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system"
Start creating management cluster...
Saving management cluster kubeconfig into /home/cormac/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system"
Waiting for the management cluster to get ready for move...
Waiting for addons installation...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Waiting for additional components to be up and running...
Context set for management cluster tkg-ldaps-mgmt as 'tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt'.

Management cluster created!

You can now create your first workload cluster by running the following:

  tanzu cluster create [name] -f [file]

Some addons might be getting installed! Check their status by running the following:

  kubectl get apps -A
At this point, you should definitely validate that the Pinniped add-on has reconciled successfully. It is worth waiting a minute or so to ensure that is the case, as the Pinniped post deploy job only succeeds once the Pinniped concierge deployment is ready. First, log in to the correct TKG management cluster.
$ tanzu cluster list --include-management-cluster
  NAME            NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES       PLAN
  tkg-ldaps-mgmt  tkg-system  running  1/1           1/1      v1.20.5+vmware.1  management  dev

$ tanzu login
? Select a server tkg-ldaps-mgmt  ()
✔  successfully logged in to management cluster using the kubeconfig tkg-ldaps-mgmt
If you have other Kubernetes contexts, you may need to switch to the newly created management cluster context before you can query the add-on apps. Then ensure all the reconciles have succeeded.
$ kubectl config get-contexts
CURRENT   NAME                                  CLUSTER          AUTHINFO               NAMESPACE
          kubernetes-admin@kubernetes           kubernetes       kubernetes-admin
          tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt   tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin

$ kubectl config use-context tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt
Switched to context "tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt".

$ kubectl get nodes
NAME                                   STATUS   ROLES                  AGE   VERSION
tkg-ldaps-mgmt-control-plane-hr6nb     Ready    control-plane,master   11m   v1.20.5+vmware.1
tkg-ldaps-mgmt-md-0-54c99747c7-xhs6q   Ready    <none>                 10m   v1.20.5+vmware.1

$ kubectl get apps -A
NAMESPACE    NAME                   DESCRIPTION           SINCE-DEPLOY   AGE
tkg-system   antrea                 Reconcile succeeded   53s            7m6s
tkg-system   metrics-server         Reconcile succeeded   24s            7m6s
tkg-system   pinniped               Reconcile succeeded   28s            7m7s
tkg-system   tanzu-addons-manager   Reconcile succeeded   119s           11m
tkg-system   vsphere-cpi            Reconcile succeeded   109s           7m7s
tkg-system   vsphere-csi            Reconcile succeeded   5m33s          7m6s
Note that it is common to see some Pod failures on the management cluster for the pinniped-post-deploy-job. Once the Pinniped concierge deployment is online, an instance of this post deploy job should complete. If there are any Pinniped or Dex deployment failures, check the Pod logs as this might highlight an LDAP configuration issue.
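For reference, these are the sort of checks I would run when the pinniped app does not reconcile cleanly (the namespaces shown here are the ones that appear in the log output later in this post; substitute the actual Pod names reported by the first command):

$ kubectl get pods -A | grep -E 'pinniped|dex'
$ kubectl logs <pinniped-post-deploy-job-pod> -n pinniped-supervisor
$ kubectl logs <dex-pod> -n tanzu-system-auth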
TKG Workload Cluster deployment
We are now ready to create our first workload cluster. Here is the configuration file that I am using for this deployment.
#! -- See https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-tanzu-k8s-clusters-vsphere.html
#
##! ---------------------------------------------------------------------
##! Basic cluster creation configuration
##! ---------------------------------------------------------------------
#
CLUSTER_NAME: tkg-ldaps-wkld
CLUSTER_PLAN: prod
CNI: antrea
#
##! ---------------------------------------------------------------------
##! Node configuration
##! ---------------------------------------------------------------------
#
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 2
VSPHERE_CONTROL_PLANE_NUM_CPUS: 2
VSPHERE_CONTROL_PLANE_DISK_GIB: 40
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
VSPHERE_WORKER_NUM_CPUS: 2
VSPHERE_WORKER_DISK_GIB: 40
VSPHERE_WORKER_MEM_MIB: 4096
#
##! ---------------------------------------------------------------------
##! vSphere configuration
##! ---------------------------------------------------------------------
#
VSPHERE_DATACENTER: /OCTO-Datacenter
VSPHERE_DATASTORE: /OCTO-Datacenter/datastore/vsan-OCTO-Cluster-B
VSPHERE_FOLDER: /OCTO-Datacenter/vm/TKG
VSPHERE_NETWORK: VM-51-DVS-B
VSPHERE_PASSWORD: <encoded:Vk13YXJlMTIzIQ==>
VSPHERE_RESOURCE_POOL: /OCTO-Datacenter/host/OCTO-Cluster-B/Resources
VSPHERE_SERVER: vcsa-06.rainpole.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAA... chogan@chogan-a01.vmware.com
VSPHERE_TLS_THUMBPRINT: FA:A5:8A:...
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.27.51.238
#
#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------
#
ENABLE_DEFAULT_STORAGE_CLASS: true
CLUSTER_CIDR: 100.96.13.0/11
SERVICE_CIDR: 100.64.13.0/13
Again, there is plenty of information about the contents of this file out there in the wild, so I am not going to spend any time explaining it. Hopefully, by reading the file contents, you will get a good idea of the TKG workload cluster that this configuration will create. Let's deploy the workload cluster, then retrieve the KUBECONFIG file so that we can interact with it.
$ tanzu cluster create --file ./workload_cluster.yaml
Validating configuration...
Creating workload cluster 'tkg-ldaps-wkld'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Waiting for addons installation...

Workload cluster 'tkg-ldaps-wkld' created

$ tanzu cluster list --include-management-cluster
  NAME            NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES       PLAN
  tkg-ldaps-wkld  default     running  1/1           2/2      v1.20.5+vmware.1  <none>      prod
  tkg-ldaps-mgmt  tkg-system  running  1/1           1/1      v1.20.5+vmware.1  management  dev

$ tanzu cluster kubeconfig get tkg-ldaps-wkld
ℹ  You can now access the cluster by running 'kubectl config use-context tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld'

$ kubectl config get-contexts
CURRENT   NAME                                      CLUSTER          AUTHINFO                   NAMESPACE
          kubernetes-admin@kubernetes               kubernetes       kubernetes-admin
          tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld   tkg-ldaps-wkld   tanzu-cli-tkg-ldaps-wkld
*         tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt       tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin

$ kubectl config use-context tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld
Switched to context "tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld".

$ kubectl config get-contexts
CURRENT   NAME                                      CLUSTER          AUTHINFO                   NAMESPACE
          kubernetes-admin@kubernetes               kubernetes       kubernetes-admin
*         tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld   tkg-ldaps-wkld   tanzu-cli-tkg-ldaps-wkld
          tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt       tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin
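As an aside, rather than having each developer run tanzu cluster kubeconfig get themselves, the cluster admin can export the non-admin (Pinniped-enabled) kubeconfig to a file and hand it out. If I recall correctly there is an --export-file option for this (check tanzu cluster kubeconfig get --help in your version):

$ tanzu cluster kubeconfig get tkg-ldaps-wkld --export-file /tmp/tkg-ldaps-wkld-kubeconfig

Anyone using that file will still be prompted to authenticate against Active Directory via Pinniped and Dex, as we are about to see.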
Logging into AD Endpoint via Dex
If you are using a headless, non-graphical desktop, or you are SSH’ed into the desktop where you are running the kubectl commands, an attempt to query the nodes (or indeed any interaction with the cluster) at this point will produce a message similar to the following:
$ kubectl get nodes
Error: no DISPLAY environment variable specified
^C
What is happening here is that kubectl is calling a tanzu login to perform a federated login to the identity provider via Pinniped and Dex. This output confused me at first, until I realized that it was trying to launch a browser tab to prompt for AD/LDAP credentials. These are the credentials of the developer who wishes to be granted access to the workload cluster(s). This is why I mentioned in the requirements that you need to do this deployment on a desktop that has a GUI (at least, I am not aware of any way to provide these credentials at the command line). So when a kubectl command is initiated, triggering a tanzu login, Dex hosts a browser page which prompts for AD/LDAP credentials. At this point, the LDAP username of the person who will be interacting with the cluster, e.g. a developer, is added. Let's suppose that this person is a developer with username chogan@rainpole.com. We would then add that username and password, and click on the login button.
Assuming the credentials are successful, and everything is working correctly, then we should see the following appear in the browser tab:
You can also provide developer access directly by running the following command, instead of launching it via a kubectl:
$ tanzu login --endpoint https://<MGMT Cluster API Server IP address>:6443 --name <MGMT Cluster Name>
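In my environment, using the management cluster control plane endpoint (10.27.51.237) from the configuration shown earlier, that would look something like this (the endpoint and cluster name can also be read from ~/.kube/config, as noted next):

$ grep 'server:' ~/.kube/config
$ tanzu login --endpoint https://10.27.51.237:6443 --name tkg-ldaps-mgmt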
You can get the API server IP address and the name of the management cluster from ~/.kube/config. This launches the browser tab as before, and the developer credentials are provided once again. This step creates a file called ~/.tanzu/pinniped/sessions.yaml for this developer/user, containing all of the information retrieved from the Identity Management system, in this case Active Directory. So far, so good. However, we are not finished yet, because if the developer now tries to query the workload cluster, they face the following error:
$ kubectl get nodes
Error from server (Forbidden): nodes is forbidden: User "chogan@rainpole.com" cannot list resource "nodes" in API group "" at the cluster scope
OK – so now we have a bit of a chicken and egg situation. We need to create a ClusterRoleBinding for chogan@rainpole.com to give the developer access to the cluster, but the developer does not have permissions to interact with the cluster to create the role binding. This is a job for the cluster admin. The cluster admin gains admin permissions on the cluster by using the tanzu command with the --admin option to retrieve a new context with admin privileges.
$ tanzu cluster kubeconfig get tkg-ldaps-wkld --admin
Credentials of cluster 'tkg-ldaps-wkld' have been saved
You can now access the cluster by running 'kubectl config use-context tkg-ldaps-wkld-admin@tkg-ldaps-wkld'

$ kubectl config use-context tkg-ldaps-wkld-admin@tkg-ldaps-wkld
Switched to context "tkg-ldaps-wkld-admin@tkg-ldaps-wkld".

$ kubectl config get-contexts
CURRENT   NAME                                      CLUSTER          AUTHINFO                   NAMESPACE
          kubernetes-admin@kubernetes               kubernetes       kubernetes-admin
          tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld   tkg-ldaps-wkld   tanzu-cli-tkg-ldaps-wkld
          tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt       tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin
*         tkg-ldaps-wkld-admin@tkg-ldaps-wkld       tkg-ldaps-wkld   tkg-ldaps-wkld-admin
We have now created a new context entry that has admin permissions on the workload cluster. The next step is to create a ClusterRoleBinding manifest for the user chogan@rainpole.com, and apply it to the cluster. In the example here, it is being created at a (kind:) user level. This could also be done at the (kind:) group level if there are multiple developers or users that need access, and they are all part of the same AD group (a sketch of that variant follows the example below). We have also provided a ClusterRole of cluster-admin in this case, but there are many other roles that can be assigned.
$ cat chogan-crb.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: chogan
subjects:
- kind: User
  name: chogan@rainpole.com
  apiGroup:
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

$ kubectl apply -f chogan-crb.yaml
clusterrolebinding.rbac.authorization.k8s.io/chogan created
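For completeness, this is roughly what the group-level variant mentioned above could look like. The group name used here, tkg-developers, is purely hypothetical; whatever you use must match the group name that Dex returns, which with the LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn setting from earlier is the cn of the AD group:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tkg-developers-crb
subjects:
- kind: Group
  name: tkg-developers              # hypothetical AD group cn
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin               # or a more restrictive ClusterRole such as view or edit
  apiGroup: rbac.authorization.k8s.io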
So let’s change context back to the non-admin context, delete the admin context and see if the user chogan@rainpole.com can now query the cluster with the ClusterRoleBinding in place.
$ kubectl config use-context tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld
Switched to context "tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld".

$ kubectl config get-contexts
CURRENT   NAME                                      CLUSTER          AUTHINFO                   NAMESPACE
          kubernetes-admin@kubernetes               kubernetes       kubernetes-admin
*         tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld   tkg-ldaps-wkld   tanzu-cli-tkg-ldaps-wkld
          tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt       tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin
          tkg-ldaps-wkld-admin@tkg-ldaps-wkld       tkg-ldaps-wkld   tkg-ldaps-wkld-admin

$ kubectl config delete-context tkg-ldaps-wkld-admin@tkg-ldaps-wkld
deleted context tkg-ldaps-wkld-admin@tkg-ldaps-wkld from /home/cormac/.kube/config

$ kubectl config get-contexts
CURRENT   NAME                                      CLUSTER          AUTHINFO                   NAMESPACE
          kubernetes-admin@kubernetes               kubernetes       kubernetes-admin
*         tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld   tkg-ldaps-wkld   tanzu-cli-tkg-ldaps-wkld
          tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt       tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin

$ kubectl get nodes
NAME                                   STATUS   ROLES                  AGE   VERSION
tkg-ldaps-wkld-control-plane-skhm2     Ready    control-plane,master   38m   v1.20.5+vmware.1
tkg-ldaps-wkld-md-0-5d44ddfb98-7tlsg   Ready    <none>                 36m   v1.20.5+vmware.1
tkg-ldaps-wkld-md-0-5d44ddfb98-tjk2v   Ready    <none>                 36m   v1.20.5+vmware.1
Success! User/developer chogan@rainpole.com is now able to successfully interact with the TKG workload cluster after being authenticated via Pinniped and Dex to Active Directory/LDAP. Note that the tanzu login (via Dex) only needs to be done once per developer for all workload clusters. However, the ClusterRoleBinding needs to be created on each workload cluster, which allows the cluster admin to give the same developer different permissions on each cluster. If you examine ~/.kube/config, you will notice that for the workload cluster(s), a tanzu login with Pinniped authentication is included in the KUBECONFIG context logic when the workload cluster is accessed.
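Incidentally, if you want to see this for yourself, dump the kubeconfig and look at the users: entry associated with the workload cluster context. Instead of a static token or client certificate, it contains an exec: stanza that calls back into the tanzu CLI (the Pinniped auth plugin) to obtain credentials on demand, which is what triggers the browser-based login. Something like the following will show the relevant entry (adjust the grep window as needed):

$ kubectl config view --raw -o yaml | grep -B 2 -A 12 'name: tanzu-cli-tkg-ldaps-wkld'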
Troubleshooting Tips
A few things to keep in mind when configuring LDAP with TKG.
IP address or FQDN for LDAP HOST
If you haven’t added a Subject Alternative Name (SAN) for the IP address of the LDAP host to your certificate, make sure you use the FQDN in LDAP_HOST. If you don’t, when you try to connect via Dex, it will fail as follows:
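Whatever the exact error looks like in your environment, a quick way to check which SANs are present on the certificate that the LDAP host presents is to use openssl from the deployment machine:

$ openssl s_client -connect dc01.rainpole.com:636 </dev/null 2>/dev/null \
    | openssl x509 -noout -text | grep -A 1 'Subject Alternative Name'

If the IP address you configured in LDAP_HOST is not listed there, switch LDAP_HOST to the FQDN (or re-issue the certificate with the IP address as a SAN).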
Mangled LDAP Attributes
If you do not set the LDAP attributes such as OU, DC, CN correctly in the management cluster configuration, you may end up with a connection failure similar to this:
This is where the blog posts from Chris Little (NSX ALB) and Brian Ragazzi (LDAP settings) were a huge help.
Fat Fingered Credentials
This one was a little more obvious. If you don’t provide the correct LDAP_BIND_PASSWORD, you will see something like this when you try to authenticate.
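To rule this out before (re)deploying, the bind credentials can be verified directly with ldapwhoami (also from the OpenLDAP client tools), again using the CA file downloaded earlier:

$ LDAPTLS_CACERT=./certnew.cer ldapwhoami -H ldaps://dc01.rainpole.com:636 \
    -D "cn=Administrator,cn=Users,dc=rainpole,dc=com" -W

A successful bind returns the identity of the bound account, while an invalid credentials error (LDAP result code 49) points straight at LDAP_BIND_DN / LDAP_BIND_PASSWORD.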
I think a useful feature would be the ability to do a dry-run test of LDAP, or something similar, when we populate these fields in the UI but before we commit the configuration to the management cluster. I am feeding this back to the various teams responsible for this feature.
Dex Pod Failure
Of course, your deployment may not even get this far. It might be that you hit an issue with the Pinniped or Dex Pods failing. In this example, I didn’t populate all of the LDAP fields in the UI; I omitted LDAP_USER_SEARCH_USERNAME. Note that no validation check is done to ensure all the fields are present and correct. Because of this, the Pinniped post deploy job Pod did not complete. I checked the Pod logs, which pointed me to a Dex issue. When I checked the Dex Pod's logs, it told me that it was missing this required field.
$ kubectl logs pinniped-post-deploy-job-ckcgk -n pinniped-supervisor
2021-06-16T13:15:26.066Z  INFO   inspect/inspect.go:88      Getting TKG metadata...
2021-06-16T13:15:26.074Z  INFO   configure/configure.go:65  Readiness check for required resources
2021-06-16T13:15:26.086Z  INFO   configure/configure.go:102 The Pinniped concierge deployments are ready
2021-06-16T13:15:26.088Z  INFO   configure/configure.go:136 The Pinniped supervisor deployments are ready
2021-06-16T13:15:26.093Z  INFO   configure/configure.go:153 The Pinniped OIDCIdentityProvider pinniped-supervisor/upstream-oidc-identity-provider is ready
2021-06-16T13:15:58.428Z  ERROR  configure/configure.go:177 the Dex deployment is not ready, error: Dex deployment does not have enough ready replicas. 0/1 are ready
github.com/vmware-tanzu-private/core/addons/pinniped/post-deploy/pkg/configure.ensureResources
        /workspace/pkg/configure/configure.go:177
github.com/vmware-tanzu-private/core/addons/pinniped/post-deploy/pkg/configure.TKGAuthentication
        /workspace/pkg/configure/configure.go:197
main.main
        /workspace/main.go:103
runtime.main
        /usr/local/go/src/runtime/proc.go:203
2021-06-16T13:15:58.428Z  ERROR  workspace/main.go:111 Dex deployment does not have enough ready replicas. 0/1 are ready
main.main
        /workspace/main.go:111
runtime.main
        /usr/local/go/src/runtime/proc.go:203
$ kubectl logs dex-64884d69fc-mhxmj -n tanzu-system-auth
{"level":"info","msg":"config using log level: info","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config issuer: https://0.0.0.0:30167","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"kubernetes client apiVersion = dex.coreos.com/v1","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"creating custom Kubernetes resources","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource authcodes.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource authcodes.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource authrequests.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource authrequests.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource oauth2clients.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource oauth2clients.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource signingkeies.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource signingkeies.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource refreshtokens.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource refreshtokens.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource passwords.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource passwords.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource offlinesessionses.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource offlinesessionses.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource connectors.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource connectors.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource devicerequests.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource devicerequests.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource devicetokens.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource devicetokens.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config storage: kubernetes","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config static client: pinniped","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config connector: ldap","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config response types accepted: [code]","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config skipping approval screen","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config signing keys expire after: 1h30m0s","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config id tokens valid for: 5m0s","time":"2021-06-16T13:15:46Z"}
failed to initialize server: server: Failed to open connector ldap: failed to open connector: \
failed to create connector ldap: ldap: missing required field "userSearch.username"
Those are just some tips and gotchas to be aware of. That completes the post. Hope you find it useful. If you have any further observations or suggestions, feel free to leave a comment.
Great Content! I am able to deploy the cluster but without ldaps. If I use my internal CA root certificate I receive the following Error:
“Network Error”: x509: certificate signed by unknown authority (possibly because of “crypto/rsa: verification error” while trying to verify candidate authority certificate “LDAPCA”)
How Can I resolve this issue?
Thank you,
Alessandro
Is there any possibility that there might be an old KUBECONFIG hanging around from your previous tests, and that you are simply using an out-of-date context when trying to reach the cluster? I don’t know what else might be causing it tbh.
Hi Cormac, I don’t think so. If I use the domain controller certificate during cluster deployment, instead of the root CA, it works. I tried to set up a cluster with another Active Directory and I receive the same error. I am using TKG 1.3.1.
Any suggestion?
Thank you,
Alessandro
Hi Cormac, thank you for the article. In case we face any of the mentioned issues, how can we troubleshoot them without making other changes to the cluster? Can we update the correct details and recreate the Dex pod?
I’d probably look to the docs and GitHub issues for guidance Gowtham – https://github.com/dexidp/dex#readme, https://dexidp.io/docs/kubernetes & https://github.com/dexidp/dex/issues