
TKG v1.3 Active Directory Integration with Pinniped and Dex

Tanzu Kubernetes Grid (TKG) v1.3 introduces OIDC and LDAP identity management with Pinniped and Dex. Pinniped allows you to plug external OpenID Connect (OIDC) or LDAP identity providers (IDPs) into Tanzu Kubernetes clusters, which in turn allows you to control access to those clusters. Pinniped uses Dex as the endpoint to connect to your upstream LDAP identity provider, e.g. Microsoft Active Directory. If you are using OpenID Connect (OIDC), Dex is not required. It is also my understanding that Pinniped will eventually integrate directly with LDAP as well, removing the need for Dex, but for the moment both components are required. Since I am already using Microsoft Active Directory in my lab, I decided to give the integration a go and control user access to my Tanzu Kubernetes cluster(s) via Active Directory.

Note once again that this is the standalone or multi-cloud flavour of TKG, as opposed to the Tanzu Kubernetes Clusters provisioned in vSphere with Tanzu. More details about Identity Management in TKG can be found in the official docs here.

Requirements

Here are a few considerations before we begin.

Retrieving Root CA from Active Directory Certificate Services

As mentioned, I am using Microsoft Active Directory Certificate Services. To retrieve the CA cert, I simply point a browser to the certificate service and log in:

Next, click on the option to “Download a CA certificate”. This will open the following window. Select the Base 64 Encoding method, and then click on the “Download CA certificate” link. With the certificate safely saved, we can proceed to the TKG management cluster deployment.
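If you later edit the cluster configuration file by hand, note that the LDAP_ROOT_CA_DATA_B64 field expects the PEM content itself to be Base64-encoded again, on a single line. A minimal sketch of that round trip, using a stand-in PEM file rather than a real CA cert:

```shell
# Stand-in for the PEM file downloaded from AD Certificate Services
printf -- '-----BEGIN CERTIFICATE-----\nMIIBstandin\n-----END CERTIFICATE-----\n' > /tmp/ca.pem

# LDAP_ROOT_CA_DATA_B64 is the PEM file Base64-encoded onto a single line
LDAP_ROOT_CA_DATA_B64=$(base64 -w0 /tmp/ca.pem)
echo "$LDAP_ROOT_CA_DATA_B64"

# Sanity check: decoding should reproduce the original PEM byte-for-byte
printf '%s' "$LDAP_ROOT_CA_DATA_B64" | base64 -d | cmp - /tmp/ca.pem && echo "round trip OK"
```

Note that -w0 is GNU coreutils syntax; on macOS, base64 emits a single line by default.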

TKG Management Cluster deployment

There are numerous examples of how to deploy the management cluster, both in this blog and elsewhere, so I’m not going to describe the process in detail. Instead I will focus on the Identity Management section. This is the completed configuration from my management cluster deployment, when using the -u (--ui) option to create the configuration file. Note the presence of the ROOT CA, which I have blanked out in the screenshot. This is the ROOT CA downloaded from the AD Certificate Services in the previous step. As mentioned, the BIND, FILTER and other ATTRIBUTES might need to be modified for your specific needs.

The full management cluster configuration manifest looks similar to the following, once the UI configuration has been completed. The configurations are saved in ~/.tanzu/tkg/clusterconfigs. You can see again the populated LDAP fields, including the Base 64 ROOT CA.

AVI_CA_DATA_B64: ""
AVI_CLOUD_NAME: ""
AVI_CONTROLLER: ""
AVI_DATA_NETWORK: ""
AVI_DATA_NETWORK_CIDR: ""
AVI_ENABLE: "false"
AVI_LABELS: ""
AVI_PASSWORD: ""
AVI_SERVICE_ENGINE_GROUP: ""
AVI_USERNAME: ""
CLUSTER_CIDR: 100.96.13.0/11
CLUSTER_NAME: tkg-ldaps-mgmt
CLUSTER_PLAN: dev
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: ldap
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: cn=Administrator,cn=Users,dc=rainpole,dc=com
LDAP_BIND_PASSWORD: <encoded:VnhSYWlsITIz>
LDAP_GROUP_SEARCH_BASE_DN: dc=rainpole,dc=com
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: dc01.rainpole.com:636
LDAP_ROOT_CA_DATA_B64: LS0tLS1CRUdJ...
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=rainpole,dc=com
LDAP_USER_SEARCH_FILTER: (objectClass=person)
LDAP_USER_SEARCH_NAME_ATTRIBUTE: userPrincipalName
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
SERVICE_CIDR: 100.64.13.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.27.51.237
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /OCTO-Datacenter
VSPHERE_DATASTORE: /OCTO-Datacenter/datastore/vsan-OCTO-Cluster-B
VSPHERE_FOLDER: /OCTO-Datacenter/vm/TKG
VSPHERE_NETWORK: VM-51-DVS-B
VSPHERE_PASSWORD: <encoded:Vk13YXJlMTIzIQ==>
VSPHERE_RESOURCE_POOL: /OCTO-Datacenter/host/OCTO-Cluster-B/Resources
VSPHERE_SERVER: vcsa-06.rainpole.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAA... chogan@chogan-a01.vmware.com
VSPHERE_TLS_THUMBPRINT: FA:A5:8A:...
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"
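One thing worth knowing about this manifest: the &lt;encoded:...&gt; values are plain Base64 obfuscation, not encryption. If you ever need to craft or verify one of these fields by hand, the round trip is straightforward (a placeholder password is used here, not a real credential):

```shell
# Placeholder password - not a real credential
PW='MySecretBindPassword!'

# TKG stores sensitive fields in the form <encoded:BASE64-OF-VALUE>
ENC=$(printf '%s' "$PW" | base64)
echo "LDAP_BIND_PASSWORD: <encoded:${ENC}>"

# Decoding recovers the original value
printf '%s' "$ENC" | base64 -d
```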

The build of the management cluster looks something like this:

$ tanzu management-cluster create --file ./mgmt_cluster.yaml

Validating the pre-requisites...

vSphere 7.0 Environment Detected.

You have connected to a vSphere 7.0 environment which does not have vSphere with Tanzu enabled. vSphere with Tanzu includes
an integrated Tanzu Kubernetes Grid Service which turns a vSphere cluster into a platform for running Kubernetes workloads in dedicated
resource pools. Configuring Tanzu Kubernetes Grid Service is done through vSphere HTML5 client.

Tanzu Kubernetes Grid Service is the preferred way to consume Tanzu Kubernetes Grid in vSphere 7.0 environments. Alternatively you may
deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0.
Note: To skip the prompts and directly deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0, you can set the 'DEPLOY_TKG_ON_VSPHERE7' configuration variable to 'true'

Do you want to configure vSphere with Tanzu? [y/N]: N
Would you like to deploy a non-integrated Tanzu Kubernetes Grid management cluster on vSphere 7.0? [y/N]: y
Deploying TKG management cluster on vSphere 7.0 ...

Setting up management cluster...
Validating configuration...
Using infrastructure provider vsphere:v0.7.7
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /home/cormac/.kube-tkg/tmp/config_SR91Ri9a
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system"
Start creating management cluster...
Saving management cluster kubeconfig into /home/cormac/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system"
Waiting for the management cluster to get ready for move...
Waiting for addons installation...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Waiting for additional components to be up and running...
Context set for management cluster tkg-ldaps-mgmt as 'tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt'.

Management cluster created!


You can now create your first workload cluster by running the following:

 tanzu cluster create [name] -f [file]


Some addons might be getting installed! Check their status by running the following:

 kubectl get apps -A

At this point, you should definitely validate that the Pinniped add-on has reconciled successfully. It is worth waiting a minute or so to ensure that this is the case, as the Pinniped post deploy job only succeeds once the Pinniped concierge deployment is ready. First, log in to the correct TKG management cluster.

$ tanzu cluster list --include-management-cluster
 NAME           NAMESPACE   STATUS  CONTROLPLANE  WORKERS  KUBERNETES       ROLES       PLAN
 tkg-ldaps-mgmt tkg-system  running 1/1           1/1      v1.20.5+vmware.1 management  dev


$ tanzu login
? Select a server tkg-ldaps-mgmt ()
 successfully logged in to management cluster using the kubeconfig tkg-ldaps-mgmt

If you have other Kubernetes contexts, you may need to switch to the newly created management cluster context before you can query the add-on apps. Then ensure all the reconciles have succeeded.

$ kubectl config get-contexts
CURRENT NAME                                 CLUSTER         AUTHINFO             NAMESPACE
        kubernetes-admin@kubernetes          kubernetes      kubernetes-admin
        tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt  tkg-ldaps-mgmt  tkg-ldaps-mgmt-admin


$ kubectl config use-context tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt
Switched to context "tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt".


$ kubectl get nodes
NAME                                  STATUS   ROLES                 AGE  VERSION
tkg-ldaps-mgmt-control-plane-hr6nb    Ready    control-plane,master  11m  v1.20.5+vmware.1
tkg-ldaps-mgmt-md-0-54c99747c7-xhs6q  Ready    <none>                10m  v1.20.5+vmware.1


$ kubectl get apps -A

NAMESPACE   NAME                   DESCRIPTION           SINCE-DEPLOY AGE
tkg-system  antrea                 Reconcile succeeded    53s         7m6s
tkg-system  metrics-server         Reconcile succeeded    24s         7m6s
tkg-system  pinniped               Reconcile succeeded    28s         7m7s
tkg-system  tanzu-addons-manager   Reconcile succeeded   119s         11m
tkg-system  vsphere-cpi            Reconcile succeeded   109s         7m7s
tkg-system  vsphere-csi            Reconcile succeeded  5m33s         7m6s

Note that it is common to see some Pod failures on the management cluster for the pinniped-post-deploy-job. Once the Pinniped concierge deployment is online, an instance of this post deploy job should complete. If there are any Pinniped or Dex deployment failures, check the Pod logs as this might highlight an LDAP configuration issue.

TKG Workload Cluster deployment

We are now ready to create our first workload cluster. Here is the configuration file that I am using for this deployment.

#! -- See https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-tanzu-k8s-clusters-vsphere.html
##
##! ---------------------------------------------------------------------
##! Basic cluster creation configuration
##! ---------------------------------------------------------------------
##
CLUSTER_NAME: tkg-ldaps-wkld
CLUSTER_PLAN: prod
CNI: antrea
#
##! ---------------------------------------------------------------------
##! Node configuration
##! ---------------------------------------------------------------------
#
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 2
VSPHERE_CONTROL_PLANE_NUM_CPUS: 2
VSPHERE_CONTROL_PLANE_DISK_GIB: 40
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
VSPHERE_WORKER_NUM_CPUS: 2
VSPHERE_WORKER_DISK_GIB: 40
VSPHERE_WORKER_MEM_MIB: 4096
#
##! ---------------------------------------------------------------------
##! vSphere configuration
##! ---------------------------------------------------------------------
#
VSPHERE_DATACENTER: /OCTO-Datacenter
VSPHERE_DATASTORE: /OCTO-Datacenter/datastore/vsan-OCTO-Cluster-B
VSPHERE_FOLDER: /OCTO-Datacenter/vm/TKG
VSPHERE_NETWORK: VM-51-DVS-B
VSPHERE_PASSWORD: <encoded:Vk13YXJlMTIzIQ==>
VSPHERE_RESOURCE_POOL: /OCTO-Datacenter/host/OCTO-Cluster-B/Resources
VSPHERE_SERVER: vcsa-06.rainpole.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAA... chogan@chogan-a01.vmware.com
VSPHERE_TLS_THUMBPRINT: FA:A5:8A:...
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.27.51.238
#
#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------

ENABLE_DEFAULT_STORAGE_CLASS: true

CLUSTER_CIDR: 100.96.13.0/11
SERVICE_CIDR: 100.64.13.0/13

Again, there is lots of information about the contents of this file out there in the wild, so I am not going to spend any time explaining it. Hopefully, by reading the file contents, you will get a good idea of the TKG workload cluster that this configuration will create. Let’s deploy the workload cluster, then retrieve the KUBECONFIG file so that we can interact with it.

$ tanzu cluster create --file ./workload_cluster.yaml
Validating configuration...
Creating workload cluster 'tkg-ldaps-wkld'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Waiting for addons installation...

Workload cluster 'tkg-ldaps-wkld' created


$ tanzu cluster list --include-management-cluster
 NAME            NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES       PLAN
 tkg-ldaps-wkld  default     running  1/1           2/2      v1.20.5+vmware.1  <none>      prod
 tkg-ldaps-mgmt  tkg-system  running  1/1           1/1      v1.20.5+vmware.1  management  dev


$ tanzu cluster kubeconfig get tkg-ldaps-wkld
 You can now access the cluster by running 'kubectl config use-context tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld'


$ kubectl config get-contexts
CURRENT  NAME                                     CLUSTER         AUTHINFO                 NAMESPACE
         kubernetes-admin@kubernetes              kubernetes      kubernetes-admin
         tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld  tkg-ldaps-wkld  tanzu-cli-tkg-ldaps-wkld
*        tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt      tkg-ldaps-mgmt  tkg-ldaps-mgmt-admin


$ kubectl config use-context tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld
Switched to context "tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld".


$ kubectl config get-contexts
CURRENT  NAME                                     CLUSTER          AUTHINFO                 NAMESPACE
         kubernetes-admin@kubernetes              kubernetes       kubernetes-admin
*        tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld  tkg-ldaps-wkld   tanzu-cli-tkg-ldaps-wkld
         tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt      tkg-ldaps-mgmt   tkg-ldaps-mgmt-admin

Logging into AD Endpoint via Dex

If you are using a headless, non-graphical desktop, or you are SSH’ed into the desktop where you are running the kubectl commands, an attempt to query the nodes (or indeed any interaction with the cluster) at this point will produce a message similar to the following:

$ kubectl get nodes
Error: no DISPLAY environment variable specified
^C

What is happening here is that kubectl is invoking a tanzu login to perform a federated login to the Identity Provider via Pinniped and Dex. This output confused me at first, until I realized that it was trying to launch a browser tab to prompt for AD/LDAP credentials. These are the credentials of the developer who wishes to be granted access to the workload cluster(s). This is why I mentioned in the requirements that you need to do this deployment on a desktop that has a GUI (at least, I am not aware of any way to provide these credentials at the command line). So when a kubectl command is initiated, triggering a tanzu login, Dex hosts a browser page which prompts for AD/LDAP credentials. At this point, the LDAP username and password of the person who will be interacting with the cluster, e.g. a developer, are entered. Let’s suppose that this person is a developer with username chogan@rainpole.com. We would then add that username and password, and click on the login button.

Assuming the credentials are successful, and everything is working correctly, then we should see the following appear in the browser tab:

You can also trigger the developer login directly by running the following command, instead of having it launched by kubectl:

$ tanzu login --endpoint https://<MGMT Cluster API Server IP address>:6443 --name <MGMT Cluster Name>

You can get the API server IP address and the name of the management cluster from ~/.kube/config. This launches the browser tab as before, and the developer credentials are provided once again. This step creates a file called ~/.tanzu/pinniped/sessions.yaml for this developer/user, which holds all of the information retrieved from the Identity Management system, in this case Active Directory. So far, so good. However, we are not finished yet, because if the developer now tries to query the workload cluster, they face the following error:
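Rather than eyeballing ~/.kube/config for the endpoint, you can pull it out with a little awk. The snippet below runs against a cut-down stand-in kubeconfig so the structure is visible; in practice you would point it at your real ~/.kube/config:

```shell
# Cut-down stand-in for ~/.kube/config (the real file has much more in it)
cat > /tmp/kubeconfig-sample <<'EOF'
clusters:
- cluster:
    server: https://10.27.51.237:6443
  name: tkg-ldaps-mgmt
EOF

# Extract the API server endpoint, ready to feed to:
#   tanzu login --endpoint <server> --name tkg-ldaps-mgmt
awk '/server:/ {print $2}' /tmp/kubeconfig-sample
```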

$ kubectl get nodes
Error from server (Forbidden): nodes is forbidden: User "chogan@rainpole.com" cannot list resource "nodes" in API group "" at the cluster scope

OK, so now we have a bit of a chicken and egg situation. We need to create a ClusterRoleBinding for chogan@rainpole.com to give the developer access to the cluster, but the developer does not have permissions to interact with the cluster to create the role binding. This is a job for the cluster admin. The cluster admin gains admin permissions on the cluster by using the tanzu command with the --admin option to retrieve a new context with admin privileges.

$ tanzu cluster kubeconfig get tkg-ldaps-wkld --admin
Credentials of cluster 'tkg-ldaps-wkld' have been saved
You can now access the cluster by running 'kubectl config use-context tkg-ldaps-wkld-admin@tkg-ldaps-wkld'


$ kubectl config use-context tkg-ldaps-wkld-admin@tkg-ldaps-wkld
Switched to context "tkg-ldaps-wkld-admin@tkg-ldaps-wkld".


$ kubectl config get-contexts
CURRENT  NAME                                     CLUSTER         AUTHINFO                 NAMESPACE
         kubernetes-admin@kubernetes              kubernetes      kubernetes-admin
         tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld  tkg-ldaps-wkld  tanzu-cli-tkg-ldaps-wkld
         tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt      tkg-ldaps-mgmt  tkg-ldaps-mgmt-admin
*        tkg-ldaps-wkld-admin@tkg-ldaps-wkld      tkg-ldaps-wkld  tkg-ldaps-wkld-admin

We have now created a new context entry that has admin permissions on the workload cluster. The next step is to create a ClusterRoleBinding manifest for the user chogan@rainpole.com, and apply it to the cluster. In the example here, it is being created at a (kind:) User level. This could also be done at the (kind:) Group level if there are multiple developers or users that need access, and they are all part of the same AD group. We have also provided a ClusterRole of cluster-admin in this case, but there are many other roles that can be assigned.

$ cat chogan-crb.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: chogan
subjects:
  - kind: User
    name: chogan@rainpole.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io


$ kubectl apply -f chogan-crb.yaml
clusterrolebinding.rbac.authorization.k8s.io/chogan created
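For completeness, here is what the Group-level variant mentioned above might look like. The AD group name (tkg-developers) is a hypothetical placeholder; it should match the group name attribute (cn, per my LDAP_GROUP_SEARCH_NAME_ATTRIBUTE setting) returned by the LDAP group search:

```yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tkg-developers
subjects:
  - kind: Group
    name: tkg-developers    # hypothetical AD group cn
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```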

So let’s change context back to the non-admin context, delete the admin context and see if the user chogan@rainpole.com can now query the cluster with the ClusterRoleBinding in place.

$ kubectl config use-context tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld
Switched to context "tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld".


$ kubectl config get-contexts
CURRENT NAME                                     CLUSTER         AUTHINFO                 NAMESPACE
        kubernetes-admin@kubernetes              kubernetes      kubernetes-admin
*       tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld  tkg-ldaps-wkld  tanzu-cli-tkg-ldaps-wkld
        tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt      tkg-ldaps-mgmt  tkg-ldaps-mgmt-admin
        tkg-ldaps-wkld-admin@tkg-ldaps-wkld      tkg-ldaps-wkld  tkg-ldaps-wkld-admin


$ kubectl config delete-context tkg-ldaps-wkld-admin@tkg-ldaps-wkld
deleted context tkg-ldaps-wkld-admin@tkg-ldaps-wkld from /home/cormac/.kube/config


$ kubectl config get-contexts
CURRENT NAME                                     CLUSTER         AUTHINFO                 NAMESPACE
        kubernetes-admin@kubernetes              kubernetes      kubernetes-admin
*       tanzu-cli-tkg-ldaps-wkld@tkg-ldaps-wkld  tkg-ldaps-wkld  tanzu-cli-tkg-ldaps-wkld
        tkg-ldaps-mgmt-admin@tkg-ldaps-mgmt      tkg-ldaps-mgmt  tkg-ldaps-mgmt-admin


$ kubectl get nodes
NAME                                  STATUS ROLES                 AGE  VERSION
tkg-ldaps-wkld-control-plane-skhm2    Ready  control-plane,master  38m  v1.20.5+vmware.1
tkg-ldaps-wkld-md-0-5d44ddfb98-7tlsg  Ready  <none>                36m  v1.20.5+vmware.1
tkg-ldaps-wkld-md-0-5d44ddfb98-tjk2v  Ready  <none>                36m  v1.20.5+vmware.1

Success! User/developer chogan@rainpole.com is now able to interact with the TKG workload cluster after being authenticated via Pinniped and Dex to Active Directory/LDAP. Note that the tanzu login (via Dex) only needs to be done once per developer for all workload clusters. However, the ClusterRoleBinding would need to be created on each workload cluster, allowing the cluster admin to give the same developer different permissions on each cluster. If you examine ~/.kube/config, you will notice that for the workload cluster(s), a tanzu login with Pinniped authentication is included in the KUBECONFIG context logic when the workload cluster is accessed.
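For reference, the relevant user entry in ~/.kube/config looks something like the fragment below (abridged; the exact arguments will differ in your environment). This exec hook is what triggers the Pinniped/Dex login flow the first time the context is used:

```yaml
users:
- name: tanzu-cli-tkg-ldaps-wkld
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: tanzu
      args:
      - pinniped-auth
      - login
      # issuer, CA bundle and client-id arguments omitted here
```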

Troubleshooting Tips

A few things to keep in mind when configuring LDAP with TKG.

IP address or FQDN for LDAP HOST

If you haven’t added a Subject Alternative Name (SAN) for the IP address of the LDAP host to your certificate, make sure you use the FQDN. If you don’t, the connection attempt via Dex will fail as follows:
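You can check which SANs a certificate actually carries with openssl. The snippet below generates a throwaway self-signed cert purely to illustrate the output (the IP address shown is a placeholder); against a real domain controller you would pipe openssl s_client -connect dc01.rainpole.com:636 into the same x509 command:

```shell
# Throwaway self-signed cert, just to have something with SAN entries to inspect
# (-addext needs OpenSSL 1.1.1 or later)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/dc-key.pem -out /tmp/dc-cert.pem \
  -subj "/CN=dc01.rainpole.com" \
  -addext "subjectAltName=DNS:dc01.rainpole.com,IP:10.27.51.10"

# List the SAN entries; if your LDAP_HOST is an IP address, it must appear here
openssl x509 -in /tmp/dc-cert.pem -noout -ext subjectAltName
```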

Mangled LDAP Attributes

If you do not set the LDAP attributes such as OU, DC, CN correctly in the management cluster configuration, you may end up with a connection failure similar to this:

This is where the blog posts from Chris Little (NSX ALB) and Brian Ragazzi (LDAP settings) were a huge help.

Fat Fingered Credentials

This one was a little more obvious. If you don’t provide the correct LDAP_BIND_PASSWORD, you will see something like this when you try to authenticate.

I think a useful feature would be the ability to do a dry-run test of LDAP, or something similar, when we populate these fields in the UI but before we commit the configuration to the management cluster. I am feeding this back to the various teams responsible for this feature.

Dex Pod Failure

Of course, your deployment may not even get this far. It might be that you hit an issue with the Pinniped or Dex Pods failing. In this example, I didn’t populate all of the LDAP fields in the UI; I omitted LDAP_USER_SEARCH_USERNAME. Note that there is no validation check done to ensure all the fields are present and correct. Because of this, the Pinniped post deploy job Pod did not complete. I checked the Pod logs, which pointed me to a Dex issue. When I checked the Dex Pod’s logs, it told me that it was missing this required field.

$ kubectl logs pinniped-post-deploy-job-ckcgk -n pinniped-supervisor
2021-06-16T13:15:26.066Z INFO inspect/inspect.go:88 Getting TKG metadata...
2021-06-16T13:15:26.074Z INFO configure/configure.go:65 Readiness check for required resources
2021-06-16T13:15:26.086Z INFO configure/configure.go:102 The Pinniped concierge deployments are ready
2021-06-16T13:15:26.088Z INFO configure/configure.go:136 The Pinniped supervisor deployments are ready
2021-06-16T13:15:26.093Z INFO configure/configure.go:153 The Pinniped OIDCIdentityProvider pinniped-supervisor/upstream-oidc-identity-provider is ready
2021-06-16T13:15:58.428Z ERROR configure/configure.go:177 the Dex deployment is not ready, error: Dex deployment does not have enough ready replicas. 0/1 are ready
github.com/vmware-tanzu-private/core/addons/pinniped/post-deploy/pkg/configure.ensureResources
 /workspace/pkg/configure/configure.go:177
github.com/vmware-tanzu-private/core/addons/pinniped/post-deploy/pkg/configure.TKGAuthentication
 /workspace/pkg/configure/configure.go:197
main.main
 /workspace/main.go:103
runtime.main
 /usr/local/go/src/runtime/proc.go:203
2021-06-16T13:15:58.428Z ERROR workspace/main.go:111 Dex deployment does not have enough ready replicas. 0/1 are ready
main.main
 /workspace/main.go:111
runtime.main
 /usr/local/go/src/runtime/proc.go:203
$ kubectl logs dex-64884d69fc-mhxmj -n tanzu-system-auth
{"level":"info","msg":"config using log level: info","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config issuer: https://0.0.0.0:30167","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"kubernetes client apiVersion = dex.coreos.com/v1","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"creating custom Kubernetes resources","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource authcodes.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource authcodes.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource authrequests.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource authrequests.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource oauth2clients.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource oauth2clients.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource signingkeies.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource signingkeies.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource refreshtokens.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource refreshtokens.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource passwords.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource passwords.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource offlinesessionses.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource offlinesessionses.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource connectors.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The customresource connectors.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource devicerequests.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource devicerequests.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"checking if custom resource devicetokens.dex.coreos.com has been created already...","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"The custom resource devicetokens.dex.coreos.com already available, skipping create","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config storage: kubernetes","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config static client: pinniped","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config connector: ldap","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config response types accepted: [code]","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config skipping approval screen","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config signing keys expire after: 1h30m0s","time":"2021-06-16T13:15:46Z"}
{"level":"info","msg":"config id tokens valid for: 5m0s","time":"2021-06-16T13:15:46Z"}
failed to initialize server: server: Failed to open connector ldap: failed to open connector: \
failed to create connector ldap: ldap: missing required field "userSearch.username"

Those are just some tips and gotchas to be aware of. That completes the post. Hope you find it useful. If you have any further observations or suggestions, feel free to leave a comment.
