TKG v1.4 LDAP (Active Directory) integration with Pinniped and Dex

LDAP integration with Pinniped and Dex is a topic that I have written about before, particularly with TKG v1.3. However, I recently had reason to deploy TKG v1.4 and noticed some nice new enhancements around LDAP integration that I thought worthwhile highlighting. One is the fact that you no longer need a web browser available in the environment where you are configuring LDAP credentials, which was a requirement in the previous version.

In this post, I will deploy a TKG v1.4 management cluster on vSphere. This environment uses the NSX ALB to provide IP addresses for both the TKG cluster control plane and Load Balancer services for applications. After deploying the TKG management cluster, the Pinniped and Dex services are converted from NodePort to Load Balancer. Then we will authenticate an LDAP user and assign a ClusterRoleBinding to that user so that they can work with the non-admin context of the management cluster. I will not cover how to deploy TKG with LDAP integration, as this is covered in the previous post and the steps are identical.
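
For reference, the LDAP-related settings in the management cluster configuration file look broadly like the sketch below. The host name, DNs and attributes here are illustrative placeholders for an Active Directory domain, not values taken from this environment; the previous post walks through what each setting means.

# LDAP-related entries only; values are placeholders
IDENTITY_MANAGEMENT_TYPE: ldap
LDAP_HOST: dc01.rainpole.com:636
LDAP_ROOT_CA_DATA_B64: <base64-encoded CA certificate of the LDAPS endpoint>
LDAP_BIND_DN: CN=Administrator,CN=Users,DC=rainpole,DC=com
LDAP_BIND_PASSWORD: <bind password>
LDAP_USER_SEARCH_BASE_DN: CN=Users,DC=rainpole,DC=com
LDAP_USER_SEARCH_FILTER: (objectClass=person)
LDAP_USER_SEARCH_USERNAME: userPrincipalName
LDAP_USER_SEARCH_NAME_ATTRIBUTE: userPrincipalName
LDAP_GROUP_SEARCH_BASE_DN: DC=rainpole,DC=com
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn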

Let’s begin by looking at the Pinniped and Dex services after initial management cluster deployment. As mentioned, both are configured to use NodePort services. We will need to change this to Load Balancer.

$ kubectl get all -n pinniped-supervisor
NAME                                      READY   STATUS      RESTARTS   AGE
pod/pinniped-post-deploy-job-789cj        0/1     Error       0          14h
pod/pinniped-post-deploy-job-c7jfc        0/1     Completed   0          14h
pod/pinniped-supervisor-ff8467c76-qt9kz   1/1     Running     0          14h
pod/pinniped-supervisor-ff8467c76-vjh44   1/1     Running     0          14h

NAME                          TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
service/pinniped-supervisor   NodePort   100.65.148.115   <none>        443:31234/TCP   14h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pinniped-supervisor   2/2     2            2           14h

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/pinniped-supervisor-ff8467c76   2         2         2       14h

NAME                                 COMPLETIONS   DURATION   AGE
job.batch/pinniped-post-deploy-job   1/1           3m53s      14h

 
$ kubectl get all -n tanzu-system-auth
NAME                       READY   STATUS    RESTARTS   AGE
pod/dex-657fdcb9f9-bhtxf   1/1     Running   0          14h

NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/dexsvc   NodePort   100.67.123.98   <none>        5556:30167/TCP   14h

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dex   1/1     1            1           14h
 
NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/dex-657fdcb9f9   1         1         1       14h

Configure the Pinniped and Dex services to use Load Balancer services as per the documentation found here. Both Dex and Pinniped should then get external IP addresses provided by the NSX ALB.

$ cat pinniped-supervisor-svc-overlay.yaml
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "pinniped-supervisor", "namespace": "pinniped-supervisor"}})
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: pinniped-supervisor
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": Service", "metadata": {"name": "dexsvc", "namespace": "tanzu-system-auth"}}), missing_ok=True
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: dex
  ports:
    - name: dex
      protocol: TCP
      port: 443
      targetPort: https
 

$ cat pinniped-supervisor-svc-overlay.yaml | base64 -w 0
I0AgbG9hZCgiQHl0dDpvdmVybG.....wcwo=


$ kubectl patch secret ldap-cert-pinniped-addon -n tkg-system -p '{"data": {"overlays.yaml": "I0AgbG9hZCgiQHl0dDpvdmVyb...wcwo="}}'
secret/ldap-cert-pinniped-addon patched
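
As an aside, the two steps above (base64-encoding the overlay and patching the secret) can be combined into a single command if you prefer. This is just a convenience sketch using the same overlay file, secret name and namespace:

$ kubectl patch secret ldap-cert-pinniped-addon -n tkg-system \
  -p '{"data": {"overlays.yaml": "'"$(base64 -w 0 < pinniped-supervisor-svc-overlay.yaml)"'"}}'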
 

$ kubectl get all -n pinniped-supervisor
NAME                                       READY   STATUS      RESTARTS   AGE
pod/pinniped-post-deploy-job-2x7hl         0/1     Completed   0          8m31s
pod/pinniped-post-deploy-job-r8hrh         0/1     Error       0          10m
pod/pinniped-post-deploy-job-wr7np         0/1     Error       0          9m19s
pod/pinniped-supervisor-5dcbd8d56f-g6qkd   1/1     Running     0          8m9s
pod/pinniped-supervisor-5dcbd8d56f-zhdfp   1/1     Running     0          8m9s

NAME                          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
service/pinniped-supervisor   LoadBalancer   100.71.206.210   xx.xx.xx.18   443:30044/TCP   10m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pinniped-supervisor   2/2     2            2           10m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/pinniped-supervisor-5dcbd8d56f   2         2         2       10m

NAME                                 COMPLETIONS   DURATION   AGE
job.batch/pinniped-post-deploy-job   1/1           2m34s      10m

 
$ kubectl get all -n tanzu-system-auth
NAME                       READY   STATUS    RESTARTS   AGE
pod/dex-688567f8c4-kf5mx   1/1     Running   0          8m29s

NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
service/dexsvc   LoadBalancer   100.70.240.108   xx.xx.xx.19   443:31265/TCP   11m

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dex   1/1     1            1           11m

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/dex-688567f8c4   1         1         1       11m

The services are now of type Load Balancer and have external IP addresses (which I have intentionally obfuscated). The next step is to delete the Pinniped post-deploy job; it will be recreated automatically. This can take some time (3-4 minutes), so I usually run a watch (-w) to keep an eye on it.

$ kubectl get jobs -n pinniped-supervisor
NAME                       COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job   1/1           3m53s      14h
 

$ kubectl delete jobs pinniped-post-deploy-job -n pinniped-supervisor
job.batch "pinniped-post-deploy-job" deleted


$ kubectl get jobs -n pinniped-supervisor -w
NAME                       COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job   0/1                      0s
pinniped-post-deploy-job   0/1                      0s
pinniped-post-deploy-job   0/1           0s         0s
pinniped-post-deploy-job   1/1           11s        11s
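
As a quick sanity check, you can also look at the pinniped-info ConfigMap in the kube-public namespace. I am assuming the default TKG layout here, so verify in your own environment, but once the post-deploy job has re-run, the issuer field should reference the external IP of the pinniped-supervisor service (xx.xx.xx.18 above) rather than the old NodePort URL:

$ kubectl get configmap pinniped-info -n kube-public -o jsonpath='{.data.issuer}'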

You can now switch to the non-admin context of the cluster and try to access it. If you are on a desktop and run a query, a browser will be launched where you can provide the LDAP credentials for the user that you wish to authenticate and give TKG cluster access to. If, however, like me, you are SSH’ed to a remote host and are running the kubectl get commands remotely in that session, you will see the following.

$ tanzu management-cluster kubeconfig get
You can now access the cluster by running 'kubectl config use-context tanzu-cli-ldap-cert@ldap-cert'
 
 
$ kubectl config use-context tanzu-cli-ldap-cert@ldap-cert
Switched to context "tanzu-cli-ldap-cert@ldap-cert".
 
 
$ kubectl get nodes
Error: no DISPLAY environment variable specified
^C

This is working as expected. In the past, you would have required a desktop with a browser to do the authentication. TKG v1.4 has a new feature which allows you to authenticate LDAP users in environments that do not have browsers (e.g. headless workstations) or when you are SSH’ed to a remote environment like I am above. This process is covered in the official documentation here. The first step is to set the environment variable TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER. You will then need to remove the earlier non-admin context and recreate it with the environment variable in place.

$ export TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER=true


$ kubectl config delete-context tanzu-cli-ldap-cert@ldap-cert
warning: this removed your active context, use "kubectl config use-context" to select a different one
deleted context tanzu-cli-ldap-cert@ldap-cert from /home/cormac/.kube/config


$ tanzu management-cluster kubeconfig get
You can now access the cluster by running 'kubectl config use-context tanzu-cli-ldap-cert@ldap-cert'


$ kubectl config use-context tanzu-cli-ldap-cert@ldap-cert
Switched to context "tanzu-cli-ldap-cert@ldap-cert".


$ kubectl get nodes
Please log in: https://xx.xx.xx.18/oauth2/authorize?access_type=offline&client_id=pinniped-cli&\
code_challenge=8VbRBUHKhSK69gP4mq3G1dnd897tu0ShpsDvkuqi1Q0&code_challenge_method=S256&nonce=\
263df470a232c24a929712565954a2d4&redirect_uri=http%3A%2F%2F127.0.0.1%3A39819%2Fcallback&\
response_type=code&scope=offline_access+openid+pinniped%3Arequest-audience&state=c2c0517fab9affa8741d78d32acdd330

Note the “Please log in” message. You can now copy this URL and paste it into a browser on any host that can still reach the Pinniped Supervisor service IP address. Notice that the redirect is a callback to 127.0.0.1 (localhost). This means that a failure is expected, as the host with the browser won’t respond to it, but that is ok. When you paste the link into a browser, you should see the following authentication prompt where you can add the LDAP user which you want to have access to the non-admin context of your cluster:

The browser will redirect to the IP address of the Dex service, but when it tries a callback to localhost, you get this error:

Again, this is expected. The next step is to take the callback URL that was provided to you in the browser (the one which was unable to complete the 127.0.0.1 callback), and run the following command on the host where you ran the initial kubectl get command. Note that the localhost port in the callback changes with each login attempt, so use the URL from your own session:

$ curl -L 'http://127.0.0.1:35259/callback?code=4bExu5_NESSuaY7kSCCIGcn8arVwJiVUC19UTZL-\
Jck.61FRn2BeHo7C8fYGbA0aranDsAFT3v0bRTTwq-TizA8&scope=openid+offline_access+pinniped%3A\
request-audience&state=edd3196e00a3eb01403d2d4e2d918fa3'
you have been logged in and may now close this tab

Commands now work as the LDAP user, provided the cluster role binding created as per the official documentation here matches the credentials of the LDAP user. If the cluster role binding does not exist, you will encounter the following error when trying to query the non-admin cluster as a user who does not have any privileges:

$ kubectl get nodes
Error from server (Forbidden): nodes is forbidden: User "chogan@rainpole.com" cannot list resource \
"nodes" in API group "" at the cluster scope

To create it, switch back to the admin context, create the ClusterRoleBinding, switch again to the non-admin context and see if this LDAP user (who you have already authenticated via Dex/Pinniped) can now successfully interact with the cluster.

$ kubectl config use-context ldap-cert-admin@ldap-cert
Switched to context "ldap-cert-admin@ldap-cert".


$ kubectl config get-contexts
CURRENT   NAME                            CLUSTER      AUTHINFO                                   NAMESPACE
*         ldap-cert-admin@ldap-cert       ldap-cert    ldap-cert-admin
          tanzu-cli-ldap-cert@ldap-cert   ldap-cert    tanzu-cli-ldap-cert


$ cat chogan-crb.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: chogan
subjects:
  - kind: User
    name: chogan@rainpole.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
 

$ kubectl apply -f chogan-crb.yaml
clusterrolebinding.rbac.authorization.k8s.io/chogan created


$ kubectl get clusterrolebinding chogan
NAME     ROLE                        AGE
chogan   ClusterRole/cluster-admin   6m


$ kubectl config use-context tanzu-cli-ldap-cert@ldap-cert
Switched to context "tanzu-cli-ldap-cert@ldap-cert".


$ kubectl get nodes
NAME                             STATUS   ROLES                  AGE   VERSION
ldap-cert-control-plane-77g8v    Ready    control-plane,master   86m   v1.21.2+vmware.1
ldap-cert-md-0-b7f799d64-kgcqm   Ready    <none>                 85m   v1.21.2+vmware.1

The LDAP user (chogan@rainpole.com) is now able to manage the cluster using the non-admin context of the TKG management cluster.
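
As an aside, if you wanted to grant access to an entire Active Directory group rather than to individual users, a similar ClusterRoleBinding with a Group subject can be used. This is only a sketch; the group name below is a placeholder, and the exact form of the name presented to Kubernetes depends on your LDAP group search settings.

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tkg-admins-crb
subjects:
  - kind: Group
    name: tkg-admins        # placeholder AD group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io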

Useful tips

On one occasion, after deleting the Pinniped post-deploy job, I still could not access the cluster. I got the following error:

$ kubectl get pods
Error: could not complete Pinniped login: could not perform OIDC discovery for \
"https://xx.xx.xx.16:31234": Get "https://xx.xx.xx.16:31234/.well-known/openid-configuration": \
dial tcp xx.xx.xx.16:31234: connect: connection refused
Error: pinniped-auth login failed: exit status 1
Error: exit status 1

Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1

To resolve this issue, I deleted the Pinniped post-deploy job a second time (note that the issuer URL in the error still references the old NodePort, 31234, which suggests the job had not yet picked up the new Load Balancer address). Once that completed, the kubectl command worked as expected. On another occasion, I hit the following issue:

$ kubectl get nodes
Error: could not complete Pinniped login: could not perform OIDC discovery for \
"https://xx.xx.xx.18": Get "https://xx.xx.xx.18/.well-known/openid-configuration": \
x509: certificate has expired or is not yet valid: current time 2021-11-18T10:52:35Z is before 2021-11-18T10:54:18Z
Error: pinniped-auth login failed: exit status 1
Error: exit status 1

Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1

To resolve this issue, I had to make some changes to the time synchronization configuration of the TKG control plane VMs on vSphere. I did this by editing the settings of the control plane VMs in vSphere and enabling the option to allow periodic time syncs with the ESXi host (this option is disabled by default). Afterwards, the kubectl command worked as expected. Here is where an admin can modify the Synchronize Time with Host settings; I simply needed to select Synchronize time periodically and the issue was fixed.