Getting the VCF CLI
The VCF CLI tool is available via the Supervisor API URL, which is accessible from the Summary tab > Status window of any Namespace in the vSphere Client. In the Link to CLI Tools field, click Open.
This will take you to the VCF Consumption CLI page. Here you can download the VCF CLI tools that match your desktop operating system. Note that you will also need access to the kubectl command on your desktop to do any meaningful troubleshooting.
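Before going any further, it is worth confirming that both tools are reachable from your shell. Here is a minimal pre-flight sketch; the binary names vcf and kubectl are assumptions (on Windows they are vcf.exe and kubectl.exe, which shutil.which also resolves):

```python
# Pre-flight check: confirm the VCF CLI and kubectl binaries are on the PATH.
# Binary names are assumptions; on Windows the tools are vcf.exe / kubectl.exe.
import shutil

def check_tools(tools=("vcf", "kubectl")):
    """Map each tool name to its resolved path on the PATH, or None if missing."""
    return {tool: shutil.which(tool) for tool in tools}

for tool, path in check_tools().items():
    print(f"{tool}: {path or 'NOT FOUND - install before continuing'}")
```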
Access the K8s objects (including database) running on VKS
Now that we have access to the VCF CLI and kubectl, we can log in to both the Supervisor and the VKS cluster itself. It is useful to access the Supervisor directly since this is where the DSM Consumption Operator exists. The DSM Consumption Operator extends the Supervisor API to include the DSM API, so from here we can query the state of the databases. Let's look at how to create the Supervisor context using the VCF CLI, and then use kubectl to do further queries. The following command, which points to the Supervisor Control Plane Node Address and uses basic auth with the vSphere administrator login and password, creates the context for the Supervisor as well as for all of the existing namespaces on the Supervisor:
> vcf.exe context create --endpoint=192.168.20.6 --username administrator@sfo-w01.local --auth-type basic
? Provide a name for the context: cormac-sv
Provide Password: ***********
Logged in successfully.

You have access to the following contexts:
  cormac-sv
  cormac-sv:dsm-ns-lfyn4
  cormac-sv:silver-tenant-ns-b8syh
  cormac-sv:svc-auto-attach-domain-c10
  cormac-sv:svc-consumption-operator-domain-c10
  cormac-sv:svc-tkg-domain-c10
  cormac-sv:svc-velero-domain-c10
  cormac-sv:vks-ns-2vm7n
  cormac-sv:vks-project-ns-74lj4

If the namespace context you wish to use is not in this list, you may need to refresh the context again, or contact your cluster administrator.
To change context, use `vcf context use <context_name>`
[ok] successfully created context: cormac-sv
[ok] successfully created context: cormac-sv:svc-velero-domain-c10
[ok] successfully created context: cormac-sv:svc-tkg-domain-c10
[ok] successfully created context: cormac-sv:svc-auto-attach-domain-c10
[ok] successfully created context: cormac-sv:vks-project-ns-74lj4
[ok] successfully created context: cormac-sv:dsm-ns-lfyn4
[ok] successfully created context: cormac-sv:svc-consumption-operator-domain-c10
[ok] successfully created context: cormac-sv:silver-tenant-ns-b8syh
[ok] successfully created context: cormac-sv:vks-ns-2vm7n
In this example, the namespace dsm-ns-lfyn4 is the namespace used for landing my DSM database infrastructure (i.e., the VKS cluster). Requests for databases can be made from any tenant namespace, as long as they have been given permission to do so via the Data Service Policy in VCF Automation. In this case, a tenant in the silver-tenant-ns-b8syh namespace has requested a database to be created, and the VKS cluster backing this database has been created in the dsm-ns-lfyn4 namespace, in accordance with the infrastructure policy. Let's use the Supervisor context and query its nodes using kubectl.
> vcf.exe context use cormac-sv
[ok] Token is still active. Skipped the token refresh for context "cormac-sv"
[i] Successfully activated context 'cormac-sv' (Type: kubernetes)
[i] Fetching recommended plugins for active context 'cormac-sv'...
[i] Installing the following plugins recommended by context 'cormac-sv':
  NAME                 CURRENT  INSTALLING
  cluster              v3.3.1   v3.4.1
  kubernetes-release   v3.3.1   v3.4.1
  package              v3.3.1   v3.4.1
  registry-secret      v3.3.1   v3.4.1
[i] Installed plugin 'cluster:v3.4.1'
[i] Installed plugin 'kubernetes-release:v3.4.1'
[i] Installed plugin 'package:v3.4.1'
[i] Installed plugin 'registry-secret:v3.4.1'
[ok] Successfully installed all recommended plugins.

> kubectl.exe get nodes
NAME                                  STATUS  ROLES                  AGE   VERSION
423b4f3d337d2f1bd0ee1538ce627aa3      Ready   control-plane,master   4d2h  v1.31.6+vmware.3-fips
sfo01-w01-r01-esx01.sfo.rainpole.io   Ready   agent                  4d1h  v1.31.6-sph-vmware-clustered-infravisor-trunk-85-g71ed1bf
sfo01-w01-r01-esx02.sfo.rainpole.io   Ready   agent                  4d1h  v1.31.6-sph-vmware-clustered-infravisor-trunk-85-g71ed1bf
sfo01-w01-r01-esx03.sfo.rainpole.io   Ready   agent                  4d1h  v1.31.6-sph-vmware-clustered-infravisor-trunk-85-g71ed1bf
This matches my Supervisor environment, which has a single control plane node and is built on a cluster with 3 ESXi hosts reported as agents. Looks good. Now let's check that the DSM Consumption Operator is installed correctly on the Supervisor by running some DSM-specific database commands in this context using kubectl.
> kubectl.exe get postgresclusters -A
NAMESPACE                NAME         STATUS  STORAGE  VERSION               AGE
silver-tenant-ns-b8syh   silver-pg01  Ready   20Gi     17.5+vmware.v9.0.1.0  2d3h
It would appear that a single Postgres database has been created so far. The name of the database is silver-pg01, and the request to create the database originated from the silver-tenant-ns-b8syh namespace. However, as mentioned, the infrastructure for the database is in a different namespace, dsm-ns-lfyn4, which we can confirm by doing a describe on the database from the tenant namespace:
> kubectl describe postgresclusters silver-pg01 -n silver-tenant-ns-b8syh
.
. <--snip
.
Nodes:
  Datacenter:           sfo-w01-DC
  Folder:               Namespaces/cormac-sv/dsm-ns-lfyn4/silver-pg01-11706e
  Host:                 sfo01-w01-r01-esx03.sfo.rainpole.io
  Network:
    Devices:
      Network Name:     silver-pg01-11706e-v69f5
  Resource Pool:        sfo-w01-cl01/Resources/Namespaces/dsm-ns-lfyn4/silver-pg01-11706e
  Server:               sfo-w01-vc01.sfo.rainpole.io
  Storage Policy Name:  vsan-default-storage-policy
  Vm Moid:              vm-139
  Vm Name:              silver-pg01-11706e-r88pq-2dx24
  Vm Role:              ControlPlane
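Rather than eyeballing the describe output, the infrastructure namespace can be pulled out of the Folder: line programmatically. This is purely an illustrative sketch; it assumes the folder layout Namespaces/&lt;supervisor&gt;/&lt;infra-namespace&gt;/&lt;cluster-name&gt; seen in the output above, so verify that this matches your own environment before relying on it:

```python
# Extract the infrastructure namespace from the "Folder:" line of a
# `kubectl describe postgresclusters` output. Assumes the folder layout
# Namespaces/<supervisor>/<infra-namespace>/<cluster-name> shown above.
def infra_namespace(describe_output):
    """Return the namespace segment from the Folder: line, or None."""
    for line in describe_output.splitlines():
        line = line.strip()
        if line.startswith("Folder:"):
            parts = line.split(":", 1)[1].strip().split("/")
            if len(parts) >= 2:
                return parts[-2]  # segment just before the cluster name
    return None

sample = "Folder:  Namespaces/cormac-sv/dsm-ns-lfyn4/silver-pg01-11706e"
print(infra_namespace(sample))  # dsm-ns-lfyn4
```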
If we run a query for all of the VKS clusters across all namespaces, we can see that the name of the VKS cluster that backs our Postgres database matches the name of the database (silver-pg01). There are some other VKS clusters deployed in other namespaces, but these are not used for DSM. They are used for other workloads, which is quite normal to see.
> kubectl get clusters -A
NAMESPACE              NAME                      CLUSTERCLASS                       PHASE        AGE    VERSION
dsm-ns-lfyn4           silver-pg01-11706e        dsmclusterclass-9-0-1-0-24917825   Provisioned  2d23h  v1.32.0+vmware.6-fips
vks-ns-2vm7n           kubernetes-cluster-i7qm   builtin-generic-v3.4.0             Provisioned  2d19h  v1.33.3+vmware.1-fips
vks-project-ns-74lj4   kubernetes-cluster-jkj8   builtin-generic-v3.4.0             Provisioned  6d21h  v1.33.3+vmware.1-fips
With the cluster name, you can look at the events associated with the VKS cluster using the following kubectl command. This can be useful if there are issues with the underlying cluster which, in turn, prevent the database from coming online.
> kubectl events silver-pg01-11706e -n dsm-ns-lfyn4
LAST SEEN                TYPE    REASON                         OBJECT                                            MESSAGE
2d17h (x135 over 3d1h)   Normal  UpdateSuccess                  VirtualMachine/silver-pg01-11706e-r88pq-2dx24     Update success
2d17h (x45 over 3d1h)    Normal  SuccessfulUpdate               NetworkInfo/dsm-ns-lfyn4                          NetworkInfo CR has been successfully updated
12m                      Normal  KeyPairVerified                Issuer/silver-pg01-11706e-extensions-ca-issuer    Signing CA verified
11m                      Normal  UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-8c018544-40ac-4156-913d-18d95a9e38a2   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-monitor-0" and entity type "POD" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                      Normal  UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-2fc122fe-9b9e-4ff8-ab21-3a840a4a12a0   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-monitor-silver-pg01-monitor-0" and entity type "PERSISTENT_VOLUME_CLAIM" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                      Normal  UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-3633c6d4-7e96-409b-8ab6-84d1fe7092e3   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-pgdata-silver-pg01-0" and entity type "PERSISTENT_VOLUME_CLAIM" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                      Normal  UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-b4909cde-8139-43ca-8a42-0c2838253b8f   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-0" and entity type "POD" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                      Normal  UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-bf29d539-e02a-4ebc-8d66-3851d0e95428   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "pvc-3633c6d4-7e96-409b-8ab6-84d1fe7092e3" and entity type "PERSISTENT_VOLUME" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                      Normal  UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-c87be272-73aa-4632-a012-5a005ad9ee9c   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "pvc-2fc122fe-9b9e-4ff8-ab21-3a840a4a12a0" and entity type "PERSISTENT_VOLUME" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
10m                      Normal  SuccessfulUpdate               SubnetSet/vm-default                              SubnetSet CR has been successfully updated
9m59s                    Normal  SuccessfulUpdate               Service/silver-pg01-11706e                        LoadBalancer service has been successfully updated
9m59s                    Normal  SuccessfulUpdate               Service/silver-pg01-11706e-88836bd1a14b724dad415  LoadBalancer service has been successfully updated
9m51s                    Normal  SuccessfulUpdate               SubnetPort/silver-pg01-11706e-r88pq-2dx24-silver-pg01-11706e-v69f5-eth0   SubnetPort CR has been successfully updated
9m51s                    Normal  SuccessfulUpdate               SubnetSet/pod-default                             SubnetSet CR has been successfully updated
9m51s                    Normal  SuccessfulUpdate               SubnetSet/silver-pg01-11706e-v69f5                SubnetSet CR has been successfully updated
9m48s                    Normal  SuccessfulUpdate               Pod/jumpbox                                       Pod CR has been successfully updated
9m30s                    Normal  SuccessfulUpdate               NetworkInfo/dsm-ns-lfyn4                          NetworkInfo CR has been successfully updated
9m29s                    Normal  SuccessfulRealizeNSXResource   Service/silver-pg01-11706e-88836bd1a14b724dad415  Successful to update NSX resource for DLB Service
9m27s                    Normal  SuccessfulRealizeNSXResource   Service/silver-pg01-11706e                        Successful to process DLB endpoint resource
4m52s (x20 over 12m)     Normal  UpdateSuccess                  VirtualMachine/silver-pg01-11706e-r88pq-2dx24     Update success
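All of the events above are Normal, but when a database fails to come online it is usually the Warning events that matter. The same command also supports JSON output, which makes filtering straightforward. A hedged sketch (the type, reason and message fields are standard Kubernetes Event fields):

```python
# Filter kubectl events output for non-Normal entries. In practice, feed in:
#   kubectl events silver-pg01-11706e -n dsm-ns-lfyn4 -o json
import json

def warning_events(events_json):
    """Return (reason, message) pairs for every event whose type is not Normal."""
    items = json.loads(events_json).get("items", [])
    return [(e.get("reason"), e.get("message"))
            for e in items if e.get("type") != "Normal"]

# Small fabricated sample for illustration:
sample = json.dumps({"items": [
    {"type": "Normal", "reason": "UpdateSuccess", "message": "Update success"},
    {"type": "Warning", "reason": "FailedScheduling", "message": "0/1 nodes available"},
]})
print(warning_events(sample))
```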
With the cluster name, the VCF CLI can now be used to create a new context to access the VKS cluster backing the DSM database.
Create a VKS context
The following commands create a new context for a VKS cluster (the one which backs the Postgres database called silver-pg01). To create this context, two additional parameters (workload-cluster-name and workload-cluster-namespace) must be included. Again, the auth-type is set to basic, but other authentication options are available (for details on how a tenant can use API tokens to access the VKS cluster, see this blog post from Tomas Fjota). Once the context is created, we can use it to query the nodes and pods running in the cluster. Since this is a DSM-provisioned single node database, it is normal to see a single control plane node provisioned. DSM also creates the Postgres pods on this control plane node, something that is not allowed with vanilla VKS, which always provisions a worker node for application workloads.
> vcf.exe context create --endpoint=192.168.20.6 --username administrator@sfo-w01.local --workload-cluster-name silver-pg01-11706e --workload-cluster-namespace dsm-ns-lfyn4 --auth-type basic
? Provide a name for the context: silver-pg01
Provide Password: *********
[i] Logging in to Kubernetes cluster (silver-pg01-11706e) (dsm-ns-lfyn4)
[i] Successfully logged in to Kubernetes cluster 192.168.22.4

You have access to the following contexts:
  silver-pg01
  silver-pg01:silver-pg01-11706e

If the namespace context you wish to use is not in this list, you may need to refresh the context again, or contact your cluster administrator.
To change context, use `vcf context use <context_name>`
[ok] successfully created context: silver-pg01
[ok] successfully created context: silver-pg01:silver-pg01-11706e

> vcf.exe context list
NAME                                            CURRENT  TYPE
cormac-sv                                       true     kubernetes
cormac-sv:dsm-ns-lfyn4                          false    kubernetes
cormac-sv:silver-tenant-ns-b8syh                false    kubernetes
cormac-sv:svc-auto-attach-domain-c10            false    kubernetes
cormac-sv:svc-consumption-operator-domain-c10   false    kubernetes
cormac-sv:svc-tkg-domain-c10                    false    kubernetes
cormac-sv:svc-velero-domain-c10                 false    kubernetes
cormac-sv:vks-ns-2vm7n                          false    kubernetes
cormac-sv:vks-project-ns-74lj4                  false    kubernetes
silver-pg01                                     false    kubernetes
silver-pg01:silver-pg01-11706e                  false    kubernetes
[i] Use '--wide' to view additional columns.

> vcf context use silver-pg01:silver-pg01-11706e
[ok] Token is still active. Skipped the token refresh for context "silver-pg01:silver-pg01-11706e"
[i] Successfully activated context 'silver-pg01:silver-pg01-11706e' (Type: kubernetes)
[i] Fetching recommended plugins for active context 'silver-pg01:silver-pg01-11706e'...
[ok] No recommended plugins found.
> vcf.exe context list
NAME                                            CURRENT  TYPE
cormac-sv                                       false    kubernetes
cormac-sv:dsm-ns-lfyn4                          false    kubernetes
cormac-sv:silver-tenant-ns-b8syh                false    kubernetes
cormac-sv:svc-auto-attach-domain-c10            false    kubernetes
cormac-sv:svc-consumption-operator-domain-c10   false    kubernetes
cormac-sv:svc-tkg-domain-c10                    false    kubernetes
cormac-sv:svc-velero-domain-c10                 false    kubernetes
cormac-sv:vks-ns-2vm7n                          false    kubernetes
cormac-sv:vks-project-ns-74lj4                  false    kubernetes
silver-pg01                                     false    kubernetes
silver-pg01:silver-pg01-11706e                  true     kubernetes
[i] Use '--wide' to view additional columns.

> kubectl get nodes
NAME                             STATUS  ROLES          AGE    VERSION
silver-pg01-11706e-r88pq-2dx24   Ready   control-plane  2d23h  v1.32.0+vmware.6-fips

> kubectl get pods -A
NAMESPACE                       NAME                                                      READY  STATUS     RESTARTS      AGE
cert-manager                    cert-manager-5668b6499f-t29zf                             1/1    Running    2 (31m ago)   2d23h
cert-manager                    cert-manager-cainjector-859df965db-5zf4p                  1/1    Running    2 (31m ago)   2d23h
cert-manager                    cert-manager-webhook-7d66fb4668-bg9s8                     1/1    Running    1 (31m ago)   2d23h
d14a47-silver-tenant-ns-b8syh   default-incremental-backup-29422079-v8rmz                 0/1    Completed  0             2d11h
d14a47-silver-tenant-ns-b8syh   default-incremental-backup-29423519-fhsmj                 0/1    Completed  0             35h
d14a47-silver-tenant-ns-b8syh   default-incremental-backup-29424959-vl5wm                 0/1    Completed  0             31m
d14a47-silver-tenant-ns-b8syh   silver-pg01-0                                             4/4    Running    4 (31m ago)   2d23h
d14a47-silver-tenant-ns-b8syh   silver-pg01-monitor-0                                     4/4    Running    4 (31m ago)   2d23h
kube-system                     antrea-agent-zmk8v                                        2/2    Running    5 (29m ago)   2d23h
kube-system                     antrea-controller-f5d6d787f-m2bbn                         1/1    Running    4 (29m ago)   2d23h
kube-system                     coredns-57db7b44f5-csfjm                                  1/1    Running    1 (31m ago)   2d23h
kube-system                     coredns-69b565fcb5-9bswf                                  0/1    Pending    0             2d23h
kube-system                     docker-registry-silver-pg01-11706e-r88pq-2dx24            1/1    Running    1 (31m ago)   2d23h
kube-system                     etcd-silver-pg01-11706e-r88pq-2dx24                       1/1    Running    1 (31m ago)   2d23h
kube-system                     image-puller-4jltj                                        1/1    Running    5 (31m ago)   2d23h
kube-system                     kube-apiserver-silver-pg01-11706e-r88pq-2dx24             1/1    Running    1 (31m ago)   2d23h
kube-system                     kube-controller-manager-silver-pg01-11706e-r88pq-2dx24    1/1    Running    3 (31m ago)   2d23h
kube-system                     kube-proxy-cts9x                                          1/1    Running    1 (31m ago)   2d23h
kube-system                     kube-scheduler-silver-pg01-11706e-r88pq-2dx24             1/1    Running    3 (31m ago)   2d23h
kube-system                     metrics-server-6ccf55cf87-4jbzt                           1/1    Running    1 (31m ago)   2d23h
kube-system                     snapshot-controller-7ccbcfddfd-4czzc                      1/1    Running    1 (31m ago)   2d23h
pinniped-concierge              pinniped-concierge-77ccbc897d-75l2z                       1/1    Running    1 (31m ago)   2d23h
pinniped-concierge              pinniped-concierge-77ccbc897d-ww9fl                       1/1    Running    1 (31m ago)   2d23h
pinniped-concierge              pinniped-concierge-kube-cert-agent-7449f8dbbb-vjt5b       1/1    Running    1 (31m ago)   2d23h
secretgen-controller            secretgen-controller-5cbf99f6c-t9bdn                      1/1    Running    1 (31m ago)   2d23h
telegraf                        telegraf-6d994786d8-cn82f                                 1/1    Running    1 (31m ago)   2d23h
tkg-system                      kapp-controller-7ff74d9865-4989q                          2/2    Running    5 (29m ago)   2d23h
vmware-sql-postgres             postgres-operator-56ff7f7679-dd8vl                        1/1    Running    1 (31m ago)   2d23h
vmware-system-antrea            antrea-pre-upgrade-job-dsknp                              0/1    Completed  0             2d23h
vmware-system-auth              guest-cluster-auth-svc-lgv4x                              1/1    Running    1 (31m ago)   2d23h
vmware-system-cloud-provider    guest-cluster-cloud-provider-67f87c6699-qnhqd             1/1    Running    6 (29m ago)   2d23h
vmware-system-csi               vsphere-csi-controller-854ffbff6-w9lbf                    7/7    Running    7 (31m ago)   2d23h
vmware-system-csi               vsphere-csi-node-jp6jn                                    3/3    Running    13 (29m ago)  2d23h
In this VKS context, in the namespace d14a47-silver-tenant-ns-b8syh, I can see the pod for the primary database (silver-pg01-0) as well as the pod for the monitor (silver-pg01-monitor-0). Note that the primary pod has 4 containers, called pg-container, instance-logging, reconfigure-instance and postgres-sidecar. Using kubectl, I can describe the pod, get events and look at the container logs. For example, to look at the pg-container log, run the following command (-c specifies the container):
> kubectl logs silver-pg01-0 -c pg-container -n d14a47-silver-tenant-ns-b8syh | more
2025-12-12T10:33:02.722Z INFO postgresinstance Removing post start tasks executed
2025-12-12T10:33:02.731Z INFO postgresinstance Removed post start tasks executed
2025-12-12T10:33:02.731Z INFO postgresinstance Running pre-start tasks
2025-12-12T10:33:02.731Z INFO postgresinstance executing {"task": "ConfigureDirectoryPermission"}
2025-12-12T10:33:02.731Z INFO postgresinstance executing {"task": "ConfigurePassFileTask"}
2025-12-12T10:33:02.732Z INFO postgresinstance executing {"task": "WaitForMonitorTask"}
2025-12-12T10:33:02.825Z INFO postgresinstance failed to connect to `user=autoctl_node database=pg_auto_failover`: hostname resolving error: lookup silver-pg01-monitor-0.silver-pg01-agent.d14a47-silver-tenant-ns-b8syh.svc.cluster.local on 10.96.0.10:53: no such host
2025-12-12T10:33:07.844Z INFO postgresinstance failed to connect to `user=autoctl_node database=pg_auto_failover`: 192.168.0.16:5432 (silver-pg01-monitor-0.silver-pg01-agent.d14a47-silver-tenant-ns-b8syh.svc.cluster.local): dial error: dial tcp 192.168.0.16:5432: connect: connection refused
2025-12-12T10:33:12.898Z INFO postgresinstance Connected to monitor
2025-12-12T10:33:12.898Z INFO postgresinstance executing {"task": "CleanUpDatabaseProcessTask"}
2025-12-12T10:33:12.898Z INFO postgresinstance Start cleanup process database...
pg_ctl: could not send stop signal (PID: 309): No such process
2025-12-12T10:33:12.901Z INFO postgresinstance executing {"task": "RemovePostmasterPidTask"}
2025-12-12T10:33:12.902Z INFO postgresinstance executing {"task": "ClearCustomTempDirTask"}
2025-12-12T10:33:12.903Z INFO postgresinstance executing {"task": "WriteCustomConfigFileTask"}
2025-12-12T10:33:12.923Z INFO postgresinstance Applying custom config {"config": {"Mode":"verify-ca","CAFilePath":"/etc/postgres_ssl/ca.crt","CertFilePath":"/etc/postgres_ssl/tls.crt","KeyFilePath":"/etc/postgres_ssl/tls.key","CustomConfigFilePath":"/pgsql/custom/postgresql-custom-override.conf","IsArchiveModeEnabled":true,"UnixSocketDirectories":["/pgsql/custom/tmp","/tmp"],"ArchiveCommand":"pgbackrest --stanza=d14a47-silver-tenant-ns-b8syh-silver-pg01-293d8da7-4489-4a93-a5c4-7a949b2960d4 archive-push %p","SharedPreloadLibraries":["pg_stat_statements","pgaudit","pg_cron"],"PostgresVersion":"17","PostgresLogDirectory":"/pgsql/logs/postgres","UserProvidedCustomPostgresConfigPath":"/etc/customconfig/postgresql.conf","BackupBasedContinuousRestoreMode":false,"SharedBuffers":"2654 MB","WorkMem":"26 MB","WalKeepSize":"96 MB","WalKeepSegments":6,"MaintenanceWorkMem":"530 MB","EffectiveCacheSize":"5309 MB","MaxSlotWalKeepSize":"1998 MB"}}
2025-12-12T10:33:12.924Z INFO postgresinstance executing {"task": "ConfigureMonitorConnectionStringTask"}
2025-12-12T10:33:12.939Z INFO postgresinstance executing {"task": "ConfigureSSLTask"}
2025-12-12T10:33:12.947Z INFO postgresinstance ssl.ca_file is already set to /etc/postgres_ssl/ca.crt
2025-12-12T10:33:12.954Z INFO postgresinstance ssl.cert_file is already set to /etc/postgres_ssl/tls.crt
2025-12-12T10:33:12.962Z INFO postgresinstance ssl.key_file is already set to /etc/postgres_ssl/tls.key
2025-12-12T10:33:12.962Z INFO postgresinstance executing {"task": "InitializeDatabaseTask"}
2025-12-12T10:33:12.962Z INFO postgresinstance Start initializing database...
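To sweep the logs of all four containers in one go, the same kubectl logs command can be generated per container. This is just a convenience sketch; the container names are those listed earlier, and it is worth confirming them first with kubectl get pod silver-pg01-0 -o jsonpath='{.spec.containers[*].name}':

```python
# Build the kubectl argv for each container's log in the primary database pod.
# Container names are taken from the text above; confirm them in your cluster.
import subprocess

CONTAINERS = ["pg-container", "instance-logging", "reconfigure-instance", "postgres-sidecar"]

def logs_cmd(pod, container, namespace):
    """Return the kubectl argv that fetches one container's log."""
    return ["kubectl", "logs", pod, "-c", container, "-n", namespace]

for c in CONTAINERS:
    cmd = logs_cmd("silver-pg01-0", c, "d14a47-silver-tenant-ns-b8syh")
    print(" ".join(cmd))
    # To actually fetch the log, uncomment:
    # subprocess.run(cmd, check=False)
```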
So, as you can see, some low-level troubleshooting can be done using the new VCF CLI when a DSM database is provisioned via VCF Automation, uses a Supervisor namespace infrastructure policy, and is therefore using the vSphere Kubernetes Service to host the database.
Note: If you are running VCF v9.0.1 with DSM integrated, and you run the above commands and notice that there are no running containers (0/4) in the database pod, then you may have encountered an issue with the VKS Management service consuming most of the available resources in the control plane nodes of the VKS cluster. The VKS Management component automatically adds a number of agents to the control plane nodes of a VKS cluster, leaving insufficient resources to run the database. If you suspect this is the case, this is the KB which describes the workaround of adding the database namespace to the VKSM ConfigMap – https://knowledge.broadcom.
SSH access to the VKS node
The final part of this section shows how to ssh onto the VKS node that hosts the database. Caution: with great power comes great responsibility. I would urge you to use extreme care when logging onto the VKS node, as you may end up doing something that impacts the database. However, there may be valid reasons why you need to do this, such as checking network connectivity. So, as per the official documentation, here is how to get ssh access to a VKS node using a jumpbox PodVM that has been deployed in the same Namespace as the VKS cluster.
The first step is to switch contexts once more and go back to the Supervisor context, cormac-sv. From here, you will need to list the Kubernetes secrets in the namespace where the database infrastructure has been provisioned, in this case dsm-ns-lfyn4. The ssh secret (silver-pg01-11706e-ssh, of type kubernetes.io/ssh-auth in the output below) contains a private key. With this key, you will be able to ssh onto the VKS node as a system user (vmware-system-user).
> vcf.exe context use cormac-sv
[ok] Token is still active. Skipped the token refresh for context "sv"
[i] Successfully activated context 'sv' (Type: kubernetes)
[i] Fetching recommended plugins for active context 'sv'...
[ok] All recommended plugins are already installed and up-to-date.

> kubectl.exe get secrets -n dsm-ns-lfyn4
NAME                                                         TYPE                                  DATA  AGE
cluster-autoscaler-secret                                    kubernetes.io/service-account-token   3     3d
silver-pg01-11706e-antrea-data-values                        Opaque                                1     3d
silver-pg01-11706e-auth-svc-cert                             kubernetes.io/tls                     3     3d
silver-pg01-11706e-ca                                        cluster.x-k8s.io/secret               2     3d
silver-pg01-11706e-control-plane-machine-agent-conf          Opaque                                1     3d
silver-pg01-11706e-encryption                                Opaque                                1     3d
silver-pg01-11706e-encryption-config                         Opaque                                1     3d
silver-pg01-11706e-etcd                                      cluster.x-k8s.io/secret               2     3d
silver-pg01-11706e-extensions-ca                             kubernetes.io/tls                     3     3d
silver-pg01-11706e-gateway-api-package                       clusterbootstrap-secret               0     3d
silver-pg01-11706e-guest-cluster-auth-service-data-values    Opaque                                1     3d
silver-pg01-11706e-kapp-controller-data-values               Opaque                                2     3d
silver-pg01-11706e-kubeconfig                                cluster.x-k8s.io/secret               1     3d
silver-pg01-11706e-ma-token                                  Opaque                                2     3d
silver-pg01-11706e-metrics-server-package                    clusterbootstrap-secret               0     3d
silver-pg01-11706e-pinniped-package                          clusterbootstrap-secret               1     3d
silver-pg01-11706e-proxy                                     cluster.x-k8s.io/secret               2     3d
silver-pg01-11706e-r88pq-2dx24                               cluster.x-k8s.io/secret               2     3d
silver-pg01-11706e-sa                                        cluster.x-k8s.io/secret               2     3d
silver-pg01-11706e-secretgen-controller-package              clusterbootstrap-secret               1     3d
silver-pg01-11706e-ssh                                       kubernetes.io/ssh-auth                1     3d
silver-pg01-11706e-ssh-password                              Opaque                                1     3d
silver-pg01-11706e-ssh-password-hashed                       Opaque                                1     3d
silver-pg01-11706e-user-trusted-ca-secret                    Opaque                                1     3d
silver-pg01-11706e-v69f5-ccm-secret                          kubernetes.io/service-account-token   3     3d
silver-pg01-11706e-v69f5-pvbackupdriver-secret               kubernetes.io/service-account-token   3     3d
silver-pg01-11706e-v69f5-pvcsi-secret                        kubernetes.io/service-account-token   3     3d
silver-pg01-11706e-vsphere-cpi-data-values                   Opaque                                1     3d
silver-pg01-11706e-vsphere-pv-csi-data-values                Opaque                                1     3d
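As an alternative to mounting the secret into a jumpbox (shown next), the private key can also be decoded locally. Secrets of type kubernetes.io/ssh-auth store the key base64-encoded under the data key ssh-privatekey. Here is a sketch of the decode step, using a fabricated sample in place of real kubectl output:

```python
# Decode the ssh private key from a kubernetes.io/ssh-auth secret. In practice,
# feed in the output of:
#   kubectl get secret silver-pg01-11706e-ssh -n dsm-ns-lfyn4 -o json
import base64
import json

def extract_ssh_key(secret_json):
    """Return the decoded ssh-privatekey field of an ssh-auth secret."""
    data = json.loads(secret_json)["data"]["ssh-privatekey"]
    return base64.b64decode(data).decode()

# Fabricated sample for illustration:
sample = json.dumps({"data": {"ssh-privatekey": base64.b64encode(
    b"-----BEGIN OPENSSH PRIVATE KEY-----\n...").decode()}})
print(extract_ssh_key(sample).splitlines()[0])
```

If you write the decoded key to a file for direct ssh use, remember to chmod it to 600 first, just as the jumpbox manifest below does.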
The next step is to create a YAML manifest which describes the PodVM that we wish to deploy. Remember that this PodVM is deployed in the same namespace (dsm-ns-lfyn4) as the VKS cluster running the Postgres database pods. Here is an example of such a PodVM. It uses a Photon OS image and passes a command that copies the private key from the volume created from the ssh secret we referenced earlier (see the secretName field). The volume holding the private key is mounted at /root/ssh. We use yum to install openssh, and the private key is then copied to a file called /root/.ssh/id_rsa. This allows an ssh session as the 'vmware-system-user' onto the VKS node from the jumpbox PodVM. We pass the ssh command as an argument when we exec into the Pod (we will see how to do this shortly).
apiVersion: v1
kind: Pod
metadata:
  name: jumpbox
  namespace: dsm-ns-lfyn4
spec:
  containers:
  - image: "photon:5.0"
    name: jumpbox
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;" ]
    volumeMounts:
    - mountPath: "/root/ssh"
      name: ssh-key
      readOnly: true
    resources:
      requests:
        memory: 2Gi
  volumes:
  - name: ssh-key
    secret:
      secretName: silver-pg01-11706e-ssh
  imagePullSecrets:
  - name: regcred
Next, apply the manifest to create the PodVM, and ensure it is running.
> kubectl.exe apply -f jumpbox-dsm.yml
pod/jumpbox created

> kubectl get pods -n dsm-ns-lfyn4
NAMESPACE      NAME     READY  STATUS   RESTARTS  AGE
dsm-ns-lfyn4   jumpbox  1/1    Running  0         3m37s
Now, determine the IP address of the node that we wish to ssh onto. The following command, which queries the virtual machines in the namespace, provides this information.
> kubectl.exe get vm -o wide -n dsm-ns-lfyn4
NAME                             POWER-STATE  CLASS              IMAGE                  PRIMARY-IP4    AGE
silver-pg01-11706e-r88pq-2dx24   PoweredOn    best-effort-large  vmi-24554b66363a299c5  192.173.237.3  2d3h
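If you only need the IP address, a jsonpath query avoids scanning the wide output. Note that the field path used below (status.network.primaryIP4) is an assumption based on recent VM Service API versions; inspect the object with kubectl get vm &lt;name&gt; -o yaml if it differs in your release:

```python
# Build a kubectl command that prints just the VM's primary IPv4 address.
# The jsonpath field (.status.network.primaryIP4) is an assumption; verify
# it against your VirtualMachine object before relying on it.
def vm_ip_cmd(name, namespace):
    """Return the kubectl argv for a jsonpath query of the VM's primary IP."""
    return ["kubectl", "get", "vm", name, "-n", namespace,
            "-o", "jsonpath={.status.network.primaryIP4}"]

print(" ".join(vm_ip_cmd("silver-pg01-11706e-r88pq-2dx24", "dsm-ns-lfyn4")))
```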
The ssh session can now be initiated. Run a 'kubectl exec' command as shown below. As soon as you exec into the jumpbox PodVM, the ssh command to the VKS node's IP address is run. The PodVM has already run the command and args defined in the YAML manifest above to configure the private key for the 'vmware-system-user', which allows the ssh command passed to 'kubectl exec' to succeed.
> kubectl exec -it jumpbox -n dsm-ns-lfyn4 -- /usr/bin/ssh vmware-system-user@192.173.237.3
The authenticity of host '192.173.237.3 (192.173.237.3)' can't be established.
ED25519 key fingerprint is SHA256:HEu9cZ9Uq4EFwQvR1KiWbJWlR0Jd3SXNK7lu8C6WBnY.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.173.237.3' (ED25519) to the list of known hosts.
cat: /var/run/motdgen/motd: Permission denied
vmware-system-user@silver-pg01-11706e-r88pq-2dx24 [ ~ ]$ ip a | more
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
    link/ether 04:50:56:00:78:00 brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp11s0
    altname ens192
    inet 192.173.237.3/27 brd 192.173.237.31 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::650:56ff:fe00:7800/64 scope link
       valid_lft forever preferred_lft forever
.
.
.
And that completes the steps to gain ssh access onto a VKS node that is backing a DSM provisioned database. Refer to the official documentation linked earlier for other methods (such as password) to gain access. If you need to lock down ssh access to the database, you can of course create an NSX firewall rule to block this port. There is a simple example of how to do this available here.
Summary
That concludes the post. Hopefully you have seen some of the ways in which it is possible to troubleshoot DSM database deployments on vSphere Kubernetes Service clusters. As mentioned, this is the default Kubernetes used when the infrastructure policy for the database is a Supervisor Namespace, typically provisioned via VCF Automation in VCF 9.x.
