CormacHogan.com

Using the VCF 9.x CLI to troubleshoot a DSM database running on VKS

As many readers are now aware, databases provisioned from DSM via VCF Automation have the option to create a vSphere Kubernetes Service (VKS) cluster to host the database. The decision to use a VKS cluster or DSM’s own Kubernetes cluster is based on the Infrastructure Policy. If the Infrastructure Policy is built on traditional vSphere resources, then DSM’s own K8s is used. If the Infrastructure Policy points to a Supervisor Namespace, then VKS is used. In this post, I wanted to provide some tips and tricks on accessing and troubleshooting the DSM database, and the VKS cluster backing it, using the new VCF command line tool available from the Supervisor in VCF 9.x.

Getting the VCF CLI

The VCF CLI tool is available via the Supervisor API URL, which is accessible from the Summary tab > Status window of any Namespace in the vSphere Client. In the Link to CLI Tools, click Open.

This will take you to the VCF Consumption CLI. Here you can download the VCF CLI tools to match your desktop Operating System. Note that you will also need access to a kubectl command on your desktop to do any meaningful troubleshooting.

Access the K8s objects (including database) running on VKS

Now that we have access to the VCF CLI and kubectl, we can log in to both the Supervisor and the VKS cluster itself. It is useful to access the Supervisor directly since this is where the DSM Consumption Operator exists. The DSM Consumption Operator extends the Supervisor API to include the DSM API, so from here we can query the state of the databases. Let’s look at how to create the Supervisor context using the VCF CLI, and then use kubectl to do further queries. The following command points to the Supervisor Control Plane Node Address and uses basic auth with the vSphere administrator login and password. It creates the context for the Supervisor as well as contexts for all of the existing namespaces on the Supervisor:

> vcf.exe context create --endpoint=192.168.20.6  --username administrator@sfo-w01.local --auth-type basic
? Provide a name for the context: cormac-sv
Provide Password: ***********

Logged in successfully.

You have access to the following contexts:
cormac-sv
cormac-sv:dsm-ns-lfyn4
cormac-sv:silver-tenant-ns-b8syh
cormac-sv:svc-auto-attach-domain-c10
cormac-sv:svc-consumption-operator-domain-c10
cormac-sv:svc-tkg-domain-c10
cormac-sv:svc-velero-domain-c10
cormac-sv:vks-ns-2vm7n
cormac-sv:vks-project-ns-74lj4

If the namespace context you wish to use is not in this list, you may need to
refresh the context again, or contact your cluster administrator.

To change context, use `vcf context use <context_name>`
[ok] successfully created context: cormac-sv
[ok] successfully created context: cormac-sv:svc-velero-domain-c10
[ok] successfully created context: cormac-sv:svc-tkg-domain-c10
[ok] successfully created context: cormac-sv:svc-auto-attach-domain-c10
[ok] successfully created context: cormac-sv:vks-project-ns-74lj4
[ok] successfully created context: cormac-sv:dsm-ns-lfyn4
[ok] successfully created context: cormac-sv:svc-consumption-operator-domain-c10
[ok] successfully created context: cormac-sv:silver-tenant-ns-b8syh
[ok] successfully created context: cormac-sv:vks-ns-2vm7n

In this example, the namespace dsm-ns-lfyn4 is the namespace used to land my DSM database infrastructure (i.e., the VKS cluster). Requests for databases can be made from any tenant namespace, as long as the tenant has been given permission to do so via the Data Service Policy in VCF Automation. In this case, a tenant in the silver-tenant-ns-b8syh namespace has requested a database to be created, and the VKS cluster backing this database has been created in the dsm-ns-lfyn4 namespace, in accordance with the infrastructure policy. Let’s use the Supervisor context and query its nodes using kubectl.

> vcf.exe context use cormac-sv
[ok] Token is still active. Skipped the token refresh for context "cormac-sv"
[i] Successfully activated context 'cormac-sv' (Type: kubernetes)
[i] Fetching recommended plugins for active context 'cormac-sv'...
[i] Installing the following plugins recommended by context 'cormac-sv':
  NAME                CURRENT  INSTALLING
  cluster             v3.3.1   v3.4.1
  kubernetes-release  v3.3.1   v3.4.1
  package             v3.3.1   v3.4.1
  registry-secret     v3.3.1   v3.4.1
[i] Installed plugin 'cluster:v3.4.1'
[i] Installed plugin 'kubernetes-release:v3.4.1'
[i] Installed plugin 'package:v3.4.1'
[i] Installed plugin 'registry-secret:v3.4.1'
[ok] Successfully installed all recommended plugins.

> kubectl.exe get nodes
NAME                                  STATUS   ROLES                  AGE    VERSION
423b4f3d337d2f1bd0ee1538ce627aa3      Ready    control-plane,master   4d2h   v1.31.6+vmware.3-fips
sfo01-w01-r01-esx01.sfo.rainpole.io   Ready    agent                  4d1h   v1.31.6-sph-vmware-clustered-infravisor-trunk-85-g71ed1bf
sfo01-w01-r01-esx02.sfo.rainpole.io   Ready    agent                  4d1h   v1.31.6-sph-vmware-clustered-infravisor-trunk-85-g71ed1bf
sfo01-w01-r01-esx03.sfo.rainpole.io   Ready    agent                  4d1h   v1.31.6-sph-vmware-clustered-infravisor-trunk-85-g71ed1bf

This matches my Supervisor environment, which has a single control plane node and is built on a cluster with 3 ESXi hosts reported as agents. Looks good. Now let’s check that the DSM Consumption Operator is installed correctly on the Supervisor by running some DSM-specific database commands in this context using kubectl.

> kubectl.exe get postgresclusters -A
NAMESPACE                NAME          STATUS   STORAGE   VERSION                AGE
silver-tenant-ns-b8syh   silver-pg01   Ready    20Gi      17.5+vmware.v9.0.1.0   2d3h

It would appear that a single Postgres database has been created so far. The name of the database is silver-pg01, and the request to create it originated from the silver-tenant-ns-b8syh namespace. However, the infrastructure for the database is in a different namespace, as mentioned. It is dsm-ns-lfyn4, which we can confirm by doing a describe on the database from the tenant namespace:

> kubectl describe postgresclusters silver-pg01 -n silver-tenant-ns-b8syh
.
. <--snip
.
 Nodes:
   Datacenter:           sfo-w01-DC
   Folder:               Namespaces/cormac-sv/dsm-ns-lfyn4/silver-pg01-11706e
   Host:                 sfo01-w01-r01-esx03.sfo.rainpole.io
   Network:
     Devices:
       Network Name:     silver-pg01-11706e-v69f5
   Resource Pool:        sfo-w01-cl01/Resources/Namespaces/dsm-ns-lfyn4/silver-pg01-11706e
   Server:               sfo-w01-vc01.sfo.rainpole.io
   Storage Policy Name:  vsan-default-storage-policy
   Vm Moid:              vm-139
   Vm Name:              silver-pg01-11706e-r88pq-2dx24
   Vm Role:              ControlPlane

If we run a query for all of the VKS clusters across all namespaces, we can see that the VKS cluster that backs our Postgres database shares the database’s name (silver-pg01). There are some other VKS clusters deployed in other namespaces, but these are not used for DSM. They are used for other workloads, which is quite normal to see.

> kubectl get clusters -A
NAMESPACE              NAME                      CLUSTERCLASS                       PHASE         AGE     VERSION
dsm-ns-lfyn4           silver-pg01-11706e        dsmclusterclass-9-0-1-0-24917825   Provisioned   2d23h   v1.32.0+vmware.6-fips
vks-ns-2vm7n           kubernetes-cluster-i7qm   builtin-generic-v3.4.0             Provisioned   2d19h   v1.33.3+vmware.1-fips
vks-project-ns-74lj4   kubernetes-cluster-jkj8   builtin-generic-v3.4.0             Provisioned   6d21h   v1.33.3+vmware.1-fips

With the cluster name, you can look at the events associated with the VKS cluster by using the following kubectl command. This can be useful if there are some issues with the underlying cluster, which in turn prevents the database from coming online.

> kubectl events silver-pg01-11706e -n dsm-ns-lfyn4
LAST SEEN                 TYPE      REASON                         OBJECT                                                                                        MESSAGE
2d17h (x135 over 3d1h)    Normal    UpdateSuccess                  VirtualMachine/silver-pg01-11706e-r88pq-2dx24                                                 Update success
2d17h (x45 over 3d1h)     Normal    SuccessfulUpdate               NetworkInfo/dsm-ns-lfyn4                                                                      NetworkInfo CR has been successfully updated
12m                       Normal    KeyPairVerified                Issuer/silver-pg01-11706e-extensions-ca-issuer                                                Signing CA verified
11m                       Normal    UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-8c018544-40ac-4156-913d-18d95a9e38a2   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-monitor-0" and entity type "POD" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                       Normal    UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-2fc122fe-9b9e-4ff8-ab21-3a840a4a12a0   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-monitor-silver-pg01-monitor-0" and entity type "PERSISTENT_VOLUME_CLAIM" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                       Normal    UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-3633c6d4-7e96-409b-8ab6-84d1fe7092e3   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-pgdata-silver-pg01-0" and entity type "PERSISTENT_VOLUME_CLAIM" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                       Normal    UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-b4909cde-8139-43ca-8a42-0c2838253b8f   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "silver-pg01-0" and entity type "POD" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                       Normal    UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-bf29d539-e02a-4ebc-8d66-3851d0e95428   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "pvc-3633c6d4-7e96-409b-8ab6-84d1fe7092e3" and entity type "PERSISTENT_VOLUME" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
11m                       Normal    UpdateSucceeded                CnsVolumeMetadata/a8bc3192-7805-46ee-ba2b-f0e5d144c42e-c87be272-73aa-4632-a012-5a005ad9ee9c   ReconcileCnsVolumeMetadata: Successfully updated entry in CNS for instance with name "pvc-2fc122fe-9b9e-4ff8-ab21-3a840a4a12a0" and entity type "PERSISTENT_VOLUME" in the guest cluster "a8bc3192-7805-46ee-ba2b-f0e5d144c42e".
10m                       Normal    SuccessfulUpdate               SubnetSet/vm-default                                                                          SubnetSet CR has been successfully updated
9m59s                     Normal    SuccessfulUpdate               Service/silver-pg01-11706e                                                                    LoadBalancer service has been successfully updated
9m59s                     Normal    SuccessfulUpdate               Service/silver-pg01-11706e-88836bd1a14b724dad415                                              LoadBalancer service has been successfully updated
9m51s                     Normal    SuccessfulUpdate               SubnetPort/silver-pg01-11706e-r88pq-2dx24-silver-pg01-11706e-v69f5-eth0                       SubnetPort CR has been successfully updated
9m51s                     Normal    SuccessfulUpdate               SubnetSet/pod-default                                                                         SubnetSet CR has been successfully updated
9m51s                     Normal    SuccessfulUpdate               SubnetSet/silver-pg01-11706e-v69f5                                                            SubnetSet CR has been successfully updated
9m48s                     Normal    SuccessfulUpdate               Pod/jumpbox                                                                                   Pod CR has been successfully updated
9m30s                     Normal    SuccessfulUpdate               NetworkInfo/dsm-ns-lfyn4                                                                      NetworkInfo CR has been successfully updated
9m29s                     Normal    SuccessfulRealizeNSXResource   Service/silver-pg01-11706e-88836bd1a14b724dad415                                              Successful to update NSX resource for DLB Service
9m27s                     Normal    SuccessfulRealizeNSXResource   Service/silver-pg01-11706e                                                                    Successful to process DLB endpoint resource
4m52s (x20 over 12m)      Normal    UpdateSuccess                  VirtualMachine/silver-pg01-11706e-r88pq-2dx24                                                 Update success

With the cluster name, the VCF CLI can now be used to create a new context to access the VKS cluster backing the DSM database.

Create a VKS context

The following commands create a new context for a VKS cluster (the one which backs the Postgres database called silver-pg01). To create this context, two additional parameters (workload-cluster-name and workload-cluster-namespace) must be included. Again, the auth-type is set to basic, but other authentication options are available (for details on how a tenant can use API tokens to access the VKS cluster, see this blog post from Tomas Fojta). Once the context is created, we can use it to query the nodes and pods running in the cluster. Since this is a DSM-provisioned single node database, it is normal to see a single control plane node and no workers. DSM schedules the Postgres pods on the control plane node, something that is not allowed with vanilla VKS, which always provisions worker nodes for application workloads.

> vcf.exe context create --endpoint=192.168.20.6 --username administrator@sfo-w01.local --workload-cluster-name silver-pg01-11706e --workload-cluster-namespace dsm-ns-lfyn4 --auth-type basic

? Provide a name for the context:  silver-pg01
Provide Password: *********

[i] Logging in to Kubernetes cluster (silver-pg01-11706e) (dsm-ns-lfyn4)
[i] Successfully logged in to Kubernetes cluster 192.168.22.4

You have access to the following contexts:
   silver-pg01
   silver-pg01:silver-pg01-11706e

If the namespace context you wish to use is not in this list, you may need to
refresh the context again, or contact your cluster administrator.
To change context, use `vcf context use <context_name>`
[ok] successfully created context: silver-pg01
[ok] successfully created context: silver-pg01:silver-pg01-11706e


> vcf.exe context list
  NAME                                           CURRENT  TYPE
  cormac-sv                                      true     kubernetes
  cormac-sv:dsm-ns-lfyn4                         false    kubernetes
  cormac-sv:silver-tenant-ns-b8syh               false    kubernetes
  cormac-sv:svc-auto-attach-domain-c10           false    kubernetes
  cormac-sv:svc-consumption-operator-domain-c10  false    kubernetes
  cormac-sv:svc-tkg-domain-c10                   false    kubernetes
  cormac-sv:svc-velero-domain-c10                false    kubernetes
  cormac-sv:vks-ns-2vm7n                         false    kubernetes
  cormac-sv:vks-project-ns-74lj4                 false    kubernetes
  silver-pg01                                    false    kubernetes
  silver-pg01:silver-pg01-11706e                 false    kubernetes
[i] Use '--wide' to view additional columns.


> vcf context use silver-pg01:silver-pg01-11706e

[ok] Token is still active. Skipped the token refresh for context "silver-pg01:silver-pg01-11706e"
[i] Successfully activated context 'silver-pg01:silver-pg01-11706e' (Type: kubernetes)
[i] Fetching recommended plugins for active context 'silver-pg01:silver-pg01-11706e'...
[ok] No recommended plugins found.

> vcf.exe context list

  NAME                                           CURRENT  TYPE
  cormac-sv                                      false    kubernetes
  cormac-sv:dsm-ns-lfyn4                         false    kubernetes
  cormac-sv:silver-tenant-ns-b8syh               false    kubernetes
  cormac-sv:svc-auto-attach-domain-c10           false    kubernetes
  cormac-sv:svc-consumption-operator-domain-c10  false    kubernetes
  cormac-sv:svc-tkg-domain-c10                   false    kubernetes
  cormac-sv:svc-velero-domain-c10                false    kubernetes
  cormac-sv:vks-ns-2vm7n                         false    kubernetes
  cormac-sv:vks-project-ns-74lj4                 false    kubernetes
  silver-pg01                                    false    kubernetes
  silver-pg01:silver-pg01-11706e                 true     kubernetes
[i] Use '--wide' to view additional columns.


> kubectl get nodes

NAME                             STATUS   ROLES           AGE     VERSION
silver-pg01-11706e-r88pq-2dx24   Ready    control-plane   2d23h   v1.32.0+vmware.6-fips


> kubectl get pods -A

NAMESPACE                       NAME                                                     READY   STATUS      RESTARTS       AGE
cert-manager                    cert-manager-5668b6499f-t29zf                            1/1     Running     2 (31m ago)    2d23h
cert-manager                    cert-manager-cainjector-859df965db-5zf4p                 1/1     Running     2 (31m ago)    2d23h
cert-manager                    cert-manager-webhook-7d66fb4668-bg9s8                    1/1     Running     1 (31m ago)    2d23h
d14a47-silver-tenant-ns-b8syh   default-incremental-backup-29422079-v8rmz                0/1     Completed   0              2d11h
d14a47-silver-tenant-ns-b8syh   default-incremental-backup-29423519-fhsmj                0/1     Completed   0              35h
d14a47-silver-tenant-ns-b8syh   default-incremental-backup-29424959-vl5wm                0/1     Completed   0              31m
d14a47-silver-tenant-ns-b8syh   silver-pg01-0                                            4/4     Running     4 (31m ago)    2d23h
d14a47-silver-tenant-ns-b8syh   silver-pg01-monitor-0                                    4/4     Running     4 (31m ago)    2d23h
kube-system                     antrea-agent-zmk8v                                       2/2     Running     5 (29m ago)    2d23h
kube-system                     antrea-controller-f5d6d787f-m2bbn                        1/1     Running     4 (29m ago)    2d23h
kube-system                     coredns-57db7b44f5-csfjm                                 1/1     Running     1 (31m ago)    2d23h
kube-system                     coredns-69b565fcb5-9bswf                                 0/1     Pending     0              2d23h
kube-system                     docker-registry-silver-pg01-11706e-r88pq-2dx24           1/1     Running     1 (31m ago)    2d23h
kube-system                     etcd-silver-pg01-11706e-r88pq-2dx24                      1/1     Running     1 (31m ago)    2d23h
kube-system                     image-puller-4jltj                                       1/1     Running     5 (31m ago)    2d23h
kube-system                     kube-apiserver-silver-pg01-11706e-r88pq-2dx24            1/1     Running     1 (31m ago)    2d23h
kube-system                     kube-controller-manager-silver-pg01-11706e-r88pq-2dx24   1/1     Running     3 (31m ago)    2d23h
kube-system                     kube-proxy-cts9x                                         1/1     Running     1 (31m ago)    2d23h
kube-system                     kube-scheduler-silver-pg01-11706e-r88pq-2dx24            1/1     Running     3 (31m ago)    2d23h
kube-system                     metrics-server-6ccf55cf87-4jbzt                          1/1     Running     1 (31m ago)    2d23h
kube-system                     snapshot-controller-7ccbcfddfd-4czzc                     1/1     Running     1 (31m ago)    2d23h
pinniped-concierge              pinniped-concierge-77ccbc897d-75l2z                      1/1     Running     1 (31m ago)    2d23h
pinniped-concierge              pinniped-concierge-77ccbc897d-ww9fl                      1/1     Running     1 (31m ago)    2d23h
pinniped-concierge              pinniped-concierge-kube-cert-agent-7449f8dbbb-vjt5b      1/1     Running     1 (31m ago)    2d23h
secretgen-controller            secretgen-controller-5cbf99f6c-t9bdn                     1/1     Running     1 (31m ago)    2d23h
telegraf                        telegraf-6d994786d8-cn82f                                1/1     Running     1 (31m ago)    2d23h
tkg-system                      kapp-controller-7ff74d9865-4989q                         2/2     Running     5 (29m ago)    2d23h
vmware-sql-postgres             postgres-operator-56ff7f7679-dd8vl                       1/1     Running     1 (31m ago)    2d23h
vmware-system-antrea            antrea-pre-upgrade-job-dsknp                             0/1     Completed   0              2d23h
vmware-system-auth              guest-cluster-auth-svc-lgv4x                             1/1     Running     1 (31m ago)    2d23h
vmware-system-cloud-provider    guest-cluster-cloud-provider-67f87c6699-qnhqd            1/1     Running     6 (29m ago)    2d23h
vmware-system-csi               vsphere-csi-controller-854ffbff6-w9lbf                   7/7     Running     7 (31m ago)    2d23h
vmware-system-csi               vsphere-csi-node-jp6jn                                   3/3     Running     13 (29m ago)   2d23h

In this VKS context, in the namespace d14a47-silver-tenant-ns-b8syh, I can see the pod for the primary database (silver-pg01-0) as well as the pod for the monitor (silver-pg01-monitor-0). Note that the primary pod has four containers: pg-container, instance-logging, reconfigure-instance and postgres-sidecar. Using kubectl, I can describe the pod, get events and look at the container logs. For example, to look at the pg-container log, run the following command (-c selects the container):

> kubectl logs silver-pg01-0 -c pg-container -n d14a47-silver-tenant-ns-b8syh | more
2025-12-12T10:33:02.722Z INFO postgresinstance Removing post start tasks executed
2025-12-12T10:33:02.731Z INFO postgresinstance Removed post start tasks executed
2025-12-12T10:33:02.731Z INFO postgresinstance Running pre-start tasks
2025-12-12T10:33:02.731Z INFO postgresinstance executing {"task": "ConfigureDirectoryPermission"}
2025-12-12T10:33:02.731Z INFO postgresinstance executing {"task": "ConfigurePassFileTask"}
2025-12-12T10:33:02.732Z INFO postgresinstance executing {"task": "WaitForMonitorTask"}
2025-12-12T10:33:02.825Z INFO postgresinstance failed to connect to `user=autoctl_node database=pg_auto_failover`: hostname resolving error: lookup silver-pg01-monitor-0.silver-pg01-agent.d14a47-silver-tenant-ns-b8syh.svc.cluster.local on 10.96.0.10:53: no such host
2025-12-12T10:33:07.844Z INFO postgresinstance failed to connect to `user=autoctl_node database=pg_auto_failover`: 192.168.0.16:5432 (silver-pg01-monitor-0.silver-pg01-agent.d14a47-silver-tenant-ns-b8syh.svc.cluster.local): dial error: dial tcp 192.168.0.16:5432: connect: connection refused
2025-12-12T10:33:12.898Z INFO postgresinstance Connected to monitor
2025-12-12T10:33:12.898Z INFO postgresinstance executing {"task": "CleanUpDatabaseProcessTask"}
2025-12-12T10:33:12.898Z INFO postgresinstance Start cleanup process database...
pg_ctl: could not send stop signal (PID: 309): No such process
2025-12-12T10:33:12.901Z INFO postgresinstance executing {"task": "RemovePostmasterPidTask"}
2025-12-12T10:33:12.902Z INFO postgresinstance executing {"task": "ClearCustomTempDirTask"}
2025-12-12T10:33:12.903Z INFO postgresinstance executing {"task": "WriteCustomConfigFileTask"}
2025-12-12T10:33:12.923Z INFO postgresinstance Applying custom config {"config": {"Mode":"verify-ca","CAFilePath":"/etc/postgres_ssl/ca.crt","CertFilePath":"/etc/postgres_ssl/tls.crt","KeyFilePath":"/etc/postgres_ssl/tls.key","CustomConfigFilePath":"/pgsql/custom/postgresql-custom-override.conf","IsArchiveModeEnabled":true,"UnixSocketDirectories":["/pgsql/custom/tmp","/tmp"],"ArchiveCommand":"pgbackrest --stanza=d14a47-silver-tenant-ns-b8syh-silver-pg01-293d8da7-4489-4a93-a5c4-7a949b2960d4 archive-push %p","SharedPreloadLibraries":["pg_stat_statements","pgaudit","pg_cron"],"PostgresVersion":"17","PostgresLogDirectory":"/pgsql/logs/postgres","UserProvidedCustomPostgresConfigPath":"/etc/customconfig/postgresql.conf","BackupBasedContinuousRestoreMode":false,"SharedBuffers":"2654 MB","WorkMem":"26 MB","WalKeepSize":"96 MB","WalKeepSegments":6,"MaintenanceWorkMem":"530 MB","EffectiveCacheSize":"5309 MB","MaxSlotWalKeepSize":"1998 MB"}}
2025-12-12T10:33:12.924Z INFO postgresinstance executing {"task": "ConfigureMonitorConnectionStringTask"}
2025-12-12T10:33:12.939Z INFO postgresinstance executing {"task": "ConfigureSSLTask"}
2025-12-12T10:33:12.947Z INFO postgresinstance ssl.ca_file is already set to /etc/postgres_ssl/ca.crt
2025-12-12T10:33:12.954Z INFO postgresinstance ssl.cert_file is already set to /etc/postgres_ssl/tls.crt
2025-12-12T10:33:12.962Z INFO postgresinstance ssl.key_file is already set to /etc/postgres_ssl/tls.key
2025-12-12T10:33:12.962Z INFO postgresinstance executing {"task": "InitializeDatabaseTask"}
2025-12-12T10:33:12.962Z INFO postgresinstance Start initializing database...

So, as you can see, some low-level troubleshooting can be done using the new VCF CLI when a DSM database is provisioned via VCF Automation using a Supervisor Namespace infrastructure policy, and is therefore using the vSphere Kubernetes Service to host the database.

Note: If you are running VCF v9.0.1 with DSM integrated, and you run the above commands and notice that there are no running containers (0/4) in the database pod, then you may have encountered an issue where the VKS Management service consumes most of the available resources on the control plane nodes of the VKS cluster. The VKS Management component automatically adds a number of agents to the control plane nodes of a VKS cluster, leaving insufficient resources to run the database. If you suspect this is the case, the following KB describes the workaround of adding the database namespace to the VKSM ConfigMap: https://knowledge.broadcom.com/external/article?articleNumber=412306

SSH access to the VKS node

Now the final part of this section shows how to ssh onto the VKS node that hosts the database. Caution: with great power comes great responsibility. I would urge extreme care when logging onto the VKS node, as you may end up doing something that impacts the database. However, there may be valid reasons to do this, such as checking network connectivity. So, as per the official documentation, here is how to get ssh access to a VKS node using a jumpbox PodVM deployed into the same Namespace as the VKS cluster.

The first step is to switch contexts once more, and go back to the Supervisor context, cormac-sv. From here, you will need to list the Kubernetes “secrets” in the namespace where the database infrastructure has been provisioned, in this case dsm-ns-lfyn4. The ssh secret contains a private key. With this information, you will be able to ssh onto the VKS node as a system user (vmware-system-user). The secret we are interested in is silver-pg01-11706e-ssh, of type kubernetes.io/ssh-auth, shown in the listing below.

> vcf.exe context use cormac-sv
[ok] Token is still active. Skipped the token refresh for context "cormac-sv"
[i] Successfully activated context 'cormac-sv' (Type: kubernetes)
[i] Fetching recommended plugins for active context 'cormac-sv'...
[ok] All recommended plugins are already installed and up-to-date.

> kubectl.exe get secrets -n dsm-ns-lfyn4
NAME                                                        TYPE                                  DATA   AGE
cluster-autoscaler-secret                                   kubernetes.io/service-account-token   3      3d
silver-pg01-11706e-antrea-data-values                       Opaque                                1      3d
silver-pg01-11706e-auth-svc-cert                            kubernetes.io/tls                     3      3d
silver-pg01-11706e-ca                                       cluster.x-k8s.io/secret               2      3d
silver-pg01-11706e-control-plane-machine-agent-conf         Opaque                                1      3d
silver-pg01-11706e-encryption                               Opaque                                1      3d
silver-pg01-11706e-encryption-config                        Opaque                                1      3d
silver-pg01-11706e-etcd                                     cluster.x-k8s.io/secret               2      3d
silver-pg01-11706e-extensions-ca                            kubernetes.io/tls                     3      3d
silver-pg01-11706e-gateway-api-package                      clusterbootstrap-secret               0      3d
silver-pg01-11706e-guest-cluster-auth-service-data-values   Opaque                                1      3d
silver-pg01-11706e-kapp-controller-data-values              Opaque                                2      3d
silver-pg01-11706e-kubeconfig                               cluster.x-k8s.io/secret               1      3d
silver-pg01-11706e-ma-token                                 Opaque                                2      3d
silver-pg01-11706e-metrics-server-package                   clusterbootstrap-secret               0      3d
silver-pg01-11706e-pinniped-package                         clusterbootstrap-secret               1      3d
silver-pg01-11706e-proxy                                    cluster.x-k8s.io/secret               2      3d
silver-pg01-11706e-r88pq-2dx24                              cluster.x-k8s.io/secret               2      3d
silver-pg01-11706e-sa                                       cluster.x-k8s.io/secret               2      3d
silver-pg01-11706e-secretgen-controller-package             clusterbootstrap-secret               1      3d
silver-pg01-11706e-ssh                                      kubernetes.io/ssh-auth                1      3d
silver-pg01-11706e-ssh-password                             Opaque                                1      3d
silver-pg01-11706e-ssh-password-hashed                      Opaque                                1      3d
silver-pg01-11706e-user-trusted-ca-secret                   Opaque                                1      3d
silver-pg01-11706e-v69f5-ccm-secret                         kubernetes.io/service-account-token   3      3d
silver-pg01-11706e-v69f5-pvbackupdriver-secret              kubernetes.io/service-account-token   3      3d
silver-pg01-11706e-v69f5-pvcsi-secret                       kubernetes.io/service-account-token   3      3d
silver-pg01-11706e-vsphere-cpi-data-values                  Opaque                                1      3d
silver-pg01-11706e-vsphere-pv-csi-data-values               Opaque                                1      3d
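As an aside, if you want to use the private key directly from your own desktop rather than via a jumpbox, a standard kubernetes.io/ssh-auth secret stores the key base64-encoded under the ssh-privatekey field (the same field the jumpbox manifest below mounts). A minimal sketch, assuming the secret and namespace names from the listing above; the kubectl command needs live cluster access, so it is shown as a comment, and the decode step is demonstrated with a placeholder value standing in for real key material:

```shell
# Live command (requires the Supervisor context; extract and decode the key):
#   kubectl get secret silver-pg01-11706e-ssh -n dsm-ns-lfyn4 \
#     -o jsonpath='{.data.ssh-privatekey}' | base64 -d > id_rsa
#   chmod 600 id_rsa
#
# The ssh-privatekey field is base64-encoded. Demonstrate the decode step
# here with a placeholder value in place of the real key material:
ENCODED=$(printf '%s' '-----BEGIN OPENSSH PRIVATE KEY-----' | base64)
printf '%s\n' "$ENCODED" | base64 -d
```

Remember that this key grants login to the VKS node as vmware-system-user, so handle it with the same care as any other node credential.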

The next step is to create a YAML manifest which describes the PodVM that we wish to deploy. Remember that this PodVM is deployed in the same namespace (dsm-ns-lfyn4) as the VKS cluster running the Postgres database pods. Here is an example of such a PodVM. It uses a Photon OS image and passes a command that copies the private key from the volume created from the secret we referenced earlier (note the secretName field below). The volume holding the private key is mounted at /root/ssh. We use yum to install openssh. The private key is then copied to a file called /root/.ssh/id_rsa. This allows an ssh session to the VKS node as the ‘vmware-system-user’ from the jumpbox PodVM. We pass the ssh command as an argument when we exec to the Pod (we will see how to do this shortly).

apiVersion: v1
kind: Pod
metadata:
  name: jumpbox
  namespace: dsm-ns-lfyn4
spec:
  containers:
  - image: "photon:5.0"
    name: jumpbox
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;" ]
    volumeMounts:
      - mountPath: "/root/ssh"
        name: ssh-key
        readOnly: true
    resources:
      requests:
        memory: 2Gi
  volumes:
    - name: ssh-key
      secret:
        secretName: silver-pg01-11706e-ssh
  imagePullSecrets:
    - name: regcred
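
Note that the manifest above references an imagePullSecret named regcred. If the Photon image is pulled from a registry that requires authentication, a docker-registry secret can be created in the same namespace. This is a sketch; the registry address and credentials below are placeholders, not values from this environment:

```shell
# Hypothetical registry and credentials -- substitute your own values.
kubectl create secret docker-registry regcred \
  --namespace dsm-ns-lfyn4 \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypassword
```

If the image can be pulled anonymously, the imagePullSecrets section can simply be omitted from the manifest.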

Next, apply the manifest to create the PodVM, and ensure it is running.

> kubectl.exe apply -f jumpbox-dsm.yml
pod/jumpbox created
 
> kubectl get pods -n dsm-ns-lfyn4
NAME                READY   STATUS      RESTARTS        AGE
jumpbox             1/1     Running     0               3m37s

Now, determine the IP address of the node that we wish to ssh onto. The following command, which queries virtual machines in the namespace, will provide this information.


> kubectl.exe get vm -o wide -n dsm-ns-lfyn4
NAME                             POWER-STATE   CLASS               IMAGE                   PRIMARY-IP4     AGE
silver-pg01-11706e-r88pq-2dx24   PoweredOn     best-effort-large   vmi-24554b66363a299c5   192.173.237.3   2d3h
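
If you prefer to capture the address programmatically rather than reading it from the table, a jsonpath query can be used. This is a sketch; it assumes the VM Operator API in this release exposes the address at .status.network.primaryIP4, so verify the exact field path against your own cluster first:

```shell
# Extract the primary IPv4 address of the VKS node VM.
# The jsonpath field is an assumption -- confirm it with:
#   kubectl get vm silver-pg01-11706e-r88pq-2dx24 -n dsm-ns-lfyn4 -o yaml
NODE_IP=$(kubectl get vm silver-pg01-11706e-r88pq-2dx24 -n dsm-ns-lfyn4 \
  -o jsonpath='{.status.network.primaryIP4}')
echo "${NODE_IP}"
```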

The ssh session can now be initiated. Run a ‘kubectl exec’ command as shown below. As soon as you ‘exec’ onto the jumpbox PodVM, the ssh command to the VKS node’s IP address is run. The PodVM has already run the command and args defined in the YAML manifest above to configure the private key for the ‘vmware-system-user’, which allows the ssh command passed to ‘kubectl exec’ to succeed.

> kubectl exec -it jumpbox -n dsm-ns-lfyn4 -- /usr/bin/ssh vmware-system-user@192.173.237.3
The authenticity of host '192.173.237.3 (192.173.237.3)' can't be established.
ED25519 key fingerprint is SHA256:HEu9cZ9Uq4EFwQvR1KiWbJWlR0Jd3SXNK7lu8C6WBnY.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.173.237.3' (ED25519) to the list of known hosts.
cat: /var/run/motdgen/motd: Permission denied

vmware-system-user@silver-pg01-11706e-r88pq-2dx24 [ ~ ]$ ip a | more
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
    link/ether 04:50:56:00:78:00 brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp11s0
    altname ens192
    inet 192.173.237.3/27 brd 192.173.237.31 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::650:56ff:fe00:7800/64 scope link
       valid_lft forever preferred_lft forever
.
.
.

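If you plan to script this access, the interactive host-key prompt seen above can be suppressed with standard ssh client options. Be aware that disabling strict host key checking removes protection against man-in-the-middle attacks, so this variant is only appropriate in a lab environment:

```shell
# Non-interactive variant of the same session; suppresses the host-key prompt.
kubectl exec -it jumpbox -n dsm-ns-lfyn4 -- \
  /usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
  vmware-system-user@192.173.237.3
```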
And that completes the steps to gain ssh access onto a VKS node that is backing a DSM provisioned database. Refer to the official documentation linked earlier for other methods (such as password access). If you need to lock down ssh access to the database nodes, you can of course create an NSX Firewall Rule to block this port. There is a simple example of how to do this available here.

Summary

That concludes the post. Hopefully you have seen some of the ways in which it is possible to troubleshoot DSM database deployments on vSphere Kubernetes Service (VKS) clusters. As mentioned, this is the default Kubernetes used when the Infrastructure Policy for the database points to a Supervisor Namespace, typically provisioned via VCF Automation in VCF 9.x.
