Note: At the time of writing, DSM version 2.0 is not yet generally available. Therefore some of the screenshots and command line outputs captured in this post may differ slightly in the final product. However, for the purposes of our getting started series of posts, this should not matter too much.
Overview
Restoring the Data Services Manager appliance involves the following steps:
- Where possible, retrieve the /data/pgbackrest.conf from the original DSM provider. This file contains details about the provider’s backup configuration, including the S3 object store, bucket, S3 credentials and file path to the backup. If it cannot be retrieved, it can be created manually on the new DSM provider.
- Power off the original DSM appliance.
- Remove the DSM plugin from the vCenter server via the vSphere Client > Administration > Solutions > Client Plugins > Data Services Manager Plugin > Remove.
- Deploy a new DSM appliance (using the same version as the original appliance) via the vSphere Client > Administration > Solutions > Client Plugins > Add. Ensure the configuration details of the new OVA match the original DSM OVA (e.g. networking).
- Log in to the new DSM appliance and set up the pgbackrest.conf so that the restore tool knows where to retrieve the provider backup from.
- Initiate the restore of the DSM appliance/provider using the restore-provider tool and the pgbackrest.conf configuration.
Unregister the DSM Plugin using docker
Step 3 above calls for removing the original DSM provider’s plugin from vCenter server. You can do this simply via the UI, but I also wanted to show how to do it using a docker command from the provider command line. To run this docker command, you must log in to the original DSM appliance, so run it before that appliance is powered off. The command requires a number of configuration items, including the vCenter server SHA-256 thumbprint (-vct). To find the vCenter thumbprint, click on the icon immediately before the URL in the web browser connected to the vSphere Client, then click on the Certificate View to bring up the certificate details. From here you can find the SHA-256 fingerprint/thumbprint for the vCenter server. I have also included the -insecure flag in this command since my vCenter server is using self-signed certs and not a custom certificate. Note that the version used in extension-registration:<version> will be different in your environment since this example is using pre-GA code. You can get the version from the docker images command.
# docker images | grep extension-reg
extension-registration   2.0.0-23127626   4c067666dd38   8 days ago   179MB
# docker run --name extension_registration \
--rm extension-registration:2.0.0-23127626 \
-action unregisterPlugin \
-insecure \
-url https://<vCenter IP or FQDN>/sdk \
-vct 17:8e:93:56:a0:f6:8c:53:22:3d:ea:bf:aa:42:f8:f1:34:32:04:7c:89:02:76:2f:80:2d:61:ba:e4:77:8f:70 \
-username 'administrator@vsphere.local' \
-password '********' \
-key com.vmware.dsm.plugin
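If you would rather not dig the thumbprint out of the browser, it can probably be read from the command line as well. The following is a minimal sketch, assuming openssl is installed on the machine where you run it and that the vCenter server answers on port 443. The function names normalize_thumbprint and vcenter_thumbprint are hypothetical helpers, not part of DSM, and the lowercase output simply mirrors the -vct value used in the example above.

```shell
# Sketch only: helper names are hypothetical, not part of DSM.

# Strip an "sha256 Fingerprint=" style prefix and lowercase the hex pairs,
# matching the lowercase ':'-separated form used for -vct in this post.
normalize_thumbprint() {
  printf '%s\n' "${1#*=}" | tr 'A-F' 'a-f'
}

# Print the SHA-256 thumbprint of the certificate served on <host>:443.
# Needs network access to the host, so it is only defined here, not called.
vcenter_thumbprint() {
  local host="$1" raw
  raw=$(openssl s_client -connect "${host}:443" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256) || return 1
  normalize_thumbprint "$raw"
}

# Example (not run here):
#   vcenter_thumbprint vcsa-06.rainpole.com
```

It is worth comparing the result against the value shown in the browser before using it, since the whole point of the thumbprint is to confirm you are talking to the right vCenter server.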
Configure pgbackrest.conf
Now that the plugin has been removed successfully from vCenter server, we can proceed with deploying a new DSM appliance. We can then initiate the restore of the original DSM provider content to this new DSM provider. The first step is to configure the pgbackrest.conf file found in /data on the new DSM appliance. There will be a pgbackrest.conf template already in the /data directory when you log in to the appliance. This needs to be configured to point to the S3-compliant object store that you previously configured for provider backups, along with the bucket name and access credentials. Here is an example from my environment:
[global]
repo1-path=/provider-backups-5ce48c2a-82b3-49c0-a96d-faa0bf17d612
repo1-type=s3
repo1-s3-endpoint=https://192.168.0.1:9000
repo1-s3-bucket=provider-backup
repo1-s3-uri-style=path
repo1-s3-verify-tls=n
repo1-s3-key=admin
repo1-s3-key-secret=password
repo1-s3-region=us-east-1
repo1-retention-full=7
process-max=2
log-level-console=info
log-level-file=error
start-fast=y
delta=y

[main]
pg1-path=/data/vpgsql
As mentioned, if you had access to the original DSM appliance beforehand, then you could retrieve the /data/pgbackrest.conf from there and copy it to this new DSM appliance. When the provider backup is configured on a DSM appliance, it is the pgbackrest.conf that holds all of the details relating to the provider backups.
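Whether the file is copied across or typed in by hand, a quick sanity check before running the restore can catch a missing entry. Here is a minimal sketch of such a check. The function name check_pgbackrest_conf is hypothetical, and the list of keys is simply taken from the example above; it is an assumption on my part, not an official list of what the restore tool requires.

```shell
# Sketch only: report keys from the example config that are absent in a file.
# The key list mirrors the sample pgbackrest.conf in this post and is an
# assumption, not an official list of required settings.
check_pgbackrest_conf() {
  local conf="$1" key missing=0
  for key in repo1-path repo1-type repo1-s3-endpoint repo1-s3-bucket \
             repo1-s3-key repo1-s3-key-secret repo1-s3-region pg1-path; do
    # Match "key=" at the start of a line, allowing surrounding whitespace.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here):
#   check_pgbackrest_conf /data/pgbackrest.conf && echo "all expected keys present"
```

The check only looks at key names. It does not verify that the endpoint is reachable or that the credentials are valid.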
If you are building this file from scratch, you will have to figure out what the repo1-path is. To find the correct repo1-path, navigate to the object storage interface, locate the repo1-s3-bucket used for provider backups, and then check the name of the folder underneath. If there are multiple folders, navigate to the backup/main sub-folder of each and look at the date and time on the backup.info file. The folder whose backup.info is the most recent contains the latest backup, and that is the folder to set in the repo1-path field. Here is an example to show what I mean, taken from my MinIO Object Storage system. This is the correct folder (highlighted in the blue box) as the most recent backups were just taken at midnight, as shown in red.
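The same comparison can be done from a command line instead of the object storage UI. The sketch below assumes the AWS CLI is installed on a workstation and configured with the bucket credentials; that tooling is my assumption and is not something DSM provides. The function name latest_backup_folder is hypothetical. It reads a recursive listing (date, time, size, key) and prints the top-level folder holding the newest backup/main/backup.info.

```shell
# Sketch only: pick the folder with the newest backup/main/backup.info from
# `aws s3 ls --recursive` style lines (date, time, size, key) read on stdin.
latest_backup_folder() {
  awk '$4 ~ /\/backup\/main\/backup\.info$/ { print $1, $2, $4 }' \
    | sort \
    | tail -n 1 \
    | awk '{ split($3, parts, "/"); print "/" parts[1] }'
}

# Example (not run here), using the endpoint and bucket from the sample config:
#   aws s3 ls s3://provider-backup --recursive \
#     --endpoint-url https://192.168.0.1:9000 --no-verify-ssl | latest_backup_folder
```

The leading slash is added so that the output can be pasted straight into the repo1-path field, in the same form as the example configuration.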
Run restore-provider
Using the configured pgbackrest.conf, the restore-provider tool can now be run to restore the original DSM provider contents to this newly provisioned provider. There is a lot of output, which I have truncated below. In my experience, the restore process usually takes approximately 8 to 10 minutes to complete.
# restore-provider -c /data/pgbackrest.conf
  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::               (v2.7.13)
2024-01-08 12:04:15.799 INFO [main ] p.r.ProviderRestoreApplication - Starting ProviderRestoreApplication v2.0.0 using Java 11.0.20.1 on dsm-provider.rainpole.com with PID 8208 (/opt/vmware/tdm-provider/restore-service/bin/tdm-sp-provider-restore.jar started by root in /opt/vmware/tdm-provider/restore-service)
2024-01-08 12:04:15.804 DEBUG [main ] p.r.ProviderRestoreApplication - Running with Spring Boot v2.7.13, Spring v5.3.28
2024-01-08 12:04:15.804 INFO [main ] p.r.ProviderRestoreApplication - The following 1 profile is active: "restore"
2024-01-08 12:04:17.452 INFO [main ] epositoryConfigurationDelegate - Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2024-01-08 12:04:18.563 INFO [main ] epositoryConfigurationDelegate - Finished Spring Data repository scanning in 1096 ms. Found 98 JPA repository interfaces.
2024-01-08 12:04:20.349 INFO [main ] o.s.b.w.e.t.TomcatWebServer - Tomcat initialized with port(s): 9999 (http)
2024-01-08 12:04:20.368 INFO [main ] o.a.c.core.StandardService - Starting service [Tomcat]
2024-01-08 12:04:20.368 INFO [main ] o.a.c.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.76]
2024-01-08 12:04:20.483 INFO [main ] o.a.c.c.C.[.[localhost].[/] - Initializing Spring embedded WebApplicationContext
2024-01-08 12:04:20.484 INFO [main ] letWebServerApplicationContext - Root WebApplicationContext: initialization completed in 4579 ms
2024-01-08 12:04:20.702 INFO [main ] c.v.t.s.c.DataSourceConfig - jdbc:postgresql:vmware
2024-01-08 12:04:20.706 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __INIT__setDatasourceConfig - key=spring.datasource.hikari.maximum-pool-size
2024-01-08 12:04:20.706 INFO [main ] c.v.t.s.c.DataSourceConfig - set SpringConfig: key=spring.datasource.hikari.maximum-pool-size, value=5
2024-01-08 12:04:20.706 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __DONE__setDatasourceConfig - key=spring.datasource.hikari.maximum-pool-size
2024-01-08 12:04:20.707 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __INIT__setDatasourceConfig - key=spring.datasource.hikari.connectionTimeout
2024-01-08 12:04:20.707 INFO [main ] c.v.t.s.c.DataSourceConfig - set SpringConfig: key=spring.datasource.hikari.connectionTimeout, value=20000
2024-01-08 12:04:20.707 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __DONE__setDatasourceConfig - key=spring.datasource.hikari.connectionTimeout
2024-01-08 12:04:20.707 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __INIT__setDatasourceConfig - key=spring.datasource.hikari.minimum-idle
2024-01-08 12:04:20.708 INFO [main ] c.v.t.s.c.DataSourceConfig - set SpringConfig: key=spring.datasource.hikari.minimum-idle, value=5
2024-01-08 12:04:20.708 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __DONE__setDatasourceConfig - key=spring.datasource.hikari.minimum-idle
2024-01-08 12:04:20.708 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __INIT__setDatasourceConfig - key=spring.datasource.hikari.idle-timeout
2024-01-08 12:04:20.708 INFO [main ] c.v.t.s.c.DataSourceConfig - set SpringConfig: key=spring.datasource.hikari.idle-timeout, value=60000
2024-01-08 12:04:20.708 DEBUG [main ] c.v.t.s.c.DataSourceConfig - __DONE__setDatasourceConfig - key=spring.datasource.hikari.idle-timeout
2024-01-08 12:04:20.719 WARN [main ] com.zaxxer.hikari.HikariConfig - StandaloneHikariPool - idleTimeout has been set but has no effect because the pool is operating as a fixed size pool.
2024-01-08 12:04:20.721 INFO [main ] c.z.hikari.HikariDataSource - StandaloneHikariPool - Starting...
2024-01-08 12:04:20.861 INFO [main ] c.z.hikari.HikariDataSource - StandaloneHikariPool - Start completed.
2024-01-08 12:04:21.364 INFO [main ] o.h.j.internal.util.LogHelper - HHH000204: Processing PersistenceUnitInfo [name: default]
2024-01-08 12:04:21.488 INFO [main ] org.hibernate.Version - HHH000412: Hibernate ORM core version 5.6.15.Final
2024-01-08 12:04:21.776 INFO [main ] o.h.annotations.common.Version - HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
2024-01-08 12:04:22.052 INFO [main ] org.hibernate.dialect.Dialect - HHH000400: Using dialect: org.hibernate.dialect.PostgreSQLDialect
2024-01-08 12:04:22.540 WARN [main ] Hibernate Types - You should use Hypersistence Optimizer to speed up your Hibernate application!
2024-01-08 12:04:22.541 WARN [main ] Hibernate Types - For more details, go to https://vladmihalcea.com/hypersistence-optimizer/
2024-01-08 12:04:22.541 INFO [main ] Hibernate Types -
 _    _                           _     _
| |  | |                         (_)   | |
| |__| |_   _ _ __   ___ _ __ ___ _ ___| |_ ___ _ __   ___ ___
|  __  | | | | '_ \ / _ \ '__/ __| / __| __/ _ \ '_ \ / __/ _ \
| |  | | |_| | |_) |  __/ |  \__ \ \__ \ ||  __/ | | | (_|  __/
|_|  |_|\__, | .__/ \___|_|  |___/_|___/\__\___|_| |_|\___\___|
         __/ | |
        |___/|_|
  ____        _   _           _
 / __ \      | | (_)         (_)
| |  | |_ __ | |_ _ _ __ ___  _ _______ _ __
| |  | | '_ \| __| | '_ ` _ \| |_  / _ \ '__|
| |__| | |_) | |_| | | | | | | |/ /  __/ |
 \____/| .__/ \__|_|_| |_| |_|_/___\___|_|
       | |
       |_|
2024-01-08 12:04:22.542 INFO [main ] Hibernate Types - Check out the README page for more info about the Hypersistence Optimizer banner https://github.com/vladmihalcea/hibernate-types#how-to-remove-the-hypersistence-optimizer-banner-from-the-log
2024-01-08 12:04:24.421 INFO [main ] e.t.j.p.i.JtaPlatformInitiator - HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
2024-01-08 12:04:24.435 INFO [main ] tainerEntityManagerFactoryBean - Initialized JPA EntityManagerFactory for persistence unit 'default'
2024-01-08 12:04:25.780 INFO [main ] rgetType$TargetFetcherInjector - Injecting targetFetcher : SERVICE_INSTANCE
2024-01-08 12:04:25.780 INFO [main ] rgetType$TargetFetcherInjector - Injecting targetFetcher : TASK
2024-01-08 12:04:25.782 INFO [main ] rgetType$TargetFetcherInjector - Injecting targetFetcher : S3_STORAGE
2024-01-08 12:04:25.782 INFO [main ] rgetType$TargetFetcherInjector - Injecting targetFetcher : PROVIDER_ENVIRONMENT
2024-01-08 12:04:25.782 INFO [main ] rgetType$TargetFetcherInjector - Injecting targetFetcher : SERVICE_INSTANCE_GROUP
2024-01-08 12:04:25.827 INFO [main ] c.v.t.c.w.c.SSLConfig - Setting java truststore to: /opt/vmware/tdm-provider/cert/truststore.jks
. . .
2024-01-08 12:11:34.095 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2024-01-08 12:11:35.596 P00 INFO: backup start archive = 0000000200000000000000A8, lsn = 0/A8000028
2024-01-08 12:11:35.596 P00 INFO: check archive for prior segment 0000000200000000000000A7
WARN: a timeline switch has occurred since the 20240104-123916F_20240108-000001I backup, enabling delta checksum
HINT: this is normal after restoring from backup or promoting a standby.
2024-01-08 12:11:44.690 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2024-01-08 12:11:45.391 P00 INFO: backup stop archive = 0000000200000000000000A8, lsn = 0/A802C8A0
2024-01-08 12:11:45.428 P00 INFO: check archive for segment(s) 0000000200000000000000A8:0000000200000000000000A8
2024-01-08 12:11:45.491 P00 INFO: new backup label = 20240104-123916F_20240108-121133I
2024-01-08 12:11:45.916 P00 INFO: incr backup size = 168.6MB, file total = 2163
2024-01-08 12:11:45.916 P00 INFO: backup command end: completed successfully (12983ms)]
2024-01-08 12:11:47.841 INFO [pool-4-thread-1] c.v.t.c.tools.CommandRunner - Result ExitCode: 0
2024-01-08 12:11:47.842 DEBUG [pool-4-thread-1] b.s.StateTriggerProviderBackup - ___DONE___stateTriggerProviderBackup___PROVIDER_UPDATE
2024-01-08 12:11:47.842 INFO [pool-4-thread-1] c.v.t.c.t.w.SimpleWorkflowFSM - DONE FSM execution <<
2024-01-08 12:11:47.843 DEBUG [pool-4-thread-1] c.v.t.c.t.w.BaseSimpleWorkLoad - __DONE____________workflow {} class com.vmware.tdm.sp.provider.common.backup.workflow.ProviderBackupWorkLoad
2024-01-08 12:12:01.690 DEBUG [main ] ProviderBackupTaskAsyncMonitor - ___RETRY Fetch Provider backup task status. TaskId - [1] Status - [SUCCESS]
2024-01-08 12:12:01.693 INFO [main ] v.t.s.p.c.a.AuditActionHandler - __INIT__onSuccess. Source - PROVIDER_BACKUP, OperationType - {}
2024-01-08 12:12:01.694 INFO [main ] v.t.s.p.c.a.AuditActionHandler - __DONE__onSuccess PROVIDER_BACKUP
2024-01-08 12:12:01.699 INFO [audit-exec-2 ] c.v.t.s.p.c.a.AuditProcessor - AuditEntrySaved: AuditEntity(id=22e54a04-9814-45ad-97fc-649e7ba950ee, source=System, component=PROVIDER, operationType=PROVIDER_BACKUP, subject=null, details=Provider Backup Success, eventTime=2024-01-08 12:12:01.693, result=OK)
2024-01-08 12:12:01.700 DEBUG [main ] ProviderBackupTaskAsyncMonitor - ___DONE__monitorBackupTask___TaskId - [1]
2024-01-08 12:12:01.700 INFO [main ] .s.p.c.b.ProviderBackupService - ___DONE__createProviderBackup___
2024-01-08 12:12:01.700 INFO [main ] c.v.t.s.p.r.c.RestoreDbCommand - __DONE__Saving provider backup settings to Recovered Provider
2024-01-08 12:12:01.710 INFO [main ] o.a.c.core.StandardService - Stopping service [Tomcat]
2024-01-08 12:12:01.719 WARN [main ] o.a.c.l.WebappClassLoaderBase - The web application [ROOT] appears to have started a thread named [StandaloneHikariPool housekeeper] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
java.base@11.0.20.1/jdk.internal.misc.Unsafe.park(Native Method)
java.base@11.0.20.1/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
java.base@11.0.20.1/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
java.base@11.0.20.1/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base@11.0.20.1/java.lang.Thread.run(Unknown Source)
2024-01-08 12:12:01.792 INFO [main ] tainerEntityManagerFactoryBean - Closing JPA EntityManagerFactory for persistence unit 'default'
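With this much output, it can help to save the run to a file and check for the closing messages afterwards rather than scrolling back. The following is a rough sketch of that idea. The function name restore_looks_complete and the log path are hypothetical, and the two marker strings are copied from the pre-GA output above, so they may well differ in later builds.

```shell
# Sketch only: return 0 if a saved restore-provider log contains both markers.
# The marker strings come from the pre-GA output shown in this post and may
# change in later builds.
restore_looks_complete() {
  local log="$1"
  grep -q 'Provider Backup Success' "$log" &&
    grep -q '__DONE__Saving provider backup settings to Recovered Provider' "$log"
}

# Example (not run here); /root/restore-provider.log is a hypothetical path:
#   restore-provider -c /data/pgbackrest.conf 2>&1 | tee /root/restore-provider.log
#   restore_looks_complete /root/restore-provider.log && echo "restore appears complete"
```

Treat a match as a hint rather than proof. Logging in to the DSM UI and checking the dashboard is still the real confirmation.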
And there you have it. In just a couple of steps, the DSM appliance/provider is restored back to where it was previously, with all of the infrastructure policy information, all of the user/permission information and all of the data services and databases information available once again. It will also restore all configuration settings that were previously put in place, such as Log Forwarding, SMTP configuration, webhook configuration and LDAP settings. The dashboard view should be exactly the same on the new provider as it appeared on the original provider.
Register the DSM Plugin using docker
For completeness, I also wanted to show how to register the DSM plugin with vCenter using the docker CLI, even though this step is not necessary in the restore process. However, you might need to do this if the IP address of the DSM appliance changes, or there is a new certificate installed on the DSM appliance.
The DSM thumbprint (in the -serverThumbprint field) must be SHA-1 in DSM 2.0, and the fields must be ‘:’ separated. This is a nuance of the vCenter API. The easiest way that I have found to retrieve the thumbprint in this ‘:’ separated format is via the Firefox browser. Similar to how the vCenter thumbprint was retrieved, click on the icon immediately before the DSM URL in the browser. Click on Connection > More information > View Certificate and from there you can retrieve the SHA-1 thumbprint.
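Since the two thumbprints in the register command use different hash types, it is easy to paste the wrong one into the wrong field. Here is a minimal sketch of a check based purely on length: 20 ':'-separated byte pairs for SHA-1 and 32 for SHA-256. The function names thumbprint_kind and colonize are hypothetical helpers and not part of DSM.

```shell
# Sketch only: helper names are hypothetical, not part of DSM.

# Classify a ':'-separated hex thumbprint by its length:
# 20 byte pairs -> sha1, 32 byte pairs -> sha256, anything else -> invalid.
thumbprint_kind() {
  local tp="$1"
  if printf '%s\n' "$tp" | grep -Eq '^([0-9A-Fa-f]{2}:){19}[0-9A-Fa-f]{2}$'; then
    echo sha1
  elif printf '%s\n' "$tp" | grep -Eq '^([0-9A-Fa-f]{2}:){31}[0-9A-Fa-f]{2}$'; then
    echo sha256
  else
    echo invalid
  fi
}

# Insert ':' between byte pairs of a plain hex string, for tools that print
# the fingerprint without separators.
colonize() {
  printf '%s\n' "$1" | sed 's/../&:/g; s/:$//'
}

# Example (not run here): openssl can also print the SHA-1 fingerprint,
# assuming it is installed and the appliance answers on port 443:
#   openssl s_client -connect <IP of Provider Appliance>:443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -fingerprint -sha1
```

A value that reports sha1 belongs in -serverThumbprint, and one that reports sha256 belongs in -vct.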
Note that the docker command to register the DSM plugin also includes the -insecure flag, since my vCenter server is using self-signed certs and not a custom certificate. And as already highlighted, the version used in extension-registration:<version> and in the -version field may vary depending on the DSM version that is being used. The DSM version is displayed in the “Version & Upgrade” section of the DSM UI.
# docker run \
--name extension_registration \
--rm extension-registration:2.0.0-23127626 \
-action registerPlugin \
-remote \
-insecure \
-url https://vcsa-06.rainpole.com/sdk \
-vct 17:8e:93:56:a0:f6:8c:53:22:3d:ea:bf:aa:42:f8:f1:34:32:04:7c:89:02:76:2f:80:2d:61:ba:e4:77:8f:70 \
-username 'administrator@vsphere.local' \
-password '*******' \
-key com.vmware.dsm.plugin \
-version 2.0.0.3730 \
-pluginUrl https://<IP of Provider Appliance>/provider/plugin/plugin_signed.zip \
-serverThumbprint DB:02:BD:A8:19:10:4B:2B:86:39:69:D7:C6:B1:26:FF:A9:E5:B0:29 \
-c 'VMware, Inc.' \
-n 'Data Services Manager Plugin' \
-s 'DSM solution appliance with remote plugin'
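Because the image tag changes between builds, it may be easier to look it up than to type it. The sketch below reads the output of the docker images command and prints the repository:tag pair for the extension-registration image. The function name extension_image is a hypothetical helper of my own naming.

```shell
# Sketch only: read `docker images` output on stdin and print repository:tag
# for the first extension-registration image found.
extension_image() {
  awk '$1 == "extension-registration" { print $1 ":" $2; exit }'
}

# Example (not run here):
#   docker images | extension_image
```

The printed value can then be used in place of the hard-coded extension-registration:2.0.0-23127626 in the commands above.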
That completes the post on restoring the Data Services Manager. Thanks for reading this far. Check out my other posts on DSM 2.0.