First, I found this error in the sps.log file which is located in /storage/log/vmware/vmware-sps on the vCenter appliance:
2018-08-02T13:50:47.967Z [pool-22-thread-10] DEBUG opId=StorageProvidersViewMediator-remove-329899-ngc:70036368 com.vmware.vim.sms.task.TaskImpl - Created a task - ManagedObjectReference: type = SmsTask, value = SmsTask-22, serverGuid = 126f5b2d-ae91-4173-bdb6-1fbefcc7e6e2 2018-08-02T13:50:47.967Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.StorageManagerImpl - unregisterProvider() called for provider - 453d453a-ac19-4bf5-b1f7-7252ad84950a 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.task.TaskImpl - Updated progress to 10% for task ManagedObjectReference: type = SmsTask, value = SmsTask-22, serverGuid = 126f5b2d-ae91-4173-bdb6-1fbefcc7e6e2 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.ProviderFactory - [removeProvider] Removing provider - 453d453a-ac19-4bf5-b1f7-7252ad84950a 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.ProviderCache - [removeEntry] Remove the cache entry for the provider - 453d453a-ac19-4bf5-b1f7-7252ad84950a 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.ProviderCache - [removeFdMappingForProvider] No FaultDomainMapping objects found for given fualtDomainId list [] 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverHandler - [removeProvider] Removing provider: 453d453a-ac19-4bf5-b1f7-7252ad84950a from priority cache 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverEngine - [FAILOVER] Provider failover started: [provider: 453d453a-ac19-4bf5-b1f7-7252ad84950a; array: com.purestorage:d4d78052-ea56-4a7e-9df9-67d1a1634faf] 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverEngine - [getBackupProvider] Looking for the new backup provider... 2018-08-02T13:50:47.968Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverEngine - Check backup provider b1801691-98a2-4484-b156-f91e55ca54c6 ... 2018-08-02T13:50:47.969Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverEngine - NeedsExplicitActivation is not set. Skipping activating provider... 2018-08-02T13:50:47.969Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverEngine - [getBackupProvider] Found new backup provider: b1801691-98a2-4484-b156-f91e55ca54c6 2018-08-02T13:50:47.969Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.provider.failover.ProviderFailoverEngine - [updateActiveProvider] Flush event queue for the provider. 2018-08-02T13:50:47.969Z [pool-14-thread-2] ERROR opId= com.vmware.vim.sms.provider.ProviderCache - Error while removing an entry for the provider in the providerCache - java.lang.NullPointerException at com.vmware.vim.sms.provider.vasa.VasaProviderImpl.flushEventQueue(VasaProviderImpl.java:1562) at com.vmware.vim.sms.provider.failover.ProviderFailoverEngine.removeOldActiveProvider(ProviderFailoverEngine.java:296) at com.vmware.vim.sms.provider.failover.ProviderFailoverEngine.failover(ProviderFailoverEngine.java:126) at com.vmware.vim.sms.provider.failover.ProviderFailoverEngine.failover(ProviderFailoverEngine.java:73) at com.vmware.vim.sms.provider.failover.ProviderFailoverHandler.removeProvider(ProviderFailoverHandler.java:119) at com.vmware.vim.sms.provider.ProviderCache.removeEntry(ProviderCache.java:359) at com.vmware.vim.sms.provider.ProviderFactory.removeVasaProvider(ProviderFactory.java:422) at com.vmware.vim.sms.provider.ProviderFactory.removeProvider(ProviderFactory.java:336) at com.vmware.vim.sms.StorageManagerImpl.unregisterProviderInt(StorageManagerImpl.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.vmware.vim.sms.task.JobHandler.run(JobHandler.java:70) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-08-02T13:50:47.969Z [pool-14-thread-2] DEBUG opId= com.vmware.vim.sms.task.TaskManagerImpl - [markTaskCompleted] Marked SmsTask-22 as complete
Fortunately, I found a procedure that, as far as I am aware, is not available publicly. These are the steps I followed to remove my original vCenter registration and resolve the situation where I appear to have a ‘stale’ VP present. I’m assuming this can happen for a number of reasons, which is why I decided to post the steps here. My best guess in this case is that my vCenter was known, but no longer trusted (or something to that effect).
Step 1.
SSH to the vCenter Appliance and login as root.
Step 2.
Modify /usr/lib/vmware-vpx/sps/conf/sms-config.xml file to change retryAttempt parameter from 99999 to 4:
<!-- Retry configuration for QS connection --> <qsConnection> <!-- Retry attempt for QS connection. --> <retryAttempt>4</retryAttempt> <--Here <!-- Retry interval (in seconds) for QS connection. --> <retryInterval>30</retryInterval> </qsConnection>
Step 3.
On the vCenter server, block access to the IP address of VP that you cannot remove via static routes. When the SPS services comes up next time, vCenter will mark the VP as Offline as it can no longer reach it.
# route add -host <StorageProviderIP> reject
Step 4.
On the vCenter server, stop the profile driven storage service (SPS)
# service-control --stop vmware-sps
Step 5.
On the vCenter server, restart the profile driven storage service (this also initiates the VASA providers)
# service-control --start vmware-sps
Step 6.
Wait for the SPS service to fully start. This can be checked in the vSphere UI, under Administration > Deployment > System Configuration > Services.
Step 7.
Try again to deleted the problematic VASA storage provider entry via the vSphere client UI. Since it is now offline, it should be removable. This now worked for me.
Step 8.
On the vCenter server, unblock the route to the VASA storage provider that we previously blocked.
# route del -host <StorageProviderIP> reject
Step 9.
Modify /usr/lib/vmware-vpx/sps/conf/sms-config.xml file to put back the retryAttempt parameter to 99999 from 4:
<!-- Retry configuration for QS connection --> <qsConnection> <!-- Retry attempt for QS connection. --> <retryAttempt>99999</retryAttempt> <--Here <!-- Retry interval (in seconds) for QS connection. --> <retryInterval>30</retryInterval> </qsConnection>
Step 10.
On the vCenter server, stop and start the profile driven storage service (SPS) once more.
# service-control --stop vmware-sps # service-control --start vmware-sps
Step 11.
Wait for the service to fully restart. Now add back the VASA Storage Provider via the UI. This should now work as well (it did for me). Note that this issue should be addressed in some upcoming updates to vSphere, and that these steps should no longer be necessary.