
Task “Delete a virtual storage object” reports “A specified parameter was not correct”

I’ve recently been looking at the vSphere Velero Plugin, and how the latest version of the plugin enables administrators to back up and restore vSphere with Tanzu Supervisor cluster objects as well as Tanzu Kubernetes “guest” cluster objects. The plugin utilizes vSphere snapshot technology: a Kubernetes Persistent Volume (PV) backed by a First Class Disk (FCD) in vSphere is snapshotted, and the snapshot is then moved by a Data Manager appliance to an S3 object store bucket. Once the data movement operation has completed, the snapshot is removed from the PV/FCD.

During the testing of this new functionality, I encountered an issue which I wanted to share here, along with a solution/workaround. Whilst getting everything up and running, I attempted some test backups before the data mover was functioning correctly. This left some stranded snapshots on the PV/FCDs. At the time of writing, it is not possible to delete a PV/FCD that has a snapshot. Later, once everything was working as expected, I deleted the objects that I had successfully backed up in order to test a restore. Unfortunately, I had never cleaned up the stranded snapshots on the PV/FCDs, so when I deleted the PV/FCDs, I encountered the issue described in the title of this post: the task “Delete a virtual storage object” reports “A specified parameter was not correct”. This is entirely down to the fact that the PV/FCD still had a snapshot when the delete of the PV was issued. Note that the Kubernetes Persistent Volume / CNS volume is removed, but the FCD is left behind, and the failing delete operation is retried every 5 minutes.

There are two workarounds here. The first is to remove all stranded snapshots from a PV/FCD before deleting it. If you forget to do this before the delete, there is a second workaround using govc, a vSphere CLI built on top of govmomi, together with the Managed Object Browser (MOB). That procedure is what I am going to describe in this post.

Step 1: Get the list of CNS Volumes

$ govc volume.ls
c2926d76-6e3e-4bf0-963f-edbf3f3fe2ef pvc-8937353d-fef7-4134-a539-9f6a2af04f79
cbc68363-2bf5-49f0-b897-1f763a80b544 pvc-deacf1d6-70dd-4a9d-88dd-4134576505bf
91889388-dcf9-4981-8ede-1d0a0d472434 pvc-5ac8ae52-dc86-44be-9ba7-9f190fd2f60b
fe908257-db72-4457-af97-25a89bdd12aa pvc-e929d81b-6549-4bd4-9e16-bc9842370f83
1456e423-4800-4acb-9739-9e6e05237a03 pvc-acc2af45-14ab-4f9d-9e68-ad2e3622fce2
430eb939-0784-4137-bca5-1a3fe448e85a pvc-ca1251ba-1033-407d-97a1-79d675018017
b21b6159-3799-4094-be99-7b1bcb1396cf pvc-077523ec-06d5-4279-a890-2f4a2fb3f831
2d1f24f8-9e2c-408c-8e34-e5573b829ab9 pvc-9d7cf414-72cf-4c81-affe-9e7556d1ca3d
7beef13e-4ddf-4929-8228-c7355492dd3b pvc-6189a1ff-5b55-42c2-a202-e09d82a9ec29
82cdc9c2-0104-4856-be4f-a87609ea21fd pvc-07c9ca2c-9e15-4b6c-b4c6-0623ddd4660b
874f43ca-8427-43f5-b855-eb59436dbe2e pvc-7215923e-db14-4ef0-8678-2441827438cb
d759eb26-2207-40c9-9e8d-4a90653c50e9 pvc-c980e276-4141-4827-9086-8c72b2a2a274
b4342b03-3973-460f-92b7-d144c415f680 pvc-d73db09c-6aa0-4a39-b783-2deeefe9cc7a
14472636-6f62-4fb7-8151-4ccf12618bf6 pvc-355afb3c-90b8-4288-87f9-730c5492731e
5df76251-c64a-4e1a-abfa-6d46a27a87c6 pvc-1cf3728e-7b3a-48d8-9985-d2e4d2e23a6a
51618eba-406c-4d18-82ce-94f6cfd6923f pvc-48e2f5e6-e3f5-4ae1-a25a-cc45894d2117
7beb1834-48c0-41c6-acba-7ba173fe97e3 pvc-de504d54-7d43-4ea9-a113-00336d8bddef

Step 2: Get the list of FCDs

$ govc disk.ls | grep pvc
14472636-6f62-4fb7-8151-4ccf12618bf6 pvc-355afb3c-90b8-4288-87f9-730c5492731e
1456e423-4800-4acb-9739-9e6e05237a03 pvc-acc2af45-14ab-4f9d-9e68-ad2e3622fce2
1746bda6-8ed2-413c-954a-b4cc3d2c0022 pvc-85868bc3-20a4-4e2b-9de2-946ca740f363
19768dd7-9e9a-4f53-9c61-58a945d04da1 pvc-094b1257-e1d0-4b04-9f40-be6851e2fa22
1aafbbc7-9007-4650-8d70-5a942bdd83fb pvc-08279cb5-deaf-4f13-813e-760fb0f68ef1
1f26c921-85d3-49fc-b315-64ba0beead3c pvc-56b7f6ce-8cb7-4663-9304-82538b38de26
2d1f24f8-9e2c-408c-8e34-e5573b829ab9 pvc-9d7cf414-72cf-4c81-affe-9e7556d1ca3d
430eb939-0784-4137-bca5-1a3fe448e85a pvc-ca1251ba-1033-407d-97a1-79d675018017
449d95c3-956c-4fff-83a3-0c2516fb0c54 pvc-8a4f1a88-0d32-4ca4-aa5d-10059fd7785e
51618eba-406c-4d18-82ce-94f6cfd6923f pvc-48e2f5e6-e3f5-4ae1-a25a-cc45894d2117
5df76251-c64a-4e1a-abfa-6d46a27a87c6 pvc-1cf3728e-7b3a-48d8-9985-d2e4d2e23a6a
7beb1834-48c0-41c6-acba-7ba173fe97e3 pvc-de504d54-7d43-4ea9-a113-00336d8bddef
7beef13e-4ddf-4929-8228-c7355492dd3b pvc-6189a1ff-5b55-42c2-a202-e09d82a9ec29
82cdc9c2-0104-4856-be4f-a87609ea21fd pvc-07c9ca2c-9e15-4b6c-b4c6-0623ddd4660b
874f43ca-8427-43f5-b855-eb59436dbe2e pvc-7215923e-db14-4ef0-8678-2441827438cb
91889388-dcf9-4981-8ede-1d0a0d472434 pvc-5ac8ae52-dc86-44be-9ba7-9f190fd2f60b
b21b6159-3799-4094-be99-7b1bcb1396cf pvc-077523ec-06d5-4279-a890-2f4a2fb3f831
b4342b03-3973-460f-92b7-d144c415f680 pvc-d73db09c-6aa0-4a39-b783-2deeefe9cc7a
c2926d76-6e3e-4bf0-963f-edbf3f3fe2ef pvc-8937353d-fef7-4134-a539-9f6a2af04f79
cbc68363-2bf5-49f0-b897-1f763a80b544 pvc-deacf1d6-70dd-4a9d-88dd-4134576505bf
d220ee26-5eec-4083-83e2-559e1067eddb pvc-6a50d119-f15e-4b72-9269-34c453731779
d759eb26-2207-40c9-9e8d-4a90653c50e9 pvc-c980e276-4141-4827-9086-8c72b2a2a274
d787bd5c-c9b4-46b2-bebe-66e274635cb5 pvc-f7d3495e-a59e-44fb-b0ab-b9949824d38c
fe908257-db72-4457-af97-25a89bdd12aa pvc-e929d81b-6549-4bd4-9e16-bc9842370f83

We can clearly see that there are more FCDs than there are CNS volumes. The additional FCDs are the ones which cannot be deleted due to the presence of a snapshot, i.e. the CNS volume was deleted but the backing FCD was not. There may be other stranded volumes which are not due to the snapshot issue, so the next command gets the list of FCDs (on the vSAN datastore only) and checks whether a corresponding CNS volume exists. If there is no corresponding CNS volume, the script displays any snapshots associated with the FCD.

Step 3: List the snapshots associated with stranded FCD volumes

$ for i in `govc disk.ls -ds vSANDatastore | grep pvc | awk '{print $1}'`; do \
govc volume.ls -ds vSANDatastore | grep -q $i; if [ $? -ne 0 ]; then \
echo Did not find $i in volume list; govc disk.snapshot.ls $i; fi; done
Did not find 1746bda6-8ed2-413c-954a-b4cc3d2c0022 in volume list
Did not find 19768dd7-9e9a-4f53-9c61-58a945d04da1 in volume list
Did not find 1aafbbc7-9007-4650-8d70-5a942bdd83fb in volume list
3f7f80bc-c718-423d-a6db-265ca9d2bd81 AstrolabeSnapshot
e4e092ca-df17-4834-bb23-1848dd24f7ef AstrolabeSnapshot
Did not find 1f26c921-85d3-49fc-b315-64ba0beead3c in volume list
1a5fdb68-ef51-4d01-b7da-de40424e23d5 AstrolabeSnapshot
1e85c956-1846-45c7-89ab-452c84ecb467 AstrolabeSnapshot
Did not find 449d95c3-956c-4fff-83a3-0c2516fb0c54 in volume list
Did not find d220ee26-5eec-4083-83e2-559e1067eddb in volume list
Did not find d787bd5c-c9b4-46b2-bebe-66e274635cb5 in volume list
f027d3db-b0c8-4c80-b824-fd0510604266 AstrolabeSnapshot
e500c09e-7d13-466b-addc-8ea524ab57e5 AstrolabeSnapshot

So, of the 7 stranded FCDs in the list, 3 of them have snapshots associated with them. The “Did not find” statement means that the volume did not have a matching CNS entry, so it is stranded. The script then reports any snapshot ids present on the volume. Note that AstrolabeSnapshot is the snapshot name used by the Velero vSphere Plugin. Now that we have clearly identified the stranded FCDs with Velero snapshots, we can use both the disk id and the snapshot id from the above list to delete the snapshot using the vSphere MOB (Managed Object Browser).

Step 4: Use the vSphere MOB to delete the snapshot

Use the VslmDeleteSnapshot_Task FCD API in the MOB to delete the snapshots. Here is the URL to use:
https://<vc-ip>/vslm/mob/?moid=VStorageObjectManager&method=VslmDeleteSnapshot_Task

Populate the disk id and the snapshot id from the previous output, then invoke the method; this should successfully remove the snapshot from the FCD.
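If you would rather avoid the MOB altogether, recent versions of govc also expose the FCD snapshot APIs directly via the disk.snapshot.rm command. Here is a small helper along those lines; this is a sketch, assuming a govc build that includes the disk.snapshot.* commands:

```shell
# Delete one snapshot from an FCD using govc rather than the MOB.
# Usage: delete_fcd_snapshot <disk-id> <snapshot-id>
delete_fcd_snapshot() {
  disk_id="$1"
  snap_id="$2"
  # disk.snapshot.rm takes the FCD id followed by the snapshot id
  govc disk.snapshot.rm "$disk_id" "$snap_id" && \
    echo "Removed snapshot $snap_id from FCD $disk_id"
}
```

For example, delete_fcd_snapshot 1aafbbc7-9007-4650-8d70-5a942bdd83fb 3f7f80bc-c718-423d-a6db-265ca9d2bd81 would remove the first snapshot reported in Step 3 above.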

Once the snapshots have been deleted, the orphaned or stranded FCDs can be successfully removed. I used the following script to do this, but if you are not comfortable with scripting it, simply run the govc disk.rm command against each stranded volume’s id. This script again looks for FCDs that do not have a corresponding CNS volume, treats each one as a stranded volume, and deletes it.

$ for i in `govc disk.ls -ds vSANDatastore | grep pvc | awk '{print $1}'`; do \
govc volume.ls -ds vSANDatastore | grep -q $i; if [ $? -ne 0 ]; then \
echo Did not find $i in volume list; govc disk.rm $i; fi; done
Did not find 1746bda6-8ed2-413c-954a-b4cc3d2c0022 in volume list
[22-03-21 09:07:18] Deleting 1746bda6-8ed2-413c-954a-b4cc3d2c0022...OK
Did not find 19768dd7-9e9a-4f53-9c61-58a945d04da1 in volume list
[22-03-21 09:07:22] Deleting 19768dd7-9e9a-4f53-9c61-58a945d04da1...OK
Did not find 449d95c3-956c-4fff-83a3-0c2516fb0c54 in volume list
[22-03-21 09:07:23] Deleting 449d95c3-956c-4fff-83a3-0c2516fb0c54...OK
Did not find d220ee26-5eec-4083-83e2-559e1067eddb in volume list
[22-03-21 09:07:28] Deleting d220ee26-5eec-4083-83e2-559e1067eddb...OK
Did not find d787bd5c-c9b4-46b2-bebe-66e274635cb5 in volume list
[22-03-21 09:07:28] Deleting d787bd5c-c9b4-46b2-bebe-66e274635cb5...OK
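For completeness, the snapshot removal and disk removal can be combined into a single pass: find each stranded FCD, remove any snapshots it still has, and then remove the disk itself. A sketch along those lines, assuming govc is configured, a build with the disk.snapshot.* commands, and that the datastore name matches your environment:

```shell
# For every FCD on the datastore with no matching CNS volume,
# remove any leftover snapshots first, then delete the disk itself.
# Usage: cleanup_stranded_fcds [datastore-name]
cleanup_stranded_fcds() {
  ds="${1:-vSANDatastore}"
  for id in $(govc disk.ls -ds "$ds" | grep pvc | awk '{print $1}'); do
    if ! govc volume.ls -ds "$ds" | grep -q "$id"; then
      echo "Did not find $id in volume list"
      # First column of disk.snapshot.ls output is the snapshot id
      for snap in $(govc disk.snapshot.ls "$id" | awk '{print $1}'); do
        govc disk.snapshot.rm "$id" "$snap"
      done
      govc disk.rm "$id"
    fi
  done
}
```

As always with bulk deletes, run the listing commands first and sanity-check the ids before letting a loop like this remove anything.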

Everything should now be cleaned up, and the “A specified parameter was not correct” errors that were appearing every 5 minutes in the vSphere UI for all ‘deleted’ PVs with snapshots should cease. And before you ask: yes, we are working behind the scenes to handle this scenario in a more elegant way in the product.

Note: I have used the term FCD to describe First Class Disk objects. These also appear in the documentation as IVDs, or Improved Virtual Disks.

More details about govc can be found in the govmomi repository on GitHub.
