I was involved in an interesting case recently. The customer was running an 8-node cluster with 4 disk groups per host and 5 x ~900GB hard disks per disk group, which should have provided somewhere in the region of 150TB of storage capacity (allowing a little overhead for metadata). But after some maintenance tasks, the customer was seeing only approximately 100TB on the VSAN datastore.
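To make the capacity arithmetic concrete, here is a rough sketch of the calculation. It assumes decimal units (1TB = 1000GB) and a disk size of exactly 900GB, whereas the disks in this case were only approximately that size, so the result is a ballpark figure. The function names are purely illustrative.

```python
def expected_raw_capacity_tb(hosts, disk_groups_per_host, disks_per_group, disk_size_gb):
    """Raw capacity contributed by the capacity disks, in decimal TB."""
    total_disks = hosts * disk_groups_per_host * disks_per_group
    return total_disks * disk_size_gb / 1000


def estimated_missing_disks(expected_tb, observed_tb, disk_size_gb):
    """Roughly how many disks would account for the capacity shortfall."""
    return round((expected_tb - observed_tb) * 1000 / disk_size_gb)


# Figures from this case: 8 hosts, 4 disk groups per host, 5 disks per group.
raw_tb = expected_raw_capacity_tb(hosts=8, disk_groups_per_host=4,
                                  disks_per_group=5, disk_size_gb=900)
missing = estimated_missing_disks(raw_tb, observed_tb=100, disk_size_gb=900)
print(f"Expected roughly {raw_tb:.0f}TB raw; shortfall is about {missing} disks")
```

A shortfall of this size points at a large number of disks dropping out at once rather than one or two failed devices.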
After some investigation, it became obvious that some of those disks were not contributing their storage in their respective disk groups. One of the main clues was that when the disks were queried with the esxcli vsan storage list command, they were shown as no longer being part of CMMDS, VSAN’s cluster membership and monitoring directory service:
naa.600605b008b04b90ff0000a60a119dd3:
   Device: naa.600605b008b04b90ff0000a60a119dd3
   Display Name: naa.600605b008b04b90ff0000a60a119dd3
   Is SSD: false
   VSAN UUID: 520954bd-c07c-423c-8e42-ff33ca5c0a81
   VSAN Disk Group UUID: 52564730-8bc6-e442-2ab9-6de5b0043d87
   VSAN Disk Group Name: naa.600605b008b04b90ff0000a80a26f73f
   Used by this host: true
   In CMMDS: false
   Checksum: 15088448381607538692
   Checksum OK: true
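With many disks per host, scanning this output by eye gets tedious. The following is a minimal sketch of how saved output could be filtered for disks reporting In CMMDS: false. It assumes the usual layout of an unindented device name ending in a colon followed by indented "Key: value" lines. The function names are my own, and the second disk in the sample is a hypothetical healthy entry added only for contrast.

```python
def parse_vsan_storage_list(text):
    """Parse 'esxcli vsan storage list' style output into a list of dicts."""
    disks = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if not raw_line[0].isspace() and line.endswith(":"):
            # An unindented "naa.xxx:" line starts a new disk record.
            current = {"name": line[:-1]}
            disks.append(current)
        elif current is not None:
            key, sep, value = line.partition(":")
            if sep:
                current[key.strip()] = value.strip()
    return disks


def disks_not_in_cmmds(disks):
    """Return the names of disks whose 'In CMMDS' field is false."""
    return [d["name"] for d in disks if d.get("In CMMDS", "").lower() == "false"]


# First record is abbreviated from the output above; the second is hypothetical.
SAMPLE_OUTPUT = """\
naa.600605b008b04b90ff0000a60a119dd3:
   Device: naa.600605b008b04b90ff0000a60a119dd3
   Is SSD: false
   Used by this host: true
   In CMMDS: false
naa.hypothetical-healthy-disk:
   Device: naa.hypothetical-healthy-disk
   Is SSD: false
   Used by this host: true
   In CMMDS: true
"""

for name in disks_not_in_cmmds(parse_vsan_storage_list(SAMPLE_OUTPUT)):
    print("Not in CMMDS:", name)
```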
This explained why the capacity was showing up incorrectly, but it did not explain why VSAN was unable to use the capacity of these disks for the VSAN datastore.
After some additional research, we found that the underlying volumes on the disks were seen as “snapshots” by the ESXi host. This can be verified using the following command:
~ # esxcli storage vmfs snapshot list
54228778-891c4b60-a013-000c29fe01fa
   Volume Name: test-demo
   VMFS UUID: 54228778-891c4b60-a013-000c29fe01fa
   Can mount: true
   Reason for un-mountability:
   Can resignature: true
   Reason for non-resignaturability:
   Unresolved Extent Count: 1
~ #
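If there are many volumes to check, the snapshot listing can be turned into records in much the same way. This sketch assumes the input is the command output only (without the shell prompt lines), with each volume introduced by an unindented identifier line and followed by indented "Key: value" fields. The function name is illustrative.

```python
def parse_vmfs_snapshot_list(text):
    """Parse 'esxcli storage vmfs snapshot list' style output into dicts."""
    volumes = []
    current = None
    for raw_line in text.splitlines():
        if not raw_line.strip():
            continue
        if not raw_line[0].isspace():
            # An unindented line is the identifier that starts a new volume.
            current = {"id": raw_line.strip()}
            volumes.append(current)
        elif current is not None:
            key, _, value = raw_line.strip().partition(":")
            current[key.strip()] = value.strip()
    return volumes


# Output from the example above, minus the prompt lines.
SNAPSHOT_OUTPUT = """\
54228778-891c4b60-a013-000c29fe01fa
   Volume Name: test-demo
   VMFS UUID: 54228778-891c4b60-a013-000c29fe01fa
   Can mount: true
   Reason for un-mountability:
   Can resignature: true
   Reason for non-resignaturability:
   Unresolved Extent Count: 1
"""

for vol in parse_vmfs_snapshot_list(SNAPSHOT_OUTPUT):
    print(vol["Volume Name"], vol["VMFS UUID"], "mountable:", vol["Can mount"])
```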
This behaviour can happen for a number of reasons, and is not specific to VSAN. In the past we have seen this issue (local VMFS volumes being reported as snapshots) when customers upgraded controller firmware or replaced storage controllers on the host. When the volume is seen as a snapshot, it will not be mounted by ESXi.
In this particular scenario, the disks were present and correct, but the volumes were not mounted, which meant that they could not be included in capacity calculations.
Later on in this case it was discovered that the maintenance activity at the customer site involved a number of changes to the servers, including the replacement of a motherboard in one of the servers.
Once this root cause was confirmed, the volumes were mounted with the command esxcli storage vmfs snapshot mount -u followed by the VMFS UUID of the affected volume. This resolved the issue and brought the VSAN datastore back to full capacity.
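When several volumes are affected, it can help to generate the mount commands for review rather than typing each UUID by hand. The sketch below only builds the command strings and does not run anything. The helper name is hypothetical, and the UUID is the test-demo volume from the earlier listing, not one of the customer's volumes.

```python
import shlex


def build_mount_commands(vmfs_uuids):
    """Return one 'snapshot mount' command string per VMFS UUID."""
    return [
        "esxcli storage vmfs snapshot mount -u " + shlex.quote(uuid)
        for uuid in vmfs_uuids
    ]


# UUID taken from the example snapshot listing; substitute your own.
for command in build_mount_commands(["54228778-891c4b60-a013-000c29fe01fa"]):
    print(command)
```

Printing the commands first leaves room to confirm that each volume really is one that should be mounted before anything is run on the host.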
If you find discrepancies between the available physical capacity per host and the VSAN datastore capacity, check that every disk shows In CMMDS: true. If any show false, check whether the ESXi host is mounting the volumes correctly (and not seeing them as snapshots) using the commands above.
A KB article is being created to outline the correct steps to follow if this situation arises. In the meantime, GSS can be contacted for further assistance.