I’ve noticed a couple of customers experiencing a Component Metadata Health failure on the VSAN health check recently. This is typically what it looks like:
Note: This health check test can fail intermittently if the destaging process is slow, most likely because VSAN needs to do physical block allocations on the storage devices. To work around this issue, run the health check once more after the period of high activity (multiple virtual machine deployments, etc.) is complete. If the health check continues to fail, the warning is valid. If the health check passes, the warning can be ignored.
With that in mind, let’s continue to figure out which disk has the potentially problematic component. The warning above reports a component UUID, but customers are having difficulty matching this UUID to a physical device. In other words, on which physical disk does this component reside? Currently, the only way to locate this is through RVC, the Ruby vSphere Console. The following is an example of how you can locate the physical device on which a component of an object resides.
First, using vsan.cmmds_find, search on the component UUID as reported in the health check (components with errors) to get the disk UUID. Some of the preceding columns have been removed for readability, and the command is run against the cluster object (represented by 0):
> vsan.cmmds_find 0 -u dc3ae056-0c5d-1568-8299-a0369f56ddc0
---+---------+-----------------------------------------------------------+
| Health | Content |
---+---------+-----------------------------------------------------------+
| Healthy | {"diskUuid"=>"52e5ec68-00f5-04d6-a776-f28238309453", |
| | "compositeUuid"=>"92559d56-1240-e692-08f3-a0369f56ddc0", |
| | "capacityUsed"=>167772160, |
| | "physCapacityUsed"=>167772160, |
| | "dedupUniquenessMetric"=>0, |
| | "formatVersion"=>1} |
---+---------+-----------------------------------------------------------+
/localhost/Cork-Datacenter/computers>
Now that you have the diskUuid, you can use that in the next command. Once more, some of the preceding columns in the output have been removed for readability:
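If you have several failing components to chase down, picking the diskUuid out of the output by eye gets tedious. The following is a minimal Python sketch of pulling a field out of saved vsan.cmmds_find output. It is not part of RVC: the helper name extract_field is hypothetical, and it assumes the Content column uses the "key"=>value form shown above, with values that are either quoted strings or bare integers.

```python
import re


def extract_field(rvc_output, key):
    """Return the value of a "key"=>value pair from pasted vsan.cmmds_find output.

    Hypothetical helper, not an RVC API. Returns the value as a string,
    or None if the key does not appear in the text.
    """
    # Assumption: values are either quoted strings or bare integers.
    pattern = r'"%s"=>(?:"([^"]*)"|(-?\d+))' % re.escape(key)
    match = re.search(pattern, rvc_output)
    if match is None:
        return None
    # group(1) is set for quoted values, group(2) for bare integers.
    return match.group(1) if match.group(1) is not None else match.group(2)


# Fragment of the output above, saved as plain text.
sample = '''
| Healthy | {"diskUuid"=>"52e5ec68-00f5-04d6-a776-f28238309453", |
| | "compositeUuid"=>"92559d56-1240-e692-08f3-a0369f56ddc0", |
| | "capacityUsed"=>167772160, |
'''

disk_uuid = extract_field(sample, "diskUuid")
print(disk_uuid)
```

The returned string can then be supplied as the -u argument in the DISK lookup.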
> vsan.cmmds_find 0 -t DISK -u 52e5ec68-00f5-04d6-a776-f28238309453
---+---------+-------------------------------------------------------+
| Health | Content |
---+---------+-------------------------------------------------------+
| Healthy | {"capacity"=>145303273472, |
| | "iops"=>100, |
| | "iopsWritePenalty"=>10000000, |
| | "throughput"=>200000000, |
| | "throughputWritePenalty"=>0, |
| | "latency"=>3400000, |
| | "latencyDeviation"=>0, |
| | "reliabilityBase"=>10, |
| | "reliabilityExponent"=>15, |
| | "mtbf"=>1600000, |
| | "l2CacheCapacity"=>0, |
| | "l1CacheCapacity"=>16777216, |
| | "isSsd"=>0, |
| | "ssdUuid"=>"52bbb266-3a4e-f93a-9a2c-9a91c066a31e", |
| | "volumeName"=>"NA", |
| | "formatVersion"=>"3", |
| | "devName"=>"naa.600508b1001c5c0b1ac1fac2ff96c2b2:2", |
| | "ssdCapacity"=>0, |
| | "rdtMuxGroup"=>80011761497760, |
| | "isAllFlash"=>0, |
| | "maxComponents"=>47661, |
| | "logicalCapacity"=>0, |
| | "physDiskCapacity"=>0, |
| | "dedupScope"=>0} |
---+---------+-------------------------------------------------------+
>
In the devName field above, you now have the NAA ID (the SCSI identifier) of the disk on which the component resides.
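Note that the devName value in this output ends in ":2" after the NAA ID. This is most likely the partition number on the device, but treat that reading as an assumption rather than something the output itself states. If you want just the device identifier for use elsewhere, a short Python sketch (the helper name split_dev_name is hypothetical) could split the two parts:

```python
def split_dev_name(dev_name):
    """Split a devName such as 'naa.xxx:2' into (device_id, suffix).

    Hypothetical helper. The suffix is assumed to be a partition number;
    it is returned as None when the name contains no ':' separator.
    """
    device_id, sep, suffix = dev_name.rpartition(":")
    if not sep:
        # No ':' present, so the whole string is the device identifier.
        return dev_name, None
    return device_id, suffix


naa_id, partition = split_dev_name("naa.600508b1001c5c0b1ac1fac2ff96c2b2:2")
print(naa_id)     # naa.600508b1001c5c0b1ac1fac2ff96c2b2
print(partition)  # 2
```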
I’ve requested that this information get added to the KB article.