
VSAN 6.2 Part 10 – Problematic Disk Handling

In this post, I want to talk about a feature called Problematic Disk Handling. Some history behind why we have such a feature can be found in this post. In VSAN 6.2/vSphere 6.0 U2, Problematic Disk Handling has been improved so that it will unmount a problematic disk/diskgroup for two reasons:

        esxcfg-advcfg --set 1 /LSOM/lsomSlowTier1DeviceUnmount

Troubleshooting

In general, there are a few things to look for to figure out if problematic disk handling has kicked in:

2016-02-10T10:10:51.481Z cpu6:43298)WARNING: LSOM: LSOMEventNotify:6440: 
Virtual SAN device 52db4996-ffdd-9957-485c-e2dcf1057f66 is under 
permanent error.
.
.
2016-02-10T10:17:53.238Z cpu14:3443764)VSAN Device Monitor: Successfully
unmounted failed VSAN diskgroup naa.600508b1001cbbbe903bd48c8f6b2ddb
 event.Unmounting failed VSAN diskgroup
eventTypeId = "Re-mounting failed VSAN diskgroup 
naa.600508b1001cbbbe903bd48c8f6b2ddb.",

A reference to “failed” as opposed to “unhealthy” in the vmkernel.log message indicates that LSOM detected that the disk failed (the second scenario from the list above). The reference to “diskgroup” in the same log message indicates that the entire diskgroup is being unmounted, as opposed to a single capacity tier disk. Note that this will be the case when (a) the disk that failed in LSOM is the cache device of the disk group, or (b) the diskgroup is on an all-flash VSAN on which deduplication has been enabled (in which case a disk failure impacts the whole of the disk group).
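If you have a copy of vmkernel.log to hand, the reading of the message described above can be roughly automated. What follows is only a minimal sketch, not an official tool: the function name `classify_unmounts` is hypothetical, and the pattern is modeled purely on the sample “VSAN Device Monitor” line shown earlier, so the assumption is that other unmount messages (for example the “unhealthy” or single-disk variants) follow the same wording.

```python
import re

# Pattern modeled on the sample "VSAN Device Monitor" line in this post.
# Assumption: other builds and message variants use the same wording.
MONITOR_RE = re.compile(
    r"VSAN Device Monitor:\s+Successfully\s+unmounted\s+"
    r"(?P<state>failed|unhealthy)\s+VSAN\s+"
    r"(?P<scope>diskgroup|disk)\s+(?P<device>\S+)",
    re.IGNORECASE,
)


def classify_unmounts(log_text):
    """Return (state, scope, device) tuples for each unmount message found.

    state  -- "failed" or "unhealthy", as worded in the log message
    scope  -- "diskgroup" (whole disk group) or "disk" (single device)
    device -- the device identifier that follows, e.g. an naa.* ID
    """
    # Log excerpts are often wrapped across lines, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return [
        (m.group("state").lower(), m.group("scope").lower(), m.group("device"))
        for m in MONITOR_RE.finditer(flat)
    ]
```

Passing in the text of the log excerpt above should yield one entry with the state “failed”, the scope “diskgroup” and the naa ID of the unmounted device, which matches the manual interpretation: a failure detected by LSOM that took out the whole disk group.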
