A list of ESXCLI storage commands you can’t live without
There are many occasions when the vSphere client does not display all the relevant information about a particular storage device, or enough detail to troubleshoot problems with it. The purpose of this post is to explain some of the ESXCLI commands I use most often when trying to determine storage device information and to troubleshoot a particular device.
Which PSA, SATP & PSP?
First things first, let’s figure out if the device is managed by VMware’s native multipath plugin, the NMP, or by a third-party plugin such as EMC’s PowerPath. I start with the esxcli storage nmp device list command. This not only confirms that the device is managed by NMP, but also displays the Storage Array Type Plugin (SATP) used for path failover and the Path Selection Policy (PSP) used for load balancing. Here is an example of this command (I’m using the -d option to run it against one device to keep the output to a minimum).
~ # esxcli storage nmp device list -d naa.600601603aa029002cedc7f8b356e311
naa.600601603aa029002cedc7f8b356e311
   Device Display Name: DGC Fibre Channel Disk (naa.600601603aa029002cedc7f8b356e311)
   Storage Array Type: VMW_SATP_ALUA_CX
   Storage Array Type Device Config: {navireg=on, ipfilter=on} {implicit_support=on;explicit_support=on; explicit_allow=on; alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=rr,iops=1000, bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0, numBytesPending=0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba2:C0:T3:L100
   Is Local SAS Device: false
   Is Boot USB Device: false
~ #
We can clearly see both the SATP and the PSP for the device in this output. There is a lot more information here as well, especially since this is an ALUA array. You can read more about what these configuration options mean in this post. This device is using the Round Robin PSP, VMW_PSP_RR. One point worth noting, even now, is that not every array supports the Round Robin PSP. It is always worth checking the footnotes of the VMware HCL Storage section to see if a particular array supports Round Robin. Now that we have the NMP, SATP & PSP, let’s look at some other details.
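If you need to check the SATP and PSP across many devices, it can help to script it. Here is a minimal Python sketch that parses the `key: value` lines from the output above. The function name `parse_nmp_device` is my own for illustration, the sample is a shortened copy of the output shown earlier, and the sketch assumes the text has already been captured from the host by whatever means you prefer.

```python
import re

# Shortened copy of the `esxcli storage nmp device list -d <device>` output above.
SAMPLE = """\
naa.600601603aa029002cedc7f8b356e311
   Device Display Name: DGC Fibre Channel Disk (naa.600601603aa029002cedc7f8b356e311)
   Storage Array Type: VMW_SATP_ALUA_CX
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=rr,iops=1000, bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0, numBytesPending=0}
   Working Paths: vmhba2:C0:T3:L100
"""


def parse_nmp_device(text):
    """Collect `key: value` lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # vmhba2:C0:T3:L100 are kept intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


device = parse_nmp_device(SAMPLE)
satp = device["Storage Array Type"]
psp = device["Path Selection Policy"]

# Pull the Round Robin IOPS setting out of the PSP config string, if present.
match = re.search(r"iops=(\d+)", device["Path Selection Policy Device Config"])
rr_iops = int(match.group(1)) if match else None

print(satp, psp, rr_iops)
```

Lines without a colon, such as the device identifier on the first line, are simply skipped.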
Queue Depth, Adaptive Queuing, Reservations
This next command is very useful for checking a number of things. Primarily, it will tell you what the device queue depth is set to. But it will also tell you if adaptive queuing has been configured, and if the device has the perennially reserved setting, something that is used a lot in Microsoft Clustering configurations to avoid slow boots.
~ # esxcli storage core device list -d naa.600601603aa029002cedc7f8b356e311
naa.600601603aa029002cedc7f8b356e311
   Display Name: DGC Fibre Channel Disk (naa.600601603aa029002cedc7f8b356e311)
   Has Settable Display Name: true
   Size: 25600
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.600601603aa029002cedc7f8b356e311
   Vendor: DGC
   Model: VRAID
   Revision: 0532
   SCSI Level: 4
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters: VAAI_FILTER
   VAAI Status: supported
   Other UIDs: vml.0200640000600601603aa029002cedc7f8b356e311565241494420
   Is Local SAS Device: false
   Is Boot USB Device: false
   No of outstanding IOs with competing worlds: 32
The last line of output is actually the device queue depth. For this device, 32 I/Os can be queued to the device. The Queue Full Sample Size and the Queue Full Threshold both relate to Adaptive Queuing. It is not configured on this device since both values are 0. If you’d like to know more about Adaptive Queuing, you can read this article here.
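To make that reading of the output concrete, here is a small Python sketch that pulls out the three queue-related fields. The helper names are mine, the sample is trimmed from the output above, and treating adaptive queuing as "configured" whenever either Queue Full value is non-zero is my assumption, based on the observation that both are 0 on this unconfigured device.

```python
# Trimmed copy of the `esxcli storage core device list -d <device>` output above.
SAMPLE = """\
naa.600601603aa029002cedc7f8b356e311
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   No of outstanding IOs with competing worlds: 32
"""


def parse_fields(text):
    """Collect `key: value` lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def queue_summary(fields):
    """Summarise queue depth and adaptive queuing for one device."""
    sample_size = int(fields["Queue Full Sample Size"])
    threshold = int(fields["Queue Full Threshold"])
    return {
        # The article reads the last output line as the device queue depth.
        "queue_depth": int(fields["No of outstanding IOs with competing worlds"]),
        # Assumption: any non-zero value means adaptive queuing is configured.
        "adaptive_queuing": sample_size != 0 or threshold != 0,
        "perennially_reserved": fields["Is Perennially Reserved"] == "true",
    }


summary = queue_summary(parse_fields(SAMPLE))
print(summary)
```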
The perennially reserved flag is an interesting one and a relatively recent addition to device configurations. With applications that place SCSI reservations on devices (such as Microsoft Cluster), ESXi host boots were delayed as the host tried to query devices with SCSI reservations on them. Perennially Reserved is a flag that tells the ESXi host not to waste any time trying to query these devices on boot, as there is a likelihood that they are reserved by another host. This speeds up the boot times of ESXi hosts running MSCS VMs.
For those of you contemplating VSAN, VMware’s new Virtual SAN product, the ability to identify SSD devices and local vs. remote devices is critical. VSAN requires SSDs (or PCIe flash devices) as well as local magnetic disks. This command will help you identify both.
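When the command is run without -d it lists every device, so sorting devices into SSD, local magnetic and remote is easy to script. The Python sketch below assumes the usual layout where each device identifier sits on an unindented line and its fields are indented beneath it. The function names are mine, and the second device in the sample is a made-up local SSD added purely for illustration; only the first device comes from the output above.

```python
# First device: trimmed from the output above.
# Second device: HYPOTHETICAL local SSD, invented for this example.
SAMPLE = """\
naa.600601603aa029002cedc7f8b356e311
   Is Local: false
   Is SSD: false
mpx.vmhba1:C0:T1:L0
   Is Local: true
   Is SSD: true
"""


def parse_devices(text):
    """Split multi-device output into {device_id: {field: value}}."""
    devices, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device block.
            current = devices.setdefault(line.strip(), {})
        elif current is not None:
            key, sep, value = line.partition(":")
            if sep:
                current[key.strip()] = value.strip()
    return devices


def classify(fields):
    """Label a device using its Is Local and Is SSD fields."""
    if fields.get("Is Local") != "true":
        return "remote"
    return "local SSD" if fields.get("Is SSD") == "true" else "local magnetic disk"


kinds = {dev: classify(fields) for dev, fields in parse_devices(SAMPLE).items()}
for dev, kind in kinds.items():
    print(dev, "->", kind)
```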
Apart from some vendor-specific information and size information, another interesting item is the VAAI Status. In this case, VAAI (vSphere APIs for Array Integration) is shown as supported. But how can I find out more information about which primitives are supported? This next command will help with that.
Which VAAI primitives are supported?
~ # esxcli storage core device vaai status get -d naa.600601603aa029002cedc7f8b356e311
naa.600601603aa029002cedc7f8b356e311
   VAAI Plugin Name: VMW_VAAIP_CX
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported
This device, as we can clearly see, supports 3 out of the 4 VAAI block primitives. ATS, Atomic Test & Set, is the replacement for SCSI reservations. Clone is the ability to offload a clone or migration operation to the array using XCOPY. Zero is the ability to have the array zero out blocks using WRITE_SAME. Delete relates to the UNMAP primitive, and is the ability to reclaim dead space on thin provisioned datastores. In this example, the Delete primitive shows up as unsupported.
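As a quick illustration of how the four status lines map onto the primitives described above, here is a minimal Python sketch. The `PRIMITIVES` mapping and the `vaai_support` function are names I made up for this example; the sample text is the output shown above.

```python
# Output of `esxcli storage core device vaai status get -d <device>` from above.
SAMPLE = """\
naa.600601603aa029002cedc7f8b356e311
   VAAI Plugin Name: VMW_VAAIP_CX
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported
"""

# Map each status field to the primitive (and SCSI command) it reports on.
PRIMITIVES = {
    "ATS Status": "ATS",
    "Clone Status": "Clone (XCOPY)",
    "Zero Status": "Zero (WRITE_SAME)",
    "Delete Status": "Delete (UNMAP)",
}


def vaai_support(text):
    """Return {primitive: True/False} for the four block primitives."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in PRIMITIVES:
            result[PRIMITIVES[key.strip()]] = value.strip() == "supported"
    return result


support = vaai_support(SAMPLE)
supported = [name for name, ok in support.items() if ok]
missing = [name for name, ok in support.items() if not ok]
print(f"{len(supported)} of {len(support)} primitives supported; missing: {missing}")
```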
Useful protocol information
For those of you interested in troubleshooting storage issues outside of ESXi, the esxcli storage san namespace has some very useful commands. In the case of Fibre Channel, you can get information about which adapters are used for FC, and display the WWNN (node name) and WWPN (port name) information, speed and port state as shown here.
~ # esxcli storage san fc list
   Adapter: vmhba2
   Port ID: 012800
   Node Name: 20:00:00:c0:dd:18:77:d1
   Port Name: 21:00:00:c0:dd:18:77:d1
   Speed: 10 Gbps
   Port Type: NPort
   Port State: ONLINE

   Adapter: vmhba3
   Port ID: 000000
   Node Name: 20:00:00:c0:dd:18:77:d3
   Port Name: 21:00:00:c0:dd:18:77:d3
   Speed: 0 Gbps
   Port Type: Unknown
   Port State: LINK DOWN
So I have one good adapter, and one not so good. I can also display FC event information:
~ # esxcli storage san fc events get
FC Event Log
-------------------------------------------------------------
2013-09-23 12:18:58.085 [vmhba2] LINK UP
2013-09-23 13:05:35.952 [vmhba2] RSCN received for PID 012c00
2013-09-23 13:29:24.072 [vmhba2] RSCN received for PID 012c00
2013-09-23 13:33:36.249 [vmhba2] RSCN received for PID 012c00
Note that there are a number of other useful commands in this namespace, not just for FC adapters. You can also examine FCoE, iSCSI and SAS devices in this namespace and get equally useful information.
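Spotting the "not so good" adapter by eye is easy with two HBAs, less so across many hosts. Here is a minimal Python sketch that flags any FC adapter whose port state is not ONLINE. It uses the two adapters from the fc list output earlier; the function name `parse_fc_adapters` is my own, and the sketch assumes each adapter's block starts with its Adapter line, as it does in that output.

```python
# Output of `esxcli storage san fc list` from earlier in this section.
SAMPLE = """\
   Adapter: vmhba2
   Port ID: 012800
   Node Name: 20:00:00:c0:dd:18:77:d1
   Port Name: 21:00:00:c0:dd:18:77:d1
   Speed: 10 Gbps
   Port Type: NPort
   Port State: ONLINE

   Adapter: vmhba3
   Port ID: 000000
   Node Name: 20:00:00:c0:dd:18:77:d3
   Port Name: 21:00:00:c0:dd:18:77:d3
   Speed: 0 Gbps
   Port Type: Unknown
   Port State: LINK DOWN
"""


def parse_fc_adapters(text):
    """Return one dict per adapter; an `Adapter:` line starts a new record."""
    adapters = []
    for line in text.splitlines():
        # Split on the first colon only so WWNN/WWPN values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Adapter":
            adapters.append({})
        if adapters:
            adapters[-1][key] = value
    return adapters


adapters = parse_fc_adapters(SAMPLE)
not_online = [a["Adapter"] for a in adapters if a.get("Port State") != "ONLINE"]
print("Adapters needing attention:", not_online)
```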
Useful SMART Information
Another very useful command lets you examine the SMART attributes of a disk drive. This is especially relevant since the introduction of vFRC (vSphere Flash Read Cache) and the soon-to-be-announced VSAN, both of which support SSD.
~ # esxcli storage core device smart get -d naa.xxxxxx
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              114    6          100
Power-on Hours                90     0          90
Power Cycle Count             100    20         100
Reallocated Sector Count      2      36         2
Raw Read Error Rate           114    6          100
Drive Temperature             33     0          53
Driver Rated Max Temperature  67     45         47
Write Sectors TOT Count       200    0          200
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       100    99         10
While there are still some drives that return certain fields as N/A, I know there is a concerted effort underway between VMware and its partners to get this working as much as possible. It is invaluable to be able to see the Media Wearout Indicator on SSDs, as well as the Reallocated Sector Count and Drive Temperature.
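Since drives differ in which fields they report, a quick way to see what a given drive actually returns is to parse the table and list the N/A rows. This is a minimal Python sketch using a few rows from the output above. The function name `parse_smart` is mine, and the sketch assumes the last three whitespace-separated tokens on each row are the Value, Threshold and Worst columns.

```python
# A few rows from the `esxcli storage core device smart get` output above.
SAMPLE = """\
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Reallocated Sector Count      2      36         2
Drive Temperature             33     0          53
"""


def parse_smart(text):
    """Return {parameter: {"value", "threshold", "worst"}} as strings."""
    rows = {}
    for line in text.splitlines():
        # Parameter names contain spaces, so split from the right.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            continue
        name = parts[0].strip()
        if name == "Parameter" or set(name) == {"-"}:
            continue  # skip the header and the dashed separator
        rows[name] = dict(zip(("value", "threshold", "worst"), parts[1:]))
    return rows


smart = parse_smart(SAMPLE)
unreported = [name for name, row in smart.items() if row["value"] == "N/A"]
print("Not reported by this drive:", unreported)
print("Drive temperature:", smart["Drive Temperature"]["value"])
```

The values are left as strings on purpose, because a field can hold either a number, OK, or N/A.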
I hope you find this useful. Hopefully you can see that there are a lot of extremely useful commands available in the ESXCLI. Do you use others? And if so, why? Let me know.