A list of ESXCLI storage commands you can’t live without

There are many occasions where the vSphere client does not display all of the relevant information about a particular storage device, or enough detail to troubleshoot problems related to it. The purpose of this post is to explain some of the ESXCLI commands I use most often when trying to determine storage device information, and to troubleshoot a particular device.

Which PSA, SATP & PSP?

First things first, let's figure out if the device is managed by VMware's native multipathing plugin, the NMP, or by a third-party plugin such as EMC's PowerPath. I start with the esxcli storage nmp device list command. This not only confirms that the device is managed by the NMP, but also displays the Storage Array Type Plugin (SATP) used for path failover and the Path Selection Policy (PSP) used for load balancing. Here is an example of this command (I'm using the -d option to run it against a single device to keep the output to a minimum).

~ # esxcli storage nmp device list -d naa.600601603aa029002cedc7f8b356e311
naa.600601603aa029002cedc7f8b356e311
  Device Display Name: DGC Fibre Channel Disk (naa.600601603aa029002cedc7f8b356e311)
  Storage Array Type: VMW_SATP_ALUA_CX
  Storage Array Type Device Config: {navireg=on, ipfilter=on}
   {implicit_support=on;explicit_support=on; explicit_allow=on;
   alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
  Path Selection Policy: VMW_PSP_RR
  Path Selection Policy Device Config: {policy=rr,iops=1000,
   bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,
   numBytesPending=0}
  Path Selection Policy Device Custom Config:
  Working Paths: vmhba2:C0:T3:L100
  Is Local SAS Device: false
  Is Boot USB Device: false
~ #

Clearly we can see both the SATP and the PSP for the device in this output. There is a lot more information here as well, especially since this is an ALUA array; you can read more about what these configuration options mean in this post. This device is using the Round Robin PSP, VMW_PSP_RR. One interesting point is Round Robin support itself: some arrays support it and some do not, so it is always worth checking the footnotes of the Storage section of the VMware HCL to see whether a particular array supports Round Robin. Now that we have the NMP, SATP & PSP, let's look at some other details.
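As an aside, if you want to see which multipath plugins are installed, which SATPs and PSPs are available, or change the PSP on a device, these are the commands I'd reach for. This is just a sketch; the device ID is the one from the output above (substitute your own), and option names can vary between ESXi releases, so check esxcli ... --help on your build before making changes.

```shell
# Which multipath plugins (MP class) are installed on this host?
# A third-party plugin like PowerPath would show up here alongside NMP.
esxcli storage core plugin list --plugin-class=MP

# List the SATPs and PSPs available on the host
esxcli storage nmp satp list
esxcli storage nmp psp list

# Change the PSP for a single device to Round Robin
esxcli storage nmp device set -d naa.600601603aa029002cedc7f8b356e311 -P VMW_PSP_RR

# Optionally tune Round Robin to switch paths after every I/O
# (instead of the default of 1000 I/Os shown in the output above)
esxcli storage nmp psp roundrobin deviceconfig set \
    -d naa.600601603aa029002cedc7f8b356e311 --type=iops --iops=1
```

Note that changing the PSP with esxcli only affects that one device on that one host; whether a lower Round Robin IOPS value helps is very much array-dependent, so check your array vendor's recommendations first.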

Queue Depth, Adaptive Queuing, Reservations

This next command is very useful for checking a number of things. Primarily, it will tell you what the device queue depth is set to, but it will also tell you whether adaptive queuing has been configured, and whether the device has the perennially reserved setting enabled, something that is used a lot in Microsoft Clustering configurations to avoid slow boots.

~ # esxcli storage core device list -d naa.600601603aa029002cedc7f8b356e311
naa.600601603aa029002cedc7f8b356e311
   Display Name: DGC Fibre Channel Disk (naa.600601603aa029002cedc7f8b356e311)
   Has Settable Display Name: true
   Size: 25600
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.600601603aa029002cedc7f8b356e311
   Vendor: DGC    
   Model: VRAID          
   Revision: 0532
   SCSI Level: 4
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters: VAAI_FILTER
   VAAI Status: supported
   Other UIDs: vml.0200640000600601603aa029002cedc7f8b356e311565241494420
   Is Local SAS Device: false
   Is Boot USB Device: false
   No of outstanding IOs with competing worlds: 32

The last line of output is actually the device queue depth: for this device, 32 I/Os can be queued to the device. The Queue Full Sample Size and Queue Full Threshold both relate to adaptive queuing; it is not configured on this device since both values are 0. If you'd like to know more about adaptive queuing, you can read this article here.
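Both of these settings can be changed with esxcli storage core device set. A sketch, again using the device ID from the example above; the values here (32/4 for adaptive queuing, 64 outstanding I/Os) are purely illustrative, and you should check your array vendor's guidance before changing either.

```shell
# Enable adaptive queuing: sample 32 I/Os after a QUEUE FULL/BUSY
# condition and throttle accordingly; 0/0 (the default above) disables it
esxcli storage core device set -d naa.600601603aa029002cedc7f8b356e311 \
    --queue-full-sample-size 32 --queue-full-threshold 4

# Raise the number of outstanding I/Os allowed when multiple VMs
# (worlds) compete for the device -- the last line of the output above
esxcli storage core device set -d naa.600601603aa029002cedc7f8b356e311 -O 64
```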

The perennially reserved flag is an interesting one and a relatively recent addition to device configurations. With applications that place SCSI reservations on devices (such as Microsoft Cluster Service), ESXi host reboots could be delayed as the host tried to query devices with SCSI reservations on them. Perennially Reserved is a flag that tells the ESXi host not to waste time querying these devices at boot, as there is a likelihood that they are reserved by another host. This speeds up boot times on ESXi hosts running MSCS VMs.
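Setting the flag is a one-liner per device, per host. A sketch using the same example device ID; in practice you would run this on every host in the cluster for each RDM used by the MSCS VMs:

```shell
# Mark the device as perennially reserved so the host skips it
# during boot-time device scans
esxcli storage core device setconfig -d naa.600601603aa029002cedc7f8b356e311 \
    --perennially-reserved=true

# Verify: "Is Perennially Reserved" should now read true
esxcli storage core device list -d naa.600601603aa029002cedc7f8b356e311
```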

For those of you contemplating VSAN, VMware's new Virtual SAN product, the ability to identify SSD devices and local vs. remote devices is critical. VSAN requires SSDs (or PCIe flash devices) as well as local magnetic disks. This command will help you identify both.

Apart from some vendor-specific information and size information, another interesting item is the VAAI Status. In this case, VAAI (vSphere APIs for Array Integration) is shown as supported. But how can I find out which primitives are supported? This next command will help with that.

Which VAAI primitives are supported?

~ # esxcli storage core device vaai status get -d naa.600601603aa029002cedc7f8b356e311
naa.600601603aa029002cedc7f8b356e311
   VAAI Plugin Name: VMW_VAAIP_CX
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported

This device, as we can clearly see, supports 3 out of the 4 VAAI block primitives. ATS, Atomic Test & Set, is the replacement for SCSI reservations. Clone is the ability to offload a clone or migration operation to the array using XCOPY. Zero is the ability to have the array zero out blocks using WRITE_SAME. Delete relates to the UNMAP primitive, and is the ability to reclaim dead space on thin-provisioned datastores. In this example, the Delete primitive shows up as unsupported.
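If you want to dig further into the VAAI side of things, a couple of related commands are worth knowing. A sketch; the datastore name below is just a placeholder for illustration, and the manual unmap command applies to vSphere 5.5:

```shell
# List the VAAI plugins installed on the host, and the VAAI filter
# (shown as an Attached Filter in the earlier device output)
esxcli storage core plugin list --plugin-class=VAAI
esxcli storage core plugin list --plugin-class=Filter

# On vSphere 5.5, dead space on a thin-provisioned datastore can be
# reclaimed manually via the Delete/UNMAP primitive
# ("my-datastore" is a hypothetical volume label)
esxcli storage vmfs unmap -l my-datastore
```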

Useful protocol information

For those of you interested in troubleshooting storage issues outside of ESXi, the esxcli storage san namespace has some very useful commands. In the case of Fibre Channel, you can get information about which adapters are used for FC, and display the WWNN (node name) and WWPN (port name), speed and port state, as shown here.

~ # esxcli storage san fc list
   Adapter: vmhba2
   Port ID: 012800
   Node Name: 20:00:00:c0:dd:18:77:d1
   Port Name: 21:00:00:c0:dd:18:77:d1
   Speed: 10 Gbps
   Port Type: NPort
   Port State: ONLINE

   Adapter: vmhba3
   Port ID: 000000
   Node Name: 20:00:00:c0:dd:18:77:d3
   Port Name: 21:00:00:c0:dd:18:77:d3
   Speed: 0 Gbps
   Port Type: Unknown
   Port State: LINK DOWN

So I have one good adapter, and one not so good. I can also display FC event information:

~ # esxcli storage san fc events get
FC Event Log                                                
-------------------------------------------------------------
2013-09-23 12:18:58.085 [vmhba2] LINK UP                    
2013-09-23 13:05:35.952 [vmhba2] RSCN received for PID 012c00
2013-09-23 13:29:24.072 [vmhba2] RSCN received for PID 012c00
2013-09-23 13:33:36.249 [vmhba2] RSCN received for PID 012c00

It should be noted that there are a number of other useful commands in this namespace, not just for FC adapters. You can also examine FCoE, iSCSI and SAS devices in this namespace and get equally useful information.
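A few examples from the same namespace, as a sketch (the adapter name vmhba2 is the healthy one from the listing above; which sub-commands are present depends on your ESXi release, so esxcli storage san --help is your friend):

```shell
# Per-adapter FC statistics (frames sent/received, errors) for
# the adapter that is ONLINE in the listing above
esxcli storage san fc stats get -A vmhba2

# Equivalent adapter listings for the other fabric types
esxcli storage san fcoe list
esxcli storage san iscsi list
esxcli storage san sas list
```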

Useful SMART Information

Another very useful command, especially since the introduction of vFRC (vSphere Flash Read Cache) and the soon-to-be-announced VSAN, both of which use SSDs, is the one to examine the SMART attributes of a disk drive.

~ # esxcli storage core device smart get -d naa.xxxxxx 
Parameter                    Value Threshold Worst 
---------------------------- ----- --------- -----
Health Status                 OK     N/A       N/A
Media Wearout Indicator       N/A    N/A       N/A
Write Error Count             N/A    N/A       N/A
Read Error Count              114    6         100
Power-on Hours                90     0         90
Power Cycle Count             100    20        100
Reallocated Sector Count      2      36        2
Raw Read Error Rate           114    6         100
Drive Temperature             33     0         53
Drive Rated Max Temperature  67     45        47
Write Sectors TOT Count       200    0         200
Read Sectors TOT Count        N/A    N/A       N/A 
Initial Bad Block Count       100    99        10

While there are still some drives that return certain fields as N/A, I know there is a concerted effort underway between VMware and its partners to get this working as widely as possible. It is invaluable to be able to see the Media Wearout Indicator on SSDs, as well as the Reallocated Sector Count and Drive Temperature.
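If you want to sweep SMART data across every disk on a host, a small shell loop does the trick. A sketch, assuming your device IDs start with naa. (as in the examples throughout this post), which is the case for most SAN and SAS devices:

```shell
# Pull SMART data for every naa.* device on the host -- a quick
# way to spot SSD wear on a vFRC or VSAN candidate host
for dev in $(esxcli storage core device list | grep -E '^naa\.'); do
    echo "=== ${dev} ==="
    esxcli storage core device smart get -d "${dev}"
done
```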

I hope you find this useful. Hopefully you can see that there are a lot of extremely useful commands available in the ESXCLI. Do you use others? And if so, why? Let me know.
