vSphere 5.1 Storage Enhancements – Part 6: IODM & SSD Monitoring

Building on the storage enhancements in vSphere 5.0 that made life easier for vSphere administrators, vSphere 5.1 includes additional commands for diagnosing various storage protocol issues from the ESXi host. This new functionality is called I/O Device Management (IODM).

This new namespace of esxcli commands covers Fibre Channel, FCoE, iSCSI and SAS protocol statistics, as well as SMART (Self-Monitoring, Analysis and Reporting Technology) attributes. The aim is to allow administrators to determine whether a storage issue is occurring at the ESXi host, HBA, fabric or storage port level. The commands let an administrator look at critical events such as frame loss, as well as initiate various resets of the storage infrastructure. The SMART features are very useful because they give insight into the status of SAS and SATA SSDs, such as the current wear leveling state of a drive.

Advanced I/O Device Management – esxcli storage san
The 5.1 version of esxcli includes a number of new namespaces. There is also a new VMkernel module that instrumented drivers can call into, which provides event caching.

For example, link down and link up messages from Fibre Channel are logged. The fc (Fibre Channel) namespace also includes an option to perform a LIP (Loop Initialization Primitive) reset on a given FC adapter on the system. These esxcli commands will also be hooked into vm-support.
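Cached link down and link up events become more useful once they are paired into outage windows. The sketch below assumes the events have already been exported as (timestamp in seconds, event) tuples; the LINK_DOWN and LINK_UP labels and the tuple format are hypothetical, not the actual format returned by the fc namespace.

```python
from typing import List, Optional, Tuple

# Hypothetical event labels; the real IODM event text will differ.
LINK_DOWN = "LINK_DOWN"
LINK_UP = "LINK_UP"


def link_outages(events: List[Tuple[float, str]]) -> List[Tuple[float, float]]:
    """Pair each link-down event with the next link-up event.

    Returns (start_time, duration_seconds) for every completed outage.
    An outage that has not ended yet (no LINK_UP after it) is not reported.
    """
    outages: List[Tuple[float, float]] = []
    down_since: Optional[float] = None
    for timestamp, kind in sorted(events):
        if kind == LINK_DOWN and down_since is None:
            down_since = timestamp  # start of a new outage
        elif kind == LINK_UP and down_since is not None:
            outages.append((down_since, timestamp - down_since))
            down_since = None  # link restored
    return outages


if __name__ == "__main__":
    sample = [(10.0, LINK_DOWN), (12.5, LINK_UP), (40.0, LINK_DOWN), (41.0, LINK_UP)]
    print(link_outages(sample))  # [(10.0, 2.5), (40.0, 1.0)]
```

Several short outages in a row on the same adapter would point towards a flapping link rather than a one-off event.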

Probably one of the nicest parts of this feature is the ability to examine various adapter statistics, which should really help when troubleshooting storage issues from a vSphere perspective. Here we can see the statistics returned by IODM for a software iSCSI initiator on an ESXi host. Information such as the number of connections and sessions can help troubleshoot port binding and multipathing configurations on the host, and the amount of I/O and the different types of Protocol Data Units (PDUs) are displayed very clearly.

This is a very useful thing to have when trying to monitor your iSCSI infrastructure.
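Adapter statistics like these are usually cumulative counters, so comparing two snapshots is a simple way to turn them into rates, and the session count can be checked against what port binding should produce. This is a minimal sketch under two assumptions: the field names and PDU labels in the hypothetical IscsiSnapshot class are stand-ins for whatever the esxcli output actually contains, and each bound VMkernel port is expected to open one session per target portal.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IscsiSnapshot:
    """Hypothetical subset of software iSCSI adapter statistics."""
    timestamp: float  # seconds, when the snapshot was taken
    sessions: int
    connections: int
    pdus: Dict[str, int] = field(default_factory=dict)  # PDU type -> cumulative count


def pdu_rates(before: IscsiSnapshot, after: IscsiSnapshot) -> Dict[str, float]:
    """Per-second rate of each PDU type between two snapshots."""
    elapsed = after.timestamp - before.timestamp
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in chronological order")
    return {
        kind: (count - before.pdus.get(kind, 0)) / elapsed
        for kind, count in after.pdus.items()
    }


def missing_sessions(snap: IscsiSnapshot, bound_vmknics: int, target_portals: int) -> int:
    """Sessions short of the assumed one-per-bound-port-per-portal layout."""
    expected = bound_vmknics * target_portals
    return max(expected - snap.sessions, 0)


if __name__ == "__main__":
    t0 = IscsiSnapshot(0.0, sessions=2, connections=2, pdus={"SCSI Command": 100, "Data-In": 50})
    t1 = IscsiSnapshot(10.0, sessions=2, connections=2, pdus={"SCSI Command": 300, "Data-In": 150})
    print(pdu_rates(t0, t1))           # {'SCSI Command': 20.0, 'Data-In': 10.0}
    print(missing_sessions(t1, 2, 2))  # 2
```

A non-zero shortfall would suggest a bound port that failed to log in to a target, which is the kind of port binding problem the session and connection counts help surface.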

SSD Monitoring
As SSDs become more prevalent, it is important to be able to monitor them from an ESXi host. VMware is providing a module which will monitor a number of different SSD attributes, including the Media Wearout Indicator, the temperature and the Reallocated Sector Count. The Reallocated Sector Count should be about 100 on a healthy drive. When the disk surface has issues, the SSD remaps the affected sectors to spare sectors taken from its reserve, and this value drops. When it reaches zero, we could start getting sector errors on the SSD, so we need to be aware of any use of the reserved sectors.
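The reasoning above can be captured as a simple check. This sketch assumes the normalized values count down from about 100 (as described for the Reallocated Sector Count, and assumed here to apply to the Media Wearout Indicator too) and that a value at or below the reported threshold means the drive has no headroom left; the three status labels are illustrative.

```python
# Attributes assumed to start near 100 and drop as the drive degrades.
COUNTDOWN_ATTRIBUTES = {"Media Wearout Indicator", "Reallocated Sector Count"}


def assess(name: str, value: int, threshold: int, baseline: int = 100) -> str:
    """Return 'ok', 'warning' or 'critical' for a count-down SMART attribute."""
    if name not in COUNTDOWN_ATTRIBUTES:
        raise ValueError(f"{name!r} is not treated as a count-down attribute")
    if value <= threshold:
        return "critical"  # at or below threshold: no headroom left
    if value < baseline:
        return "warning"   # value has started to fall from its baseline
    return "ok"


if __name__ == "__main__":
    print(assess("Reallocated Sector Count", 100, 0))  # ok
    print(assess("Reallocated Sector Count", 97, 0))   # warning
    print(assess("Media Wearout Indicator", 0, 0))     # critical
```

Temperature is deliberately left out, since it does not count down towards a threshold in the same way.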

To look at the SSD attributes, the following esxcli command can be used:

esxcli storage core device smart get -d naa.xxxxxx

What we see here is the output of a number of different SSD attributes, including the three mentioned previously.
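To act on that output in a script, the table can be parsed into per-attribute values. The sketch below assumes the column layout shown in the reader comment at the end of this post (Parameter, Value, Threshold, Worst, with attribute names that may contain spaces) and treats any non-numeric entry as missing (None); it has not been checked against every vendor plug-in.

```python
from typing import Dict, Optional, Tuple

Row = Tuple[Optional[int], Optional[int], Optional[int]]  # (value, threshold, worst)


def _to_int(token: str) -> Optional[int]:
    """Numeric columns become ints; anything else is treated as missing."""
    return int(token) if token.isdigit() else None


def parse_smart_table(text: str) -> Dict[str, Row]:
    """Parse 'esxcli storage core device smart get' style tabular output."""
    rows: Dict[str, Row] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Parameter"):
            continue  # blank line or header
        if set(line) <= set("- "):
            continue  # dashed separator under the header
        parts = line.rsplit(None, 3)  # name may contain spaces; last 3 are columns
        if len(parts) != 4:
            continue
        name, value, threshold, worst = parts
        rows[name] = (_to_int(value), _to_int(threshold), _to_int(worst))
    return rows


SAMPLE = """\
Parameter                 Value  Threshold  Worst
------------------------  -----  ---------  -----
Media Wearout Indicator   100    0          100
Power-on Hours            0      0          0
Power Cycle Count         100    0          100
Reallocated Sector Count  100    0          100
"""

if __name__ == "__main__":
    for attr, row in parse_smart_table(SAMPLE).items():
        print(attr, row)
```

Splitting from the right keeps multi-word parameter names intact without relying on fixed column widths.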


The plug-ins will live on the ESXi host in the directory /usr/lib/vmware/smart_plugins. VMware is providing a generic SMART plug-in in 5.1, but disk vendors can provide their own SMART plug-ins to expose additional information.

smartd is the SMART daemon on the ESXi 5.1 host. It runs every 30 minutes and makes API calls to gather useful diagnostic information from the drives. In vSphere 5.1, these events and statistics are not surfaced in vCenter; they are only viewable from the esxcli command line. Although the primary use case is SSDs, the esxcli commands can also be run against HDDs to gather certain information.
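Since smartd samples the drives every half hour, a series of readings for a count-down attribute such as the Media Wearout Indicator can give a rough projection of when it will reach its threshold. This is a minimal sketch using an ordinary least-squares line through evenly spaced samples; the linear-wear assumption is made purely for illustration, and real SSD wear is rarely that tidy, so treat the result as a trend indicator rather than a prediction.

```python
from typing import List, Optional

SAMPLE_INTERVAL_H = 0.5  # smartd's half-hourly cycle, as described above


def hours_until_threshold(values: List[int], threshold: int,
                          interval_h: float = SAMPLE_INTERVAL_H) -> Optional[float]:
    """Extrapolate a least-squares line to the threshold.

    Returns hours from the last sample, or None if the value is not falling.
    """
    n = len(values)
    if n < 2:
        return None
    xs = [i * interval_h for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx  # change in value per hour
    if slope >= 0:
        return None  # flat or rising: no projected crossing
    last_fit = mean_y + slope * (xs[-1] - mean_x)  # fitted value at last sample
    return max((threshold - last_fit) / slope, 0.0)


if __name__ == "__main__":
    print(hours_until_threshold([100, 99, 98, 97], threshold=0))  # 48.5
```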

A script called smartinfo.sh gathers statistics from all disks, SSD or not. This information will also be included in the vm-support log gathering utility output.
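The same per-device command can be looped over every disk, which is roughly the job smartinfo.sh does. The sketch below is not that script: it is a hypothetical Python equivalent that assumes Python 3.7 or later with esxcli on the PATH (ESXi 5.1's bundled interpreter may be older), takes the device IDs as input rather than discovering them, and assumes esxcli exits with a non-zero status when a device cannot report SMART data, as in the reader comment below where the controller was the culprit. The runner is injectable so the loop can be tried without a host.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Optional

Runner = Callable[[List[str]], str]


def smart_command(device_id: str) -> List[str]:
    """argv for the per-device command shown earlier in this post."""
    return ["esxcli", "storage", "core", "device", "smart", "get", "-d", device_id]


def run_esxcli(argv: List[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on non-zero exit."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def collect_smart(device_ids: Iterable[str], run: Optional[Runner] = None) -> Dict[str, str]:
    """Gather raw SMART output per device, recording failures instead of aborting."""
    runner = run or run_esxcli
    results: Dict[str, str] = {}
    for device_id in device_ids:
        try:
            results[device_id] = runner(smart_command(device_id))
        except (subprocess.CalledProcessError, OSError) as exc:
            # One device failing (e.g. behind an unsupported controller)
            # should not stop the sweep of the remaining devices.
            results[device_id] = f"error: {exc}"
    return results


if __name__ == "__main__":
    import sys
    for dev, output in collect_smart(sys.argv[1:]).items():
        print(f"== {dev}\n{output}")
```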

Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @CormacJHogan

  1. All I get is an error:

    > esxcli storage core device smart get -d naa.5001517bb2a06869

    Error getting Smart Parameters: GET param bundle error

      • I have tried with two devices:
        ~ # esxcli storage core device list
        Model: ST1000DM003-9YN1
        Model: INTEL SSDSC2CT12

        I know both present SMART attributes when attached to a Windows 7 system.

        Could it be the controller?
        ~ # esxcli storage core adapter list
        vmhba1 rste link-n/a pscsi.vmhba1 (0:3:0.0) Intel Corporation Patsburg Dual 4-Port SATA/SAS Storage Control Unit

        My mission was to check whether we can use desktop/enterprise SSDs for temp data in some special VMs. We have not seen that as an option before because we have not been able to predict when the SSDs would wear out.

      • It was the controller: when I attached the same drives to the standard AHCI controller, it worked much better.

        Now I will see if I can wear out some SSDs and actually get some values to change. I fear desktop SSDs do not report Media Wearout in real time, as in “the value is only updated on power cycle”.

        This is my readout on an Intel 330 120 GB drive, the only candidate that I think I can actually wear out in the timeframe I have for testing.

        Parameter                 Value  Threshold  Worst
        ------------------------  -----  ---------  -----
        Media Wearout Indicator   100    0          100
        Power-on Hours            0      0          0
        Power Cycle Count         100    0          100
        Reallocated Sector Count  100    0          100

        If you want the results, you are welcome to shoot me an email.
