Pluggable Storage Architecture (PSA) Deep-Dive – Part 3
So far in this series, we have looked at the Pluggable Storage Architecture (PSA) and MPPs (Multipath Plugins). We have delved into the Native Multipath Plugin (NMP), and had a look at its sub-plugins, the Storage Array Type Plugin (SATP) and Path Selection Plugin (PSP). We have seen how the PSA selects an MPP, and if that MPP is the NMP, how the NMP selects an SATP and PSP.
Note – if you are having trouble following all the acronyms, you are not the first. There is a glossary at the end of the first blog post. And as if we hadn’t had enough acronyms, you will more recently see the plugins referred to as MEMs (Management Extension Modules).
However, these names never really caught on and the original names continue to be the ones which are commonly used. The next step is to examine the SATP and PSP in more detail.
Storage Array Type Plugin (SATP)
The role of the SATP can be thought of as falling into three distinct areas. The first task of the SATP is to monitor the hardware state of the physical paths to the storage array. The second task of the SATP is to detect when a hardware component of a physical path has failed. This is detected in the form of SCSI sense codes returned by the array controller to the host. (KB article 1003433 details the various sense codes that can initiate a path fail-over). The final task is to switch the physical path to the array when the currently active path has failed.
If an I/O operation reports an error, NMP calls an appropriate SATP. The SATP interprets the error codes and, when appropriate, activates inactive paths and fails over to the new active path.
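As a quick illustration (a vSphere 5.x example; the exact output and the list of SATPs will vary depending on your build and on which arrays are attached), you can see the SATPs loaded on a host, along with the default PSP associated with each, from the ESXi shell:

# List the SATPs on this host, together with the default PSP and claim description for each
esxcli storage nmp satp list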
Path Selection Plugin (PSP)
A PSP handles load balancing operations and is responsible for choosing a physical path to issue an I/O request to a logical device. When a Virtual Machine issues an I/O request to a storage device managed by the NMP, it calls the PSP assigned to this storage device. The PSP selects an appropriate physical path on which to send the I/O, load balancing the I/O if necessary. I posted an article (which includes a link to a video) on the vSphere Storage Blog which shows how the SATP & PSP interact if a path failure occurs.
As highlighted previously, there are three default PSPs shipped with ESXi.
VMW_PSP_MRU — MRU stands for Most Recently Used. This PSP selects the first working path discovered at system boot time. If this path becomes unavailable, the ESXi host switches to an alternative path and continues to use the new path while it is available. This is the default PSP used with Active/Passive arrays. A/P arrays are arrays which have multiple controllers, but only a single controller has ownership of the LUN at any one time. This means that the LUN is only ever visible on paths to one controller. In certain failure scenarios, the LUN ownership may have to move to another controller (referred to as a trespass by some array vendors). This fail-over between controllers can take some time to complete, depending on how busy the storage array is. In a misconfigured environment, the ownership of the LUN can continuously move between array controllers. This behavior is referred to as path thrashing, and can have serious performance implications for the ESXi host.
VMW_PSP_Fixed — Uses the designated preferred path, if it has been configured. Otherwise, it uses the first working path discovered at system boot time. If the ESXi host cannot use the preferred path (because of a path failure, for instance), this PSP selects a random alternative available path. The ESXi host automatically reverts back to the preferred path as soon as the path becomes available. Typically used with Active/Active arrays. A/A arrays are able to present the same LUN on multiple controllers at the same time.
VMW_PSP_RR – RR stands for Round Robin. It uses an automatic path selection algorithm, rotating through all available paths and enabling load balancing across them. While this PSP can be used on both A/A and A/P arrays, it is most typically found on A/A arrays, since all paths to the LUN can be used to load balance the I/O. On A/P arrays, only paths to the controller which currently owns the LUN are used.

What we haven’t discussed here is how the PSP handles Asymmetric Logical Unit Access (ALUA) arrays. This will be covered in a future post.
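If you want to see which PSP (and SATP) a particular device has been assigned, the following commands show this from the ESXi shell. The naa identifier below is just a placeholder for one of your own device IDs:

# List the PSPs available on the host
esxcli storage nmp psp list

# Show the SATP and PSP in use for a specific device (placeholder device ID)
esxcli storage nmp device list -d naa.xxxxxxxxxxxxxxxx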
As we have already seen, SATPs have a default PSP. However, it is supported to use a PSP other than the default, and a common scenario is for customers to move from Fixed to Round Robin. There has been a long-standing directive around Round Robin that you should discuss any change of PSP with your storage array vendor before implementing it. For EMC customers, this is not necessary. Since vSphere 5.1, EMC have made Round Robin the default path policy for their arrays. I posted about it here.
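As a rough sketch of what such a change looks like (again, check with your array vendor first; the device ID and SATP name below are just placeholder examples), the PSP can be changed for an individual device, or the default PSP can be changed for a given SATP so that devices claimed by that SATP pick it up:

# Change the PSP for a single device to Round Robin (placeholder device ID)
esxcli storage nmp device set -d naa.xxxxxxxxxxxxxxxx -P VMW_PSP_RR

# Change the default PSP associated with an SATP (placeholder SATP name);
# devices that are already claimed keep their current PSP until they are reclaimed
esxcli storage nmp satp set -s VMW_SATP_DEFAULT_AA -P VMW_PSP_RR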
A number of alternative, partner-specific PSPs also exist. For instance, Dell has had one for its EqualLogic arrays since vSphere 5.0. More recently, Nimble Storage introduced a PSP for their storage arrays.
That completes part 3 of the deep-dive into the PSA. I hope that has given you some idea of how the various components of the PSA are used, and why we chose to go in this direction for I/O device and path management.
Disclosure – EMCer here…
Cormac – great post as always. One note for EMC customers (and there are a TON of VNX+vSphere customers out there – thank you!).
If you’re using a VNX (which falls into the VMware category of an “A/P” array that supports ALUA) with the most current VNX OE software version in ALUA mode, and are using the most current vSphere release, we worked to make the RR PSP the default (via the SATP selection as Cormac explained in this series). Nice and simple – behind the scenes 🙂
Interestingly, future VNX releases **may** (wink wink nudge nudge) be fully A/A.
If you want to twiddle with the PSP, the free vCenter plugin can do this (for all EMC arrays) across your environment. Still working on the update to support the new web client. There are also scripts for all this on “Everything VMware at EMC” (just google it).
Great info – thanks Chad!
Might be a topic for another time, but with regard to Round Robin, the path is switched after every 1,000 I/Os by default. This can be changed (a topic that was highly debated a couple of years ago), but should not need to be.
For those interested, the command is: esxcli storage nmp psp roundrobin deviceconfig set -d <device> --type=iops --iops=<value>
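And if you want to confirm what a device is currently set to, the corresponding get works as well (the device ID is a placeholder):

esxcli storage nmp psp roundrobin deviceconfig get -d naa.xxxxxxxxxxxxxxxx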
Indeed – good call Steve, and something which might be worth revisiting. My colleague Duncan Epping did some coverage on it over on – http://www.yellow-bricks.com/2010/03/30/whats-the-point-of-setting-iops1/ – and essentially the bottom line is that RR works better the greater the number of datastores and VMs you have.
Hi – I have been looking at this feature recently and found that it can cause a problem if you are using MSCS clustering (see VMware KB article 1010041).
There are a couple of options as I see it: (1) manually change the path policy to suit your array as per the KB article, and then check this on a regular basis in case it defaults back (e.g. after presenting new LUNs and rescanning) – a quick way to audit this is shown below – or (2) adopt a vendor-specific path management plugin, e.g. EMC PowerPath for VMware. Interested to hear your thoughts on the matter; for me the risk to I/O in a production environment is a major concern.
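For what it’s worth, this is roughly what I use to check the current policy across all devices from the ESXi shell (a rough sketch, assuming grep -E is available in your ESXi shell build):

# Show each device ID together with its current Path Selection Policy
esxcli storage nmp device list | grep -E "^naa|Path Selection Policy:"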
Yes indeed. That was an oversight on my part and I should have called it out as a caveat to using RR. The problem is to do with SCSI Reservations being path specific, so we need a way to remove SCSI reservations on any path to allow RR to be used with MSCS. All I can say is that we are working on a solution to that, and I’ll hopefully be able to share something around this very soon.