So far in this series, we have looked at the Pluggable Storage Architecture (PSA) and MPPs (Multipath Plugins). We have delved into the Native Multipath Plugin (NMP), and had a look at its sub-plugins, the Storage Array Type Plugin (SATP) and Path Selection Plugin (PSP). We have seen how the PSA selects an MPP, and if that MPP is the NMP, how the NMP selects an SATP and PSP.
Note – if you are having trouble following all the acronyms, you are not the first. There is a glossary at the end of the first blog post. And as if we hadn’t had enough acronyms, you will more recently see the plugins referred to as MEMs, Multipathing Extension Modules.
Storage Array Type Plugin (SATP)
The role of the SATP can be thought of as falling into three distinct areas. The first task of the SATP is to monitor the hardware state of the physical paths to the storage array. The second task is to detect when a hardware component of a physical path has failed. Such failures are reported as SCSI sense codes returned by the array controller to the host (KB article 1003433 details the various sense codes that can initiate a path fail-over). The final task is to switch the physical path to the array when the currently active path has failed.
If an I/O operation reports an error, NMP calls an appropriate SATP. The SATP interprets the error codes and, when appropriate, activates inactive paths and fails over to the new active path.
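The fail-over decision described above can be illustrated with a small model. This is only a sketch and not how any real SATP is implemented: the `Path` class, the `handle_io_error` function and the error labels in `FAILOVER_ERRORS` are all hypothetical names invented for this example, standing in for the real SCSI sense codes listed in the KB article.

```python
from dataclasses import dataclass
from enum import Enum


class PathState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    DEAD = "dead"


@dataclass
class Path:
    name: str
    state: PathState


# Hypothetical error labels that this toy SATP treats as path failures.
# Real SATPs interpret SCSI sense codes; these strings are placeholders.
FAILOVER_ERRORS = {"PATH_DOWN", "CONTROLLER_FAILED"}


def handle_io_error(paths, current, error):
    """Toy SATP: decide whether an I/O error warrants a fail-over.

    Returns the path that subsequent I/O should use.
    """
    if error not in FAILOVER_ERRORS:
        # Not a path problem, so keep using the current path.
        return current

    current.state = PathState.DEAD

    # Prefer a path that is already active, then fall back to activating
    # an inactive (standby) path.
    for wanted in (PathState.ACTIVE, PathState.STANDBY):
        for candidate in paths:
            if candidate is not current and candidate.state is wanted:
                candidate.state = PathState.ACTIVE
                return candidate

    raise RuntimeError("no working path left to the device")
```

For example, with one active and one standby path, a `"PATH_DOWN"` error on the active path marks it dead and returns the standby path, now activated. An error that is not in `FAILOVER_ERRORS` leaves the current path in use.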
Path Selection Plugin (PSP)
A PSP handles load balancing operations and is responsible for choosing a physical path to issue an I/O request to a logical device. When a Virtual Machine issues an I/O request to a storage device managed by the NMP, it calls the PSP assigned to this storage device. The PSP selects an appropriate physical path on which to send the I/O, load balancing the I/O if necessary. I posted an article (which includes a link to a video) on the vSphere Storage Blog which shows how the SATP & PSP interact if a path failure occurs.
As highlighted previously, there are three default PSPs shipped with ESXi.
VMW_PSP_MRU – MRU stands for Most Recently Used. This PSP selects the first working path discovered at system boot time. If this path becomes unavailable, the ESXi host switches to an alternative path and continues to use the new path while it is available. This is the default PSP used with Active/Passive arrays. A/P arrays are arrays which have multiple controllers, but only a single controller has ownership of the LUN at any one time. This means that the LUN is only ever visible on paths to one controller. In certain failure scenarios, the LUN ownership may have to move to another controller (referred to as a trespass by some array vendors). This fail-over between controllers can take some time to complete, depending on how busy the storage array is. In a misconfigured environment, the ownership of the LUN can continuously move between array controllers. This behavior is referred to as path thrashing, and can have serious performance implications for the ESXi host.
VMW_PSP_Fixed – Uses the designated preferred path, if one has been configured. Otherwise, it uses the first working path discovered at system boot time. If the ESXi host cannot use the preferred path (because of a path failure, for instance), this PSP selects a random alternative available path. The ESXi host automatically reverts to the preferred path as soon as that path becomes available again. Typically used with Active/Active arrays. A/A arrays are able to present the same LUN on multiple controllers at the same time.
VMW_PSP_RR – RR stands for Round Robin. It selects paths automatically, rotating through all available paths to balance load across them. While this PSP can be used on both A/A arrays and A/P arrays, it is most typically found on A/A arrays since all paths to the LUN can be used in load balancing the I/O. On A/P arrays, only paths to the controller which is currently the LUN owner are used.
What we haven’t discussed here is how the PSP handles Asymmetric Logical Unit Access (ALUA) arrays. This will be covered in a future post.
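To make the difference between the three default policies concrete, here is a minimal sketch of each selection rule as described above. It is an illustration only, not the ESXi implementation: the function and class names are hypothetical, paths are plain strings, `alive` is simply the set of paths currently working, and the Round Robin model rotates on every call rather than after a configurable amount of I/O.

```python
import random


def _usable(paths, alive):
    """Return working paths in discovery order, or fail if none remain."""
    usable = [p for p in paths if p in alive]
    if not usable:
        raise RuntimeError("no working path left to the device")
    return usable


def select_mru(paths, alive, current):
    """MRU: stay on the current path while it works. If it fails, move to
    another working path and stay there (no automatic fail-back)."""
    usable = _usable(paths, alive)
    return current if current in alive else usable[0]


def select_fixed(paths, alive, preferred=None, rng=random):
    """Fixed: use the preferred path whenever it works. If it is down, pick
    a random working alternative; revert once the preferred path is back."""
    usable = _usable(paths, alive)
    if preferred is None:
        # Simplification: with no preferred path configured, use the first
        # working path in discovery order.
        return usable[0]
    return preferred if preferred in alive else rng.choice(usable)


class RoundRobin:
    """Round Robin: rotate through all working paths in turn."""

    def __init__(self):
        self._counter = 0

    def select(self, paths, alive):
        usable = _usable(paths, alive)
        path = usable[self._counter % len(usable)]
        self._counter += 1
        return path
```

The key behavioral difference shows up after a failed path recovers: `select_fixed` returns to the preferred path on the next call, whereas `select_mru` keeps returning whatever path the caller passes back in as `current`, which is the path it failed over to.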
As we have already seen, SATPs have a default PSP. However, using a PSP other than the default is supported. A common scenario is for customers to move from Fixed to Round Robin. However, there has been a long-standing directive around Round Robin that you should discuss any changes to the PSP with your storage array vendor before implementing the change. For EMC customers, this is not necessary. Since vSphere 5.1, EMC have introduced Round Robin as the default path policy for their arrays. I posted about it here.
A number of alternative, partner-specific PSPs also exist. For instance, Dell have had one for their EqualLogic arrays since vSphere 5.0. More recently, Nimble Storage introduced a PSP for their storage arrays.
That completes part 3 of the deep-dive into the PSA. I hope it has given you some idea of how the various components of the PSA are used, and why we chose to go in this direction for I/O device and path management.