Pluggable Storage Architecture (PSA) Deep-Dive – Part 4

In this post, I want to look at some fail-over and load-balancing considerations specific to ALUA (Asymmetric Logical Unit Access) arrays. In PSA part 3, we took a look at the different Path Selection Plugins (PSPs), but for the most part these were discussed in the context of Active/Active arrays (where the LUN is available on all paths to the array) and Active/Passive arrays (where the LUN is owned by one controller on the array, and is only visible on the paths to that controller). ALUA provides a standard way of discovering and managing multiple paths to a LUN. Prior to ALUA, hosts had to use array vendor-specific methods to inquire about target port state. ALUA gives a device a standard way to report the states of its ports to hosts, and hosts can then use those states to prioritize paths and make fail-over and load-balancing decisions.

In ALUA arrays, both controllers can receive I/O commands, but only one controller can issue I/O to a given LUN. The controller that can issue I/O is called the managing (or owning) controller, and the paths to the LUN via ports on this controller are called optimized paths. I/O sent to a port on the non-owning controller must be transferred internally to the owning controller, which increases latency and impacts the performance of the array. For this reason, paths to the LUN via the non-owning controller are called non-optimized paths.

Target Port Group Support (TPGS)

TPGS provides a method for determining the access characteristics of a path to a LUN through a target port. It supports soliciting information about the capabilities of different target ports, and it supports routing I/O to the particular port or ports that can achieve the best performance. Target Port Groups allow path grouping and dynamic load balancing. Every port in the same TPG has the same port state, which can be one of the following: Active/Optimized, Active/Non-optimized, Standby, Unavailable, or Transitioning.
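For reference, these port states correspond (roughly, so check the SPC-3/SPC-4 specification for the authoritative definitions) to the asymmetric access state codes a device returns in response to a Report Target Port Groups command:

0x0  Active/Optimized       - full-performance path via the owning controller
0x1  Active/Non-optimized   - usable path, but I/O is proxied to the owning controller
0x2  Standby                - port accepts only a limited command set; no media I/O
0x3  Unavailable            - port cannot currently service the LUN
0xF  Transitioning          - the TPG is moving between states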

VMW_SATP_ALUA

In the context of the PSA, the NMP (Native Multipathing Plugin) claims the path and then associates the appropriate Storage Array Type Plugin (SATP) with it. For ALUA arrays, this is VMW_SATP_ALUA. The VMW_SATP_ALUA plugin sends a Report Target Port Groups (RTPG) command to the array to get the device's TPG identifiers and states. A device can have up to five such target port groups, reflecting the five ALUA states mentioned earlier: Active/Optimized, Active/Non-optimized, Standby, Unavailable, and Transitioning. When a TPG's state changes, the state of all paths in that target port group changes with it.
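On a vSphere 5.x host, you can see the TPG identifiers and states that VMW_SATP_ALUA has discovered in the device's Storage Array Type Device Config. The following is a mocked-up illustration rather than a capture from a real host; the naa identifier is a placeholder, and the exact fields vary by array and ESXi version:

esxcli storage nmp device list -d naa.6006016055711d00mockdev01
naa.6006016055711d00mockdev01
   Device Display Name: Fibre Channel Disk (naa.6006016055711d00mockdev01)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off;
   explicit_allow=on; alua_followover=on;
   {TPG_id=1,TPG_state=AO}{TPG_id=2,TPG_state=ANO}}
   Path Selection Policy: VMW_PSP_MRU
   Working Paths: vmhba2:C0:T0:L0

Here TPG 1 is Active/Optimized (AO) and TPG 2 is Active/Non-optimized (ANO), so the working path is via the owning controller.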

Now, when we first introduced ALUA support back in vSphere 4.1, there were special SATPs and PSPs for ALUA. For example, the EMC CLARiiON array had an SATP called VMW_SATP_ALUA_CX, which had a default PSP called VMW_PSP_FIXED_AP associated with it, as shown here:

esxcli nmp device list
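On a vSphere 4.1 host, the output looked something like this mocked-up example (the naa identifier is a placeholder, and the output is trimmed):

naa.6006016047301a00mockdev01
   Device Display Name: DGC Fibre Channel Disk (naa.6006016047301a00mockdev01)
   Storage Array Type: VMW_SATP_ALUA_CX
   Storage Array Type Device Config: {implicit_support=on; explicit_support=on;
   explicit_allow=on; alua_followover=on;
   {TPG_id=1,TPG_state=AO}{TPG_id=2,TPG_state=ANO}}
   Path Selection Policy: VMW_PSP_FIXED_AP
   Working Paths: vmhba1:C0:T0:L0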

The fail-over functionality for ALUA arrays has now been rolled up into the generic VMW_SATP_ALUA, and the special load-balancing capabilities for ALUA have been rolled up into VMW_PSP_FIXED, which we discussed previously in part 3.
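As an aside, if you would prefer a device claimed by VMW_SATP_ALUA to use VMW_PSP_FIXED rather than the SATP's default PSP, you can change it on a per-device basis. A minimal sketch using vSphere 5.x esxcli syntax, where the naa identifier is a placeholder:

# Check which PSP the device is currently using
esxcli storage nmp device list -d naa.6006016055711d00mockdev01

# Associate VMW_PSP_FIXED with this device
esxcli storage nmp device set -d naa.6006016055711d00mockdev01 --psp VMW_PSP_FIXED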

Now if you list the paths to a device, you will see them organized by TPG state. In this example, there are only two paths to the device: one listed as active and the other as active unoptimized (which means it is a path to the LUN via the non-owning controller):

esxcli nmp path list
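Here is a mocked-up illustration of what the two paths might look like (the WWPN and naa identifiers are placeholders, and the output is trimmed to the relevant fields):

fc.2000001b32mockhost:2100001b32mockhost-fc.50060160mocktgt:50060166mockport-naa.6006016055711d00mockdev01
   Runtime Name: vmhba2:C0:T0:L0
   Device: naa.6006016055711d00mockdev01
   Group State: active

fc.2000001b32mockhost:2100001b32mockhost-fc.50060160mocktgt:50060167mockport-naa.6006016055711d00mockdev01
   Runtime Name: vmhba2:C0:T1:L0
   Device: naa.6006016055711d00mockdev01
   Group State: active unoptimized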

For further reading on the configuration parameters available for ALUA arrays in the PSA, have a look at the article I wrote on the vSphere blog. That post also discusses the difference between explicit and implicit ALUA, the follow-over algorithm that we use, and path thrashing. One other ALUA feature that you might like to research is the ability to prioritize the I/O path when a fail-over occurs; there is another post on the vSphere blog which discusses this.
