There is a project currently underway here at VMware to update the current Best Practices for running VMware vSphere on Network Attached Storage. The current paper is a number of years old now, and we are looking to bring it up to date. There are a number of different sections that need to be covered, but we decided to start with networking, as getting your networking infrastructure correct will play a crucial part in your NAS performance and availability.
We are also looking for feedback on what you perceive as a best practice. The thing about best practices is that something which might be correct for one customer may not be the correct thing for another customer. I hope you will continue reading to the end, and provide some feedback on how you implement your NFS network.
Since VMware still only supports NFS version 3 over TCP/IP, there are still some limits to the multipathing and load-balancing approaches that we can take. So although there are two connections when an NFS datastore is mounted to an ESXi host (one connection for control, the other connection for data), there is still only a single TCP session for I/O. From the current VMware Best Practices for NFS:
It is also important to understand that there is only one active pipe for the connection between the ESX server and a single storage target (LUN or mountpoint). This means that although there may be alternate connections available for failover, the bandwidth for a single datastore and the underlying storage is limited to what a single connection can provide. To leverage more available bandwidth, an ESX server has multiple connections from server to storage targets. One would need to configure multiple datastores with each datastore using separate connections between the server and the storage. This is where one often runs into the distinction between load balancing and load sharing. The configuration of traffic spread across two or more datastores configured on separate connections between the ESX server and the storage array is load sharing.
Let’s begin the conversation by looking at some options available to you and how you might be able to improve performance, keeping in mind that you have a single connection between host and storage.
1. 10GigE. A fairly obvious one to begin with. If you can provide a larger pipe, the likelihood is that you will achieve greater throughput. Of course, if you’re not driving enough I/O to fill a 1GigE pipe, then a fatter pipe isn’t going to help you. But let’s assume that you have enough VMs and enough datastores for 10GigE to be beneficial.
2. Jumbo Frames. While this feature can deliver additional throughput by increasing the size of the payload in each frame from a default MTU of 1500 to an MTU of 9000, great care must be taken if you decide to implement it. All devices sitting in the I/O path must be able to implement jumbo frames for it to make sense (array controller, physical switches, NICs and VMkernel ports).
3. Load Sharing. One of the configuration options mentioned above and taken from the current white paper is to use multiple connections from the ESXi server to the storage targets. To implement this, one would need to configure multiple datastores, with each datastore using separate connections between the server and the storage, i.e. NFS shares presented on different IP addresses, as shown in the following diagram:
4. Link Aggregation. Another possible way to increase throughput is via the use of link aggregation. This isn’t always guaranteed to deliver additional performance, but I will discuss Link Aggregation in the context of availability later on in the post.
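To put a rough number on the jumbo frames option above, here is a minimal sketch of the per-frame arithmetic. It assumes standard Ethernet framing without a VLAN tag, IPv4 and TCP headers without options, and full-sized frames; it ignores NFS/RPC headers entirely. The function names are my own, purely for illustration.

```python
# Fixed per-frame costs in bytes (assumptions: untagged Ethernet,
# IPv4 and TCP headers without options).
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20             # IPv4 header + TCP header


def payload_efficiency(mtu: int) -> float:
    """Fraction of bytes on the wire that carry TCP payload (full-sized frames)."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_WIRE_OVERHEAD
    return payload / on_wire


def frames_needed(nbytes: int, mtu: int) -> int:
    """Number of frames needed to carry nbytes of TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    return -(-nbytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
              f"{frames_needed(1_048_576, mtu)} frames per MiB")
```

Under these assumptions the wire efficiency only moves from roughly 95% to roughly 99%, so the more noticeable effect is usually the drop in the number of frames that every device in the path has to process. That is also why a single device left at MTU 1500 undoes the benefit.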
Since NFS on VMware uses TCP/IP to transfer I/O, latency can be a concern. To minimize latency, one should always try to minimize the number of hops between the storage and the ESXi host.
Ideally, one would also avoid routing between the ESXi host and the storage array, and keep them both on the same subnet. In fact, prior to ESXi 5.0U1, one could not route between the ESXi host and the storage array, but we lifted some of the restrictions around this in 5.0U1 as per this blog post. It is still pretty restrictive however.
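If you want to sanity-check the same-subnet recommendation against a list of addresses, a small sketch like the following may help. It uses only Python's standard `ipaddress` module; the function name and the addresses in the example are hypothetical and not taken from any real environment.

```python
import ipaddress


def same_subnet(vmk: str, target: str) -> bool:
    """Return True if the NFS target IP lies inside the VMkernel port's network.

    vmk is the VMkernel address with its prefix length, e.g. "192.168.10.21/24";
    target is the plain IP address the NFS share is presented on.
    """
    iface = ipaddress.ip_interface(vmk)
    return ipaddress.ip_address(target) in iface.network


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    print(same_subnet("192.168.10.21/24", "192.168.10.50"))  # on the same subnet
    print(same_subnet("192.168.10.21/24", "192.168.20.50"))  # would need routing
```

A `False` result here means the traffic between host and array would have to be routed, which adds at least one hop.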
All NAS array vendors agree that it is good practice to isolate NFS traffic for security reasons. By default, NFS traffic is sent in clear text over the network. Therefore, it is considered best practice to use NFS storage on trusted networks only. This would mean isolating the NFS traffic on its own separate physical switches or leveraging a dedicated VLAN (IEEE 802.1Q).
Another security concern is that the ESXi host mounts the NFS datastores using root privileges. Since this is NFS version 3, none of the security features implemented in later versions of NFS are available. To address the concern, again it is considered a best practice to use either a dedicated LAN or a VLAN for protection and isolation.
There are a number of options which can be utilized to make your NFS datastores highly available.
1) NIC teaming at the host level, with IP hash failover enabled at the ESXi host. A common practice is to set the NIC teaming failback option to no. The reason for this is to avoid a flapping NIC if there is some intermittent issue on the network.
The above design is somewhat simplified. There are still issues with the physical LAN switch being a single point of failure (SPOF). To avoid this, a common design is to use NIC teaming in a configuration which has two physical switches. With this configuration, there are four NICs in the ESXi host, and these are configured in two pairs of two NICs with IP hash failover. Each pair is configured as a team at its respective LAN switch.
2) Link Aggregation Control Protocol (LACP) at the array level is another option one could consider. Link Aggregation enables you to combine multiple physical interfaces into a single logical interface. Now it is debatable whether this can improve throughput/performance since we are still limited to a single connection with NFS version 3, but what it does allow is protection against path failures. Many NFS array vendors support this feature at the storage controller port level. Most storage vendors will support some form of link aggregation, although not all configurations may conform to the generally accepted IEEE 802.3ad standard. Best to check with your storage vendor. One of the features of LACP is its ability to respond to events on the network and decide which ports should be part of the logical interface. Many failover algorithms only respond to link down events. This means that a switch could inform the array that an alternate path needs to be chosen, rather than the array relying on a port failure.
3) LACP at the ESXi host level. This is a new feature which VMware introduced in vSphere 5.1. While I haven’t tried this myself, I can see this feature providing some additional availability in the same way as it may provide additional availability for arrays, so that a failover to an alternate NIC can now occur based on feedback from the physical switch as opposed to just relying on a link failure event (if this is indeed included in the VMware implementation of LACP). You can learn a little bit more about host side LACP support in a whitepaper written by my colleague Venky. If you are using this new feature, and it has helped with availability improvements, I’d really like to know.
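To see why IP hash teaming (option 1 above) helps availability more than single-datastore throughput, here is a simplified model of how an uplink gets picked. It assumes the commonly described scheme of XOR-ing the source and destination IPv4 addresses and taking the result modulo the number of uplinks. Treat it as an illustration rather than the exact ESXi implementation. The function name and addresses are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src: str, dst: str, uplinks: int) -> int:
    """Pick an uplink index from a source/destination IPv4 pair.

    Simplified model (an assumption, not the exact ESXi code path):
    XOR the two addresses as 32-bit integers, then take the result
    modulo the number of active uplinks in the team.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % uplinks


if __name__ == "__main__":
    vmk = "192.168.10.21"  # hypothetical VMkernel port address
    for target in ("192.168.10.50", "192.168.10.51"):
        print(target, "-> uplink", ip_hash_uplink(vmk, target, uplinks=2))
```

In this model a given host and target IP pair always hashes to the same uplink, so one datastore never spreads across NICs. Presenting NFS shares on different target IP addresses is what gives the hash a chance to place datastores on different uplinks, and the remaining uplinks are there for failover.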
Miscellaneous Network Features
By way of completeness, I wanted to highlight a few other recommendations from our storage partners. The first of these is flow control. Flow control manages the rate of data flow between the ESXi host and storage array. Depending on the interconnect (1GigE or 10GigE), some array vendors make recommendations about turning flow control off and allowing congestion to be managed higher up the stack. One should always refer to the storage array vendor’s best practices for guidelines.
The second is a recommendation around switch ports when Spanning Tree Protocol (STP) is used in an environment. STP is responsible for ensuring that there are no network loops in a bridged network by disabling network links and ensuring that there is only a single active path between any two network nodes. If there are loops, this can have severe performance impacts on your network with unnecessary forwarding of packets taking place, eventually leading to a saturated network. Some storage array vendors recommend setting the switch ports to which their array ports connect as either RSTP edge ports or Cisco portfast. This means that those ports transition immediately to the forwarding state. Refer to your storage array vendor’s best practices for advice on this setting, and whether it is appropriate for your storage array.
Useful VMware KBs for NFS networking
Useful NFS Best Practice References
- NetApp’s NFS Best Practices TR-3749
- EMC Isilon’s vSphere 5 Reference Architecture
- HDS vSphere 5 Reference Architecture
- Chad Sakac (virtual geek) multi-vendor NFS blog post – a recommended read
So, the question is, do you do anything different to the above? This is just from a network configuration perspective. Later, I will do some posts on tuning advanced settings, and integration with vSphere features like Storage I/O Control, Storage DRS, Network I/O Control and VAAI. But for now I’m just interested in networking best practices. Please leave a comment.
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage