NFS Best Practices – Part 3: Interoperability Considerations

Welcome to part 3 of the NFS Best Practices series of posts. While part 1 looked at networking and part 2 looked at configuration options, this next post will look at interoperability with vSphere features. We are primarily interested in features which are in some way related to storage, and NFS storage in particular. While many of my regular readers will be well versed in most of these technologies, I’m hoping there will still be some items of interest. Most of the interoperability features are tried and tested with NFS, but I will try to highlight areas that might be cause for additional consideration.

Storage I/O Control

The whole point of Storage I/O Control (SIOC) is to prevent a single virtual machine (VM) residing on one ESXi host from consuming more than its fair share of bandwidth on a datastore that it shares with other VMs which reside on other ESXi hosts.

Historically, we have had a feature called ‘disk shares’ which can be set up on a per ESXi host basis. This works quite well for all VMs residing on the same ESXi host and sharing the same datastore (e.g. local disk). However, it could not be used as a fairness mechanism for VMs from different ESXi hosts sharing the same datastore. This is what Storage I/O Control does for us. SIOC will modify the I/O queues on the various ESXi hosts to ensure that VMs which have a higher priority get more queue entries than those VMs which have a lower priority, allowing these higher priority VMs to send more I/O than their lower priority counterparts.

SIOC is a congestion driven feature – while latency remains below a pre-defined threshold, SIOC stays dormant. It is only triggered when the latency on the datastore rises above that threshold.

SIOC was first introduced for block storage back in vSphere 4.1. It was introduced for NFS datastores in vSphere 5.0. If you have a group of VMs sharing the same datastore spread across multiple ESXi hosts, and you want to avoid a single VM's I/O impacting the I/O (and thus performance) of other VMs, you should certainly consider using SIOC. With SIOC you can set shares to reflect the priority of VMs, but you can also implement an IOPS limit per VM. This means that you can limit the number of IOPS that a single VM can issue to a shared datastore.
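
To make that concrete, here is a minimal sketch using pyVmomi (the vSphere Python SDK) that sets custom disk shares and a 500 IOPS limit on the first virtual disk of a VM. The vCenter address, credentials and VM name are placeholders for illustration only; the same settings can equally be applied per disk through the vSphere Client.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment (lab only: cert checks skipped).
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "prod-vm-01")   # placeholder VM name
    view.Destroy()

    # Find the VM's first virtual disk.
    disk = next(d for d in vm.config.hardware.device
                if isinstance(d, vim.vm.device.VirtualDisk))

    # Give this disk a higher share value and cap it at 500 IOPS.
    disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(
        shares=vim.SharesInfo(level='custom', shares=2000),
        limit=500)

    dev_spec = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
        device=disk)
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[dev_spec]))

    Disconnect(si)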

This is an Enterprise Plus feature. More details on SIOC can be found in this whitepaper.

Network I/O Control

The Network I/O Control (NIOC) feature ensures that when the same NICs are used for multiple traffic types (e.g. 10Gb NICs), NFS traffic is not impacted by other traffic types on the same NICs. It works by assigning shares and limits to the different traffic types, and can also apply an 802.1p priority tag to outbound packets. With 10Gb networks, this feature can be very useful as you will typically be sharing one pipe with multiple other traffic types. With 1Gb, the likelihood is that you have dedicated the pipe solely to NFS traffic. The point to note is that Network I/O Control is congestion driven. If there is no congestion, any traffic type can consume as much bandwidth as it needs. NIOC only kicks in when there are different traffic types competing for bandwidth.
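
By way of illustration, here is a small pyVmomi sketch that enables NIOC on a distributed switch and then prints the share values of its system traffic pools (the NFS pool among them). The vCenter details and switch name are placeholders; adjusting the share values themselves can then be done from the vSphere Client or via the UpdateNetworkResourcePool() API.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.DistributedVirtualSwitch], True)
    dvs = next(d for d in view.view if d.name == "dvSwitch01")   # placeholder switch name
    view.Destroy()

    # Enable Network I/O Control on the distributed switch.
    dvs.EnableNetworkResourceManagement(enable=True)

    # List the traffic pools (e.g. 'nfs', 'virtualMachine') and their current shares.
    for pool in dvs.networkResourcePool:
        shares = pool.allocationInfo.shares
        print("%-20s level=%s shares=%s" % (pool.key, shares.level, shares.shares))

    Disconnect(si)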

While SIOC assists with dealing with the noisy neighbour problem from a datastore sharing perspective, NIOC assists with dealing with the noisy neighbour problem from a network perspective.

Not only that, but one can also set the priority of different VM traffic. So if certain VM traffic is important to you, these VMs can be grouped into one virtual machine port group while lower priority VMs can be placed into another virtual machine port group. NIOC can then be used to prioritize VM traffic and ensure that the high priority VMs get more bandwidth when there is competition for bandwidth on the pipe.

SIOC and NIOC can co-exist and in fact complement one another.

This is an Enterprise Plus feature. More details on NIOC can be found in this whitepaper.

Storage DRS

Storage DRS, introduced in vSphere 5.0, fully supports NFS datastores.

When you enable Storage DRS on a datastore cluster (a group of datastores), balancing based on space usage is automatically configured. The threshold is set to 80% but can be modified if you so wish. What this means is that if space usage on a particular datastore reaches 80% or more, Storage DRS will try to move VMs to other datastores using Storage vMotion to bring this usage value back down below 80%. The usage statistics of the datastores are checked on an ongoing basis.

If the datastore cluster is set to automatic mode of operation, Storage DRS will use Storage vMotion to automatically migrate VMs to other datastores in the datastore cluster if the threshold is exceeded. If the cluster is set to manual, the administrator will be given a set of recommendations to apply. Storage DRS will provide the best recommendations to balance the space usage of the datastores. As before, once you apply the recommendations, Storage vMotion will be used to move one or more VMs between datastores in the same datastore cluster.

Another feature of Storage DRS is its ability to balance VMs across datastores in the datastore cluster based on I/O metrics, specifically based on latency.

Storage DRS uses Storage I/O Control (SIOC) to evaluate datastore capabilities & capture latency information regarding all the datastores in the datastore cluster. As mentioned earlier, SIOC’s purpose is to ensure that no single VM uses all the bandwidth of a particular datastore, and it modifies the queue depth to the datastores on each ESXi host to achieve this.

When used by Storage DRS, its role is a little different. SIOC (on behalf of Storage DRS) checks the capabilities of the datastores in a datastore cluster by injecting various I/O loads. Once this information is normalized, Storage DRS will have a good indication of the types of workloads that a datastore can handle. This information is used in initial placement and load balancing decisions.

Storage DRS continuously uses SIOC to monitor how long it takes an I/O to do a round trip – this is the latency. This information about the datastore is passed back to Storage DRS. If the latency value for a particular datastore is above the threshold value (default 15ms) for a significant percentage of time over an observation period (default 16 hours), Storage DRS will try to rebalance the VMs across the datastores in the datastore cluster so that the latency value returns below the threshold. This may involve one or more Storage vMotion operations. In fact, even if Storage DRS is unable to bring the latency below the defined threshold value, it may still move VMs between datastores to balance the latency.
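
For reference, here is a minimal pyVmomi sketch that sets both thresholds on a datastore cluster: the 80% space utilization threshold and the 15ms I/O latency threshold discussed above. The vCenter details and datastore cluster name are placeholders, and the attribute names reflect my reading of the StorageDrsConfigSpec API, so treat it as a starting point rather than a finished script.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == "NFS-Pod-01")   # placeholder datastore cluster
    view.Destroy()

    # Rebalance on space above 80% utilization and on I/O above 15ms latency
    # (these match the defaults discussed above).
    pod_cfg = vim.storageDrs.PodConfigSpec(
        ioLoadBalanceEnabled=True,
        spaceLoadBalanceConfig=vim.storageDrs.SpaceLoadBalanceConfig(
            spaceUtilizationThreshold=80),
        ioLoadBalanceConfig=vim.storageDrs.IoLoadBalanceConfig(
            ioLatencyThreshold=15))

    content.storageResourceManager.ConfigureStorageDrsForPod_Task(
        pod=pod, spec=vim.storageDrs.ConfigSpec(podConfigSpec=pod_cfg), modify=True)

    Disconnect(si)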

When starting out with evaluating Storage DRS, VMware makes the same recommendation that we made for DRS initially. The recommendation is to run Storage DRS in manual mode first, monitoring the recommendations that Storage DRS surfaces and making sure that they make sense. After a period of time, if the recommendations make sense and you have built up a comfort level with Storage DRS, consider switching it to automated mode.

There are a number of considerations when using Storage DRS with certain array features. VMware has already produced a very detailed white paper regarding the use of Storage DRS with array features like tiered storage, thin provisioning, deduplication, etc. More details around Storage DRS interoperability with storage array features can be found in this whitepaper.

VAAI

Many NAS storage arrays now support a number of vSphere APIs for Array Integration (VAAI) primitives. The purpose of these APIs is to allow the ESXi host to offload certain storage operations to the storage array rather than consuming resources on the ESXi host to do the same operation.

The first primitive we will discuss is Full File Clone, which allows you to offload cold clone operations or template deployments to the storage array. One important point to note is that this primitive does not support Storage vMotion – the primitive can only be used when the VM is powered off. Storage vMotion on NFS datastores continues to use the VMkernel software data mover.

The next primitive is called Fast File Clone. This is where the creation of linked clones is offloaded to the array. With the release of VMware View 5.1, this feature was supported as a tech preview. A future release of View (at the time of writing) is needed for full support of this primitive. With the release of vSphere 5.1 and vCloud Director 5.1, this primitive is fully supported for vCloud vApps when VAAI is enabled on the datastore and Fast Provisioning using linked clones is selected.

Reserve Space is another VAAI NAS primitive. Without VAAI NAS, we never had the ability to pre-allocate or zero out space for VMDKs on NFS. Historically, the only option available was to build thin VMDKs on NFS. With the introduction of Reserve Space, one can now create thick VMDKs on NFS datastores. However, VAAI NAS Reserve Space is not like Write Same for block; it does not get the array to do the zeroing on its behalf. When creating a VMDK on a VAAI NAS array, selecting Flat sends a Reserve Space VAAI NAS command to the array which guarantees that the space will be available. This is equivalent to VMFS lazyzeroedthick, and the blocks are zeroed on first write. However, selecting Flat pre-initialized also sends a Reserve Space VAAI NAS command, plus it does ESXi-based zero writing to the VMDK – equivalent to a VMFS eagerzeroedthick. This means that it is a slow operation, and the zero writes are sent over the wire – they are not offloaded. So for zeroing operations, it is safe to say that block arrays have an advantage.
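
To illustrate where the two formats come from, here is a pyVmomi sketch that adds a new thick virtual disk to a VM on an NFS datastore. The thinProvisioned and eagerlyScrub flags on the disk backing are what distinguish Flat (lazy zeroed) from Flat pre-initialized (eager zeroed); the vCenter details, VM name and free SCSI slot are placeholder assumptions.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "prod-vm-01")   # placeholder VM name
    view.Destroy()

    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))

    # thinProvisioned=False corresponds to 'Flat' (lazy zeroed, Reserve Space on VAAI NAS);
    # additionally setting eagerlyScrub=True corresponds to 'Flat pre-initialized' (eager zeroed).
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
        diskMode='persistent', thinProvisioned=False, eagerlyScrub=False)

    new_disk = vim.vm.device.VirtualDisk(
        backing=backing,
        controllerKey=controller.key,
        unitNumber=1,                      # assumes SCSI 0:1 is free on this VM
        capacityInKB=10 * 1024 * 1024)     # 10 GB

    disk_spec = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=new_disk)

    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[disk_spec]))
    Disconnect(si)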

As an aside, we just said that VAAI NAS Reserve Space allows you to create virtual disks in Thick Provision Lazy Zeroed (lazyzeroedthick) or Thick Provision Eager Zeroed (eagerzeroedthick) format on NFS datastores on arrays which support Reserve Space. However, when you check the disk type on the Virtual Machine Properties dialog box, the Disk Provisioning section always shows Thick Provision Eager Zeroed as the disk format no matter which format you selected during the disk creation. ESXi does not distinguish between lazy zeroed and eager zeroed virtual disks on NFS datastores.

The final primitive is Extended Stats (NAS). This allows us to query how much space a VMDK actually consumes on an NFS datastore. For example, you might have created a 100GB thin VMDK, but it actually consumes only 25GB of space on the array. This was something vSphere previously never had any insight into. It was not a necessary feature for VMFS, since vSphere understands VMFS very well, but we did need something like this for NFS.
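
As a quick illustration of the provisioned-versus-consumed view this enables, the following pyVmomi sketch prints both figures for every VM, using the committed and uncommitted values from the VM storage summary. The vCenter details are placeholders; this is simply one way to surface the numbers, not the primitive itself.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    gb = 1024.0 ** 3
    for vm in view.view:
        s = vm.summary.storage
        if s:
            print("%-30s provisioned %6.1f GB, consumed %6.1f GB"
                  % (vm.name, (s.committed + s.uncommitted) / gb, s.committed / gb))
    view.Destroy()
    Disconnect(si)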

Remember that a VAAI NAS plugin is required from your respective storage array vendor for any of these primitives to work. The plugin must be installed on each ESXi host that wishes to leverage the VAAI NAS primitives.

What about the Thin Provisioning (TP) primitives? We did some work around these primitives in vSphere 5.0, such as raising an alarm when a TP volume reached 75% capacity at the backend, TP-Stun and of course the UNMAP primitive. However, these TP primitives are for SCSI only. The VAAI space threshold alarm is only supported on SCSI datastores. Similarly, VAAI TP-Stun was introduced to detect “Out of space” conditions on SCSI LUNs. However, for NAS datastores, NFS servers can already return an out-of-space error which should be propagated up the stack. This should induce a VM stun similar to how it happens with VAAI TP. This behaviour does not need the VAAI-NAS plugin, and should work on all NFS datastores, whether or not the host has VAAI enabled. Finally, the UNMAP primitive is also SCSI only – ‘dead space reclaiming’ is not an issue on NAS arrays. A detailed whitepaper on VAAI can be found here.

Site Recovery Manager/vSphere Replication

Site Recovery Manager (SRM) fully supports array based replication on NFS datastores. vSphere Replication fully supports replicating Virtual Machines which reside on NFS datastores.

With regards to best practice, I was reliably informed that one should consider storing the virtual machine swap in a different directory (on a non-replicated datastore) when using a replicated NFS datastore with SRM. This reduces the amount of replicated content which gets recreated on a failover. It also saves us having to delete and recreate the .vswp files on the destination datastore after a failover. This caused unnecessary delays during failover in earlier versions of SRM, as file handles on NFS needed to expire on the .vswp files before they could be deleted and recreated. Some of this has been improved in version 5 of SRM.
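
One way to achieve this programmatically is shown in the pyVmomi sketch below, which sets a VM's swap file placement to 'hostLocal' so the .vswp file lands on the swap datastore configured at the host/cluster level rather than in the VM's replicated home directory. The vCenter details and VM name are placeholders, the host or cluster swap file datastore must of course be configured separately, and the vSphere Client can do all of this too.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "prod-vm-01")   # placeholder VM name
    view.Destroy()

    # 'hostLocal' stores the .vswp file on the swap datastore configured on the
    # host/cluster rather than in the VM's (replicated) home directory.
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(swapPlacement='hostLocal'))
    Disconnect(si)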

Another consideration is the use of Fully Qualified Domain Names (FQDN) rather than IP addresses when mounting NFS datastores. Some storage array vendors require you to use IP addresses when using their Storage Replication Adapter (SRA) with SRM. Please reach out to your storage array vendor for guidance on whether or not this is a requirement.

Storage vMotion

Storage vMotion has gone through quite a few architectural changes over the years. The latest version in vSphere 5.x uses a mirror driver to split writes to the source and destination datastores once a migration is initiated. This should mean speedier migrations since only a single copy operation is now needed, unlike the recursive copy process used in previous versions which leveraged Changed Block Tracking (CBT).

The one consideration, and this has been called out already, is that Storage vMotion operations cannot be offloaded to the array with VAAI. All Storage vMotion operations on NFS datastores are done by the software data mover.

The only other considerations with Storage vMotion are relevant to both block & NAS, namely the configuration maximums. At the time of writing, the maximum number of concurrent Storage vMotion operations per ESXi host was 2, and the maximum number of concurrent Storage vMotion operations per datastore was 8. This is to prevent any single datastore from being unnecessarily impacted by Storage vMotion operations. Note that a new enhancement in vSphere 5.1 allows up to 4 VMDKs belonging to the same VM to be migrated in parallel, so long as the VMDKs reside on different datastores. More detail here.
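
For completeness, here is a minimal pyVmomi sketch that kicks off a Storage vMotion of a VM to another NFS datastore by relocating only its storage. The vCenter details, VM name and target datastore name are placeholders for illustration.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter details - replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    def find(vimtype, name):
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        try:
            return next(o for o in view.view if o.name == name)
        finally:
            view.Destroy()

    vm = find(vim.VirtualMachine, "prod-vm-01")         # placeholder VM name
    target = find(vim.Datastore, "nfs-datastore-02")    # placeholder NFS datastore

    # Relocating only the storage (no host change) is a Storage vMotion;
    # on NFS this copy is handled by the software data mover.
    vm.RelocateVM_Task(spec=vim.vm.RelocateSpec(datastore=target))
    Disconnect(si)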

As I mentioned in one of my earlier articles, the goal here is to update the current NAS Best Practices whitepaper which is now a little dated. If you think there are other interoperability concerns or considerations which should be called out in the paper, please let me know.

Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @CormacJHogan.