A brief overview of new Virtual SAN 6.0 features and functionality

The embargo on what’s new in vSphere 6.0 has now been lifted, so we can start to discuss the new features and functionality publicly. For the last number of months, I’ve been heavily involved in preparing for the Virtual SAN launch. What follows is a brief description of what I find to be the most interesting and exciting of the upcoming features in Virtual SAN 6.0. Later on, I will follow up with more in-depth blog posts on the new features and functionality.

Scalability improvements – 64 nodes

Virtual SAN 6.0 now supports up to 64 nodes in a cluster, up from 32 nodes in version 5.5. Virtual SAN is extremely simple to scale; it is simply a matter of adding a new ESXi host to a cluster via vCenter Server. Customers can start off with a very small VSAN cluster, but with version 6.0, customers can now grow their clusters massively. 64 node support is only for VSAN hybrid (flash for cache, magnetic disks for capacity).

Scalability improvements – 62TB VMDK

Virtual SAN 6.0 now supports virtual machine disk objects (VMDKs) that are 62TB in size, up from the 2TB – 512 bytes size that was supported in version 5.5. This allows customers to deploy very large virtual machines which should meet the capacity requirements of any application.

Scalability improvements – new on-disk format (v2)

There is also a new on-disk format (v2) which massively scales the number of components per host from 3000 in version 5.5 to 9000 in version 6.0. New installs will automatically use this format. Upgrades from version 5.5 to 6.0 have a built-in utility which will upgrade the on-disk format from v1 to v2 as a rolling upgrade. This new scalability feature allows customers to deploy virtual machines with multiple disk objects (such as View desktops) and avoid reaching the component maximums. The new v2 on-disk format leverages VirstoFS, the filesystem that came with the Virsto acquisition.

Performance improvements – all flash configuration

Virtual SAN 6.0 introduces support for a new all-flash configuration. In 5.5, Virtual SAN used a combination of flash (for cache) and magnetic disks (for capacity). This hybrid configuration continues to be supported in version 6.0, but in 6.0 one can also choose to use flash devices for the capacity layer, resulting in an all-flash configuration. This allows customers to deploy VMs on Virtual SAN with a level of performance that should meet the requirements of any application.

Performance improvement – snapshot and clone enhancements

The new on-disk format (v2) was already mentioned in this list of new and enhanced features. Not only does this format allow for scalability, it also enhances the performance of snapshots and clones on the VSAN datastore. A new snapshot format (vsanSparse) has been introduced to enhance virtual machine snapshots, which used the vmfsSparse (redo log) format in version 5.5.

Availability improvements – fault domains

Fault domains in version 6.0 introduce an even higher level of availability in Virtual SAN. In version 5.5, customers could choose a NumberOfFailuresToTolerate for virtual machines to tolerate a certain number of host failures. Now customers can control the placement of their replicas and witnesses by placing different components in different racks through the use of fault domains. With this new feature, customers can continue to run their virtual machines on Virtual SAN, even in the event of something catastrophic like a rack failure. The guideline in tolerating ‘n’ host failures was to have ‘2n + 1’ hosts in the cluster. Similarly, to tolerate ‘n’ domain failures, ‘2n + 1’ fault domains are required.
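
To make the ‘2n + 1’ guideline concrete, here is a minimal Python sketch of the arithmetic. The function name is my own and purely illustrative; it is not part of any VSAN tool or API.

```python
def min_fault_domains(failures_to_tolerate: int) -> int:
    """Return the '2n + 1' minimum number of hosts (or fault domains)
    needed to tolerate n failures, per the guideline in the text."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


# Illustrative usage: print the minimum for a few values of n.
for n in range(1, 4):
    print(f"n={n}: at least {min_fault_domains(n)} hosts or fault domains")
```

So tolerating a single rack failure would call for at least three fault domains, and tolerating two would call for five.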

Availability improvements – vSphere HA maximums increase

Although this enhancement is not a Virtual SAN improvement per se, VSAN certainly leverages it. vSphere HA in version 6.0 has been enhanced to protect more than the 2048 virtual machines that it could protect in version 5.5. This has a direct impact on Virtual SAN, as vSphere HA can now protect up to 6,000 VMs running on a VSAN cluster, which should meet the needs of many customers.

Operational improvement – re-balance mechanism

From time to time, operations on Virtual SAN might involve the evacuation of certain disks, disk groups or hosts. This could be as a result of maintenance activities or failures in the cluster. At that time, component distribution will be unbalanced, with some disks, disk groups or hosts not containing any components, and other disks, disk groups or hosts containing more than their fair share of components. This new re-balance mechanism will allow administrators to balance components across all disks, disk groups and hosts. The net result is that this ability to re-balance the components will avoid performance issues with unbalanced configurations.

Operational improvement – default VM storage policy

Virtual SAN 6.0 provides administrators with the ability to create their own default VM Storage Policy. This means that VMs are deployed with an administrator’s own default policy rather than the default policy that ships with VSAN, which may not be a desirable policy for the VM. This will also speed up deployment time, as administrators can avoid the step of selecting a particular policy for a VM and simply leverage the default one.

Operational improvement – disk evacuation granularity

In version 5.5, to evacuate a single disk, the whole disk group had to be evacuated. This meant that a lot of additional space was needed in the cluster. It also meant that disk replacement activities took much longer than necessary. In 6.0, administrators now have the ability to evacuate a single disk rather than the whole disk group. This will greatly speed up disk replacement scenarios.

Operational improvement – relaxing the requirement on witnesses

The VSAN team were conscious of how witness components were consuming the component count as the policies for virtual machines became more demanding (failures-to-tolerate, stripe width). To address this, the quorum computation has been changed in Virtual SAN 6.0. The rule is no longer “more than 50% of components”. Instead, in 6.0, each component has a number of votes, which may be 1 or more, and quorum is calculated based on the rule that “more than 50% of votes” is required. It then becomes possible for components to be distributed in such a way that VSAN can still guarantee failures-to-tolerate without the use of witnesses.
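
The difference between the two rules is easier to see with a small model. The following Python sketch is purely illustrative: the component layout, the host names and the vote assignment are all hypothetical, and it makes no claim about how VSAN actually distributes votes. It only compares “more than 50% of components” against “more than 50% of votes” for one object with no witness.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class Component:
    name: str
    host: str
    votes: int = 1          # hypothetical vote weight
    available: bool = True


def quorum_by_count(components: Sequence[Component]) -> bool:
    """Old rule from the text: more than 50% of components are available."""
    up = sum(1 for c in components if c.available)
    return 2 * up > len(components)


def quorum_by_votes(components: Sequence[Component]) -> bool:
    """New rule from the text: more than 50% of votes are available."""
    total = sum(c.votes for c in components)
    up = sum(c.votes for c in components if c.available)
    return 2 * up > total


def keeps_quorum_after_any_host_failure(
    components: Sequence[Component],
    rule: Callable[[Sequence[Component]], bool],
) -> bool:
    """Fail each host in turn and check that the rule still holds.
    This checks quorum only, not whether a full replica survives."""
    for failed in {c.host for c in components}:
        after = [replace(c, available=c.available and c.host != failed)
                 for c in components]
        if not rule(after):
            return False
    return True


# Hypothetical object: two replicas, each striped in two, and no witness.
layout = [
    Component("replica1-stripe1", "host1", votes=2),
    Component("replica1-stripe2", "host2"),
    Component("replica2-stripe1", "host3"),
    Component("replica2-stripe2", "host3"),
]

print(keeps_quorum_after_any_host_failure(layout, quorum_by_count))  # False
print(keeps_quorum_after_any_host_failure(layout, quorum_by_votes))  # True
```

In this made-up layout, losing host3 leaves only 2 of 4 components, so the component-count rule would need a witness to break the tie. With one component carrying 2 votes, 3 of 5 votes remain after any single host failure, so no witness is required.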

Operational improvement – lighting LED on disks via UI for ease of identification

As the scalability improvements mentioned here highlight, Virtual SAN clusters can now scale massively. When it comes to replacing a failed disk drive or SSD, it can be difficult to identify which disk is actually faulted or in need of replacement. This new feature, which allows an administrator to light the LED on a drive via the UI, will make it quicker to identify the disk and generally speed up disk replacement scenarios.

Operational improvement – mark disks as local or as flash via UI

This might seem like a small improvement, but it is a significant time-saving feature for customers who use SAS controllers. Certain SAS controllers can be shared between hosts, meaning that the disks show up as non-local to the ESXi host even though only one host might be using the disk or disks. In version 5.5, various claim rules had to be run on the ESXi hosts to make the disks local. In 6.0, this can now be achieved via the UI by simply marking the disk as local.

Another time-consuming operational activity arises when a flash device is presented as a RAID-0 volume rather than in pass-through mode. In many cases, an SSD behind a RAID-0 volume is not seen as a flash device. Once again, in version 5.5, claim rules had to be run on the ESXi host to mark it as a flash device. In 6.0, devices can be marked as flash via the UI. These are both excellent operational improvements for VSAN deployments.

Supportability improvement – VSAN deployed over L3 networks

In the networking space, version 6.0 will support being deployed on a layer 3 network. In version 5.5, this was not supported, and all nodes needed to be on the same layer 2 network. Customers can now deploy VSAN 6.0 using nodes that reside on different network segments.

Supportability improvement – external storage enclosure

Virtual SAN 6.0 introduces external storage enclosure support. This means that disk enclosures can now be attached to a host participating in a VSAN cluster, and those disks can in turn be consumed by VSAN for the VSAN datastore. This removes the requirement to have internal server disks, which basically opens the doors for blade server support, should customers wish to use them. This was an issue in 5.5, since blade servers typically had a small number of disk slots, meaning scalability was an issue.

This is an exceptional set of improvements. While 2014 saw over 1,000 paying VSAN customers, I suspect 2015 will be an even better year for VSAN considering the new enhancements.

16 comments
  1. Will this be a free upgrade for existing users? Will there be an additional charge to use the all flash feature?

  2. Nice to see the all-flash configuration. But is it also possible to use both storage configurations inside the same VSAN cluster? For example, a silver storage policy (flash + magnetic) and a gold storage policy with only flash, so that if you need some extra horsepower you only need to change the storage policy?

  3. Will VSAN support differentiation between read intensive and write intensive SSDs in order to optimize and define where writes should go to and where reads should come from?

    • Yes – the caching layer will have high endurance, high performance SSDs. The capacity layer will use SSDs that are not as high spec’ed, and these will be marked as “capacity” layer devices so that VSAN knows which are which.

  4. Hi,

    The design guide states “the maximum number of disks per host is 35 (7 disks per disk group, 5 disk groups per host).”

    The newly released vSphere 6.0 configuration maximums guide describes VSAN maximums as:

    Virtual SAN Disk Groups per host = 5
    Magnetic disks per disk group = 7
    SSD disks per disk group = 1
    Spinning Disks in all disk groups per host = 35

    This implies that the maximum disks per disk group is 8 (1 SSD, 7 Magnetic), and up to 40 disks per host.

    Which one is correct?

    Thanks,
    Nick

    • 7 “capacity tier” disks per disk group, 5 disk groups per host, giving 35 total “capacity tier” disks. Each disk group also contains a single “cache tier” flash device.
