What’s new in vSAN 6.6?

vSAN 6.6 is finally here. This sixth iteration of vSAN is quite a significant release for many reasons, as you will read about shortly. In my opinion, this may be the vSAN release with the most new features. Let’s cut straight to the chase and highlight all the features of this next version of vSAN. There is a lot to tell you about. Now might be a good time to grab yourself a cup of coffee.

Encryption

vSAN 6.6 offers DARE – Data At Rest Encryption. Yes, vSphere 6.5 also offers per-VM encryption through the use of policies, but that was done at the VM layer, and if deduplication was enabled at the vSAN layer, you didn’t get the space-saving benefits from it. Encryption in vSAN 6.6 takes place at the lowest level, meaning that you can also get the benefits of dedupe and compression. vSAN encryption is enabled at the cluster level, but it is implemented at the physical disk layer, so that each disk has its own key provided by a supported Key Management Server (KMS).

This feature relies heavily on AES-NI – Advanced Encryption Standard New Instructions. This is available on all modern CPUs. There are new health checks which ensure that the KMS is still accessible, and that all the hosts in the vSAN cluster support AES-NI.

A word of caution! Make sure you have your KMS protected. If you lose your keys, you lose your data. So if you are going to implement vSAN Encryption, ensure you have a good backup/restore mechanism in the event of your KMS going pop.

Local Protection in vSAN Stretched Clusters

I know this is a feature that a lot of vSAN Stretched Cluster customers have been looking for. In fact, I know of potential customers who have put off implementing vSAN stretched cluster because we did not have this feature. I’m delighted that this is now available in vSAN 6.6. In a nutshell, through policies, customers can now specify a protection level on a per site basis, as well as across sites.

There are now two protection policies: Primary level of failures to tolerate (PFTT) and Secondary level of failures to tolerate (SFTT). For a stretched cluster, PFTT defines cross-site protection, implemented as RAID-1, while SFTT defines local site protection. SFTT can be implemented as RAID-1, RAID-5 or RAID-6. This means that even if there is a full site failure, a VM can still be protected against host or disk failure in the remaining site.
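To get a feel for what combining the two policies costs in raw capacity, here is a rough Python sketch. The multipliers are my assumptions based on the usual vSAN layouts (RAID-1 keeps FTT+1 full copies, RAID-5 is 3 data + 1 parity, RAID-6 is 4 data + 2 parity), the function names are made up for illustration, and witness components are ignored. Treat it as back-of-the-envelope arithmetic, not as how vSAN itself accounts for space.

```python
from fractions import Fraction

def local_multiplier(sftt, method):
    """Assumed capacity multiplier inside one site for the secondary policy."""
    if sftt == 0:
        return Fraction(1)
    if method == "RAID-1":
        return Fraction(sftt + 1)      # n+1 full copies
    if method == "RAID-5" and sftt == 1:
        return Fraction(4, 3)          # 3 data + 1 parity
    if method == "RAID-6" and sftt == 2:
        return Fraction(6, 4)          # 4 data + 2 parity
    raise ValueError(f"unsupported combination: SFTT={sftt}, {method}")

def raw_capacity_gb(vmdk_gb, pftt, sftt, method="RAID-1"):
    """Raw capacity consumed across both data sites (witnesses ignored)."""
    if pftt not in (0, 1):
        raise ValueError("this sketch assumes PFTT is 0 or 1")
    sites = pftt + 1                   # PFTT=1 mirrors the object across sites
    return float(vmdk_gb * sites * local_multiplier(sftt, method))

for method, sftt in [("RAID-1", 1), ("RAID-5", 1), ("RAID-6", 2)]:
    print(method, f"SFTT={sftt}", round(raw_capacity_gb(100, 1, sftt, method), 1), "GB")
```

The point of the comparison is simply that RAID-5/6 for the local protection takes a lot of the sting out of mirroring everything across sites.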

One thing to note: the witness for the cross-site protection must remain accessible, i.e. there must still be one data site and the witness site available. SFTT will not protect against the loss of a data site AND the loss of a witness.

One question you might ask is whether local site protection increases the amount of traffic that needs to be shipped over the inter-site link. The answer is no. We have now implemented a “Proxy Owner” feature for each site. This means that instead of writing to all replicas in the remote site, we now do a single write to the Proxy Owner on the remote site, and this is then responsible for writing to all replicas on the remote site. Thus there is still only a single cross-site write for multiple replicas.
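The effect of the Proxy Owner on inter-site traffic can be put into a couple of lines of Python. This is purely a counting sketch with a made-up function name; it says nothing about how the Proxy Owner is actually implemented.

```python
def cross_site_writes(remote_replicas, proxy_owner=True):
    """Writes that cross the inter-site link for a single guest write."""
    if remote_replicas < 1:
        raise ValueError("need at least one remote replica")
    # With a Proxy Owner, one write crosses the link and the proxy fans it
    # out to the replicas in its own site; without one, every remote replica
    # would be written across the link directly.
    return 1 if proxy_owner else remote_replicas

# Example: a local RAID-1 mirror means two replicas in the remote site.
print(cross_site_writes(2, proxy_owner=False))
print(cross_site_writes(2, proxy_owner=True))
```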

Please note that this is not nested fault domains. In other words, you do not have control over where to place the components of an object on the data sites. All we guarantee is that we can protect the VM locally against a host or disk failure as before. We cannot do rack awareness at each site with this feature.

Secondary level of failures to tolerate only appears as a policy option when vSAN stretched cluster is enabled/configured.

Site Affinity in vSAN Stretched Clusters

Some customers have expressed an interest in being able to deploy VMs on a vSAN stretched cluster with FTT=0, in other words, do not tolerate any failures. This has typically been for applications that have a backup running elsewhere, or that have the ability to replicate internally at the application level, e.g. SQL Server AlwaysOn. Customers can now use the Affinity policy to request that a particular VM gets assigned to a particular site, from a storage perspective. This is akin to specifying data locality for a particular VM. This policy is only applicable when the primary level of failures to tolerate is set to 0.

Customers need to ensure that DRS/HA rules align with data locality. Customers should pin the VM’s compute to the site where the VMDK resides via affinity groups. There is no automatic way to determine this at present. Affinity only appears as a policy option when vSAN stretched cluster is enabled/configured.

Unicast Mode

Yes – we have finally removed our reliance on multicast. No more IGMP snooping, or PIM for routing multicast traffic. In vSAN 6.6, all hosts will now talk unicast, and the vCenter server becomes the source of truth for cluster membership. If you are upgrading from a previous version of vSAN, vSAN will automatically switch to unicast once all hosts have been upgraded to vSAN 6.6.

Of course, with all these things there are caveats. For example, if the on-disk format has not been upgraded to the latest version 5, and a pre-vSAN 6.6 host is added to the cluster, then the cluster reverts to multicast.

On the other hand, if you have upgraded to on-disk format v5, and then add a pre-vSAN 6.6 host to the cluster, the cluster will continue to talk unicast, but this newly added host can only talk multicast so it will be partitioned. If the cluster is at 6.6, and the on-disk format is v5, don’t add any pre-vSAN 6.6 hosts to the cluster.
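The two caveats above boil down to a small decision table. Here it is as a Python sketch, assuming a cluster where every existing host is already on vSAN 6.6; the function name and return values are my own invention for illustration.

```python
def after_adding_host(on_disk_format, new_host_is_pre_66):
    """Return (cluster_mode, new_host_partitioned) after adding one host
    to a cluster whose existing hosts are all running vSAN 6.6."""
    if not new_host_is_pre_66:
        return ("unicast", False)      # nothing changes
    if on_disk_format >= 5:
        # Cluster stays on unicast; the old host only speaks multicast,
        # so it ends up partitioned.
        return ("unicast", True)
    # On-disk format not yet at v5: the whole cluster reverts to multicast.
    return ("multicast", False)

print(after_adding_host(on_disk_format=3, new_host_is_pre_66=True))
print(after_adding_host(on_disk_format=5, new_host_is_pre_66=True))
```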

This removal of multicast as a requirement will definitely make vSAN deployments much easier from a networking requirements perspective.

By the way, if you run an esxcli vsan network list, multicast information will still be displayed even though it may not be used. The following new esxcli command will tell which hosts are using unicast (it does not list the host where the command is being run from however):

[root@esxi-dell-i:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name
------------------------------------  ---------  ----------------  -------------  -----  ----------
58d8ef12-bda6-e864-9400-246e962c23f0          0              true  172.200.0.123  12321
58d8ef50-0927-a55a-3678-246e962f48f8          0              true  172.200.0.122  12321
58d8ef61-a37d-4590-db1d-246e962f4978          0              true  172.200.0.124  12321

Here is the same command run from a vSAN stretched cluster (note the witness):

[root@esxi-dell-j:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address     Port  Iface Name
------------------------------------  ---------  ----------------  ------------  -----  ----------
58d29c9e-e01d-eeea-ac6b-246e962f4ab0          0              true  172.4.0.121   12321
58d8ef12-bda6-e864-9400-246e962c23f0          0              true  172.3.0.123   12321
58d8ef61-a37d-4590-db1d-246e962f4978          0              true  172.3.0.124   12321
00000000-0000-0000-0000-000000000000          1              true  147.80.0.222  12321
[root@esxi-dell-j:~]
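If you want to script against this output, a few lines of Python are enough to turn it into something usable. The sketch below assumes the column layout shown above (with the Iface Name column left empty) and uses the stretched cluster output as sample text; the function name is hypothetical.

```python
SAMPLE = """\
NodeUuid                              IsWitness  Supports Unicast  IP Address     Port  Iface Name
------------------------------------  ---------  ----------------  ------------  -----  ----------
58d29c9e-e01d-eeea-ac6b-246e962f4ab0          0              true  172.4.0.121   12321
58d8ef12-bda6-e864-9400-246e962c23f0          0              true  172.3.0.123   12321
58d8ef61-a37d-4590-db1d-246e962f4978          0              true  172.3.0.124   12321
00000000-0000-0000-0000-000000000000          1              true  147.80.0.222  12321
"""

def parse_unicastagent(text):
    """Parse 'esxcli vsan cluster unicastagent list' output into dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator.
        if len(fields) < 5 or fields[0] == "NodeUuid" or set(fields[0]) == {"-"}:
            continue
        nodes.append({
            "uuid": fields[0],
            "is_witness": fields[1] == "1",
            "supports_unicast": fields[2] == "true",
            "ip": fields[3],
            "port": int(fields[4]),
        })
    return nodes

nodes = parse_unicastagent(SAMPLE)
witnesses = [n["ip"] for n in nodes if n["is_witness"]]
print(len(nodes), "remote nodes, witness at", witnesses)
```

Remember that the host where the command was run is not in the list, so the node count is always one short of the cluster size.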

More visibility into objects via UI

This is something close to my heart. In previous releases of vSAN, we’ve only been able to see the VM Home and VMDK objects represented in the UI. Items like the VM Swap object could only be observed from the RVC. In this vSAN release, we now have the ability to query the status of other objects such as the VM Swap, as shown below.

Also note here that the VMDKs that have the VM name associated with them (e.g. Hard disk 9 -vcsa-06.rainpole.com_8.vmdk) are snapshot delta objects. These are now also viewable from within the UI.

Smart Rebuilds

In previous releases, vSAN never re-used the original component after a rebuild operation to a new component had started. So if a host is absent for more than 60 minutes, the components on that host are never re-used even if the host comes back after 60 minutes.

vSAN 6.6 introduces a new smart rebuild behavior. If the absent components come back online, even after 60 minutes, vSAN compares the cost of re-using the old components versus the cost of continuing to resync the new components. vSAN then chooses the one with the lower cost and cancels the resync for the other one. This feature saves a lot of unnecessary resyncing and temporary space usage when a host goes absent and comes back after 60 minutes.
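Conceptually, the decision looks something like the Python sketch below. I am assuming here that "cost" simply means the amount of data still to be copied, which is my simplification; the real cost model is not something I have details on, and the names are made up.

```python
def choose_component(stale_bytes_on_original, remaining_bytes_on_new):
    """Pick whichever component is cheaper to bring fully up to date.

    stale_bytes_on_original: data written while the original was absent
    remaining_bytes_on_new:  data the rebuild still has to copy
    (Hypothetical cost model: bytes left to resync.)
    """
    if stale_bytes_on_original <= remaining_bytes_on_new:
        return "reuse-original"    # cancel the rebuild, catch the original up
    return "continue-rebuild"      # the original is too far behind, drop it

GB = 1024 ** 3
# Host returns after 90 minutes; little changed, rebuild is only half done.
print(choose_component(2 * GB, 100 * GB))
# Host returns much later; rebuild is nearly finished.
print(choose_component(50 * GB, 1 * GB))
```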

There is another enhancement to the rebuild mechanism in vSAN 6.6. The rebalancing protocol has been updated to accommodate previous vSAN limitations. One issue was an inability to break large components into smaller chunks. This meant that rebalancing of large components sometimes led to an inability to rebalance due to space constraints (i.e. no place to rebuild a large component). In vSAN 6.6, rebalancing can now break large components into smaller chunks when physical disk capacity is at greater than 80% utilization.

I did a test of this feature in my lab. In one example, I noticed a large 232.1 GB component split into a concatenation of 2 x 116.1 GB components:

Original:

Component: bbcbe458-094f-18d6-5b90-246e962f4978 (state: ACTIVE (5), host: esxi-dell-l.rainpole.com, 
md: naa.500a07510f86d69f, ssd: naa.55cd2e404c31f9a9,votes: 1, usage: 232.1 GB, proxy component: false)

Example 1:

 Concatenation
Component: 79ece458-dfff-d828-5d74-246e962f4ab0 (state: ACTIVE (5), host: esxi-dell-j.rainpole.com, 
md: naa.500a07510f86d6c7, ssd: naa.55cd2e404c31f8f0,votes: 1, usage: 116.1 GB, proxy component: false)
Component: 79ece458-c790-db28-fead-246e962f4ab0 (state: ACTIVE (5), host: esxi-dell-j.rainpole.com, 
md: naa.500a07510f86d684, ssd: naa.55cd2e404c31f8f0, votes: 1, usage: 116.1 GB, proxy component: false)

On another occasion, as disk space became more and more scarce, the component was split into a concatenation of 8 x 29 GB chunks.

Example 2:

 Concatenation 
Component: a10be558-6fae-0105-403c-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-i.rainpole.com, 
md: naa.500a07510f86d6c5, ssd: naa.55cd2e404c31ef8d, votes: 1, usage: 29.0 GB, proxy component: false) 
Component: a10be558-e141-0405-17b2-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-i.rainpole.com, 
md: naa.500a07510f86d6c5, ssd: naa.55cd2e404c31ef8d, votes: 1, usage: 29.0 GB, proxy component: false) 
Component: a10be558-8564-0505-f136-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-k.rainpole.com, 
md: naa.500a07510f86d69c, ssd: naa.55cd2e404c31e2c7, votes: 1, usage: 29.0 GB, proxy component: false) 
Component: a10be558-67c1-0605-8f2c-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-k.rainpole.com, 
md: naa.500a07510f86d6b8, ssd: naa.55cd2e404c31e2c7, votes: 1, usage: 29.0 GB, proxy component: false) 
.
.
.
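The two results I saw in the lab (2 chunks, then 8 chunks) are consistent with repeatedly halving the component until the pieces fit. The Python sketch below models that idea. The halving strategy and the "largest free space" input are my assumptions drawn from these two observations only, not a description of the actual rebalancing algorithm.

```python
def split_component(size_gb, largest_free_gb, max_pieces=64):
    """Halve a component until each piece fits in the largest free space.

    Returns (number_of_pieces, piece_size_gb). Hypothetical model only.
    """
    if size_gb <= 0 or largest_free_gb <= 0:
        raise ValueError("sizes must be positive")
    pieces = 1
    while size_gb / pieces > largest_free_gb:
        pieces *= 2
        if pieces > max_pieces:
            raise ValueError("no layout fits within max_pieces")
    return pieces, size_gb / pieces

# 232.1 GB component, with progressively less contiguous free space.
for free_gb in (120, 30):
    count, size = split_component(232.1, free_gb)
    print(f"{count} x {size:.2f} GB")
```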

Partial Repairs

If there are objects with components that are degraded or absent for more than 60 minutes, vSAN will try to repair all the components of that object to make the object completely compliant once more. In previous releases, if there were not enough available resources to repair all the impacted components of an object, vSAN simply did not attempt a repair.

In vSAN 6.6, vSAN now tries to repair as many impacted components as possible, even if it can’t repair all the components of an object. This is important in scenarios where partially repairing the components in an object can allow vSAN to tolerate additional failures to the object, even though there are not enough resources in the cluster to make the object fully compliant.

Here is a scenario from a stretched cluster using local site protection to explain this concept a little better. If one site is down, then all VMs are running on the remaining site. If there is another failure in the remaining site, vSAN will now repair as many components as possible but may not be able to repair all the components in an object.

  • 12 node cluster (6+6+1)
  • 6 nodes remaining after a site failure
  • Secondary FTT=2
  • Secondary FTM=RAID-1
  • Requires 5 hosts – 3 copies of the data, 2 witnesses (2n + 1)

 

If the remaining site now suffers 2 additional host failures, vSAN rebuilds data copies on the 4 remaining hosts. However, vSAN can only repair them up to FTT=1 (failures to tolerate) due to a lack of remaining hosts. To implement FTT=2, we need 2n+1 hosts (5) but there are only 4 remaining on this site. Therefore FTT=1 is the most that vSAN can repair.

Partial repair is only done if it results in a higher FTT. Data components are always repaired before witness components (we prioritize data components).
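The host arithmetic in this scenario is easy to check in Python. This little sketch only covers RAID-1, where tolerating n failures needs 2n+1 hosts as described above; the function names are mine.

```python
def hosts_required(ftt):
    """Hosts needed for RAID-1 with the given FTT (2n + 1)."""
    return 2 * ftt + 1

def achievable_ftt(policy_ftt, hosts_available):
    """Highest RAID-1 FTT that can be repaired to with the hosts left."""
    if hosts_available < 1:
        return 0
    return max(0, min(policy_ftt, (hosts_available - 1) // 2))

print(hosts_required(2))        # hosts needed for Secondary FTT=2
print(achievable_ftt(2, 6))     # surviving site with all 6 hosts
print(achievable_ftt(2, 4))     # same site after 2 more host failures
```

With 4 hosts left, the best that can be done is FTT=1, which matches the partial repair outcome described above.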

Resync Throttling

Another major enhancement around resync is to give the end-user control over resync activity from both the UI and the CLI. This has been something many customers have been asking for. While under normal circumstances we would strongly recommend that the resync throttling should be left disabled, there may be occasions where you wish to throttle it up or down. The screenshot below shows where this can be done.

Finally, in the past, if a resync process was interrupted, the resync may need to start all over again. Now in vSAN 6.6, resync activity will resume from where it left off (if interrupted) by using a new resync bitmap to track changes. A very desirable improvement indeed.
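To illustrate why a bitmap makes resuming cheap, here is a toy Python model. It tracks one flag per block and only copies blocks whose flag is still set, so an interrupted resync picks up where it stopped. The class and its granularity are hypothetical; I am not describing the actual on-disk structure vSAN uses.

```python
class ResyncTracker:
    """Toy bitmap: one flag per block, True while the block still needs copying."""

    def __init__(self, total_blocks):
        self.pending = [True] * total_blocks

    def copy_blocks(self, budget):
        """Copy up to `budget` pending blocks and clear their flags."""
        copied = 0
        for index, needs_copy in enumerate(self.pending):
            if copied == budget:
                break
            if needs_copy:
                self.pending[index] = False
                copied += 1
        return copied

    def remaining(self):
        return sum(self.pending)

tracker = ResyncTracker(total_blocks=10)
tracker.copy_blocks(4)               # resync is interrupted after 4 blocks
resumed = tracker.copy_blocks(10)    # on resume, only pending blocks are copied
print(resumed, tracker.remaining())
```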

New pre-checks for maintenance mode

vSAN now does a capacity pre-check before allowing a host to enter maintenance mode. This will prevent scenarios where putting a host into maintenance mode may put capacity constraints on a cluster.

As you can see, this details how much space is needed to do a full data evacuation, or an ensure data accessibility option. If you choose no data evacuation, it will tell you how many objects are impacted.

New pre-checks for disk group, disk removal

Similar to maintenance mode, there are now pre-checks if you wish to decommission a disk group or a disk. By default, it selects to evacuate ALL data, but by changing the option to ensure data accessibility or no data evacuation, you are presented with an impact statement.

This can be a very useful visual aid when replacing or upgrading devices. A similar view is available for disks.

Easy Install and Config Assist

There are two parts to this feature. The first is the ability to deploy vSAN more easily. The net result is that the installer will guide you through a successful deployment of a vCenter Server to a single ESXi host that is configured with a vSAN datastore. This will take care of the networking setup, the claiming of local storage devices, the creation of the vSAN datastore and so on. Of course, you will need to grow your cluster to the minimum of 3 nodes afterwards, but this is just a matter of dropping the hosts into the cluster.

And just in case you need guidance after the cluster is set up, there is another new tool called “Config Assist” to help you verify that your vSAN cluster is configured correctly. It will highlight items such as DRS not configured, or vSphere HA not configured, along with a bunch of other items such as network configuration and hardware compatibility. Below is an example of some of the tests.

These features need a lot more detail, so I will follow up with another blog post in due course.

Online/Cloud health

This is something I am very passionate about: proactive notifications about potential issues, as well as the ability to provide prescriptive guidance on what to do when something does go wrong. In vSAN 6.6, we are taking our first steps towards using analytics to provide you with this information in real time. Once you enable CEIP (Customer Experience Improvement Program), information about your vSAN system is sent back to us here at VMware. We can then use this information to provide additional online health checks which are specific to your environment. There is a lot more to say about this, so I will follow up with a more thorough blog post at a future date.

HTML5 Host Client Integration

For those of you who have started to use the new HTML5 host client, there are now a number of vSAN workflows included in the new client. One of the nicer enhancements is the health feature, which allows you to get a look at the overall health of the vSAN cluster from a single host client:

There are also options for enabling data services, such as deduplication, etc, via the new HTML5 host client.

New esxcli commands

I already showed you the new unicastagent esxcli command earlier. However, a new esxcli command to assist with troubleshooting has also been added.

esxcli vsan debug
 Usage: esxcli vsan debug {cmd} [cmd options]

Available Namespaces:
 disk Debug commands for vSAN physical disks
 object Debug commands for vSAN objects
 resync Debug commands for vSAN resyncing objects
 controller Debug commands for vSAN disk controllers
 limit Debug commands for vSAN limits
 vmdk Debug commands for vSAN VMDKs

As well as the esxcli vsan debug command, we also added the following commands in vSAN 6.6 to get troubleshooting information:

  • esxcli vsan health cluster
  • esxcli vsan resync bandwidth
  • esxcli vsan resync throttle

Example 1:

Use the “vsan debug disk” command to check the status of all physical disks:
 
 [root@esxi-dell-j:~] esxcli vsan debug disk list
 UUID: 52bc5813-b8a5-b004-60cd-82cf6cda6426
    Name: naa.500a07510f86d6c7
    SSD: True
    Overall Health: green
    Congestion Health:
          State: green
          Congestion Value: 0
          Congestion Area: none
    In Cmmds: true
    In Vsi: true
    Metadata Health: green
    Operational Health: green
    Space Health:
          State: green
          Capacity: 800155762688 bytes
          Used: 20883439616 bytes
          Reserved: 3187671040 bytes
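The Space Health section is handy for quick capacity checks. Below is a small Python sketch that pulls the byte counts out of output like the above and works out the utilization. It assumes the "Key: N bytes" layout shown here; the function name is hypothetical and only a trimmed sample is used.

```python
import re

SAMPLE = """\
UUID: 52bc5813-b8a5-b004-60cd-82cf6cda6426
   Name: naa.500a07510f86d6c7
   Overall Health: green
   Space Health:
         State: green
         Capacity: 800155762688 bytes
         Used: 20883439616 bytes
         Reserved: 3187671040 bytes
"""

def space_usage(text):
    """Extract capacity, used and reserved byte counts from one disk entry."""
    values = {}
    for key in ("Capacity", "Used", "Reserved"):
        match = re.search(rf"^\s*{key}:\s*(\d+) bytes", text, re.MULTILINE)
        if match:
            values[key.lower()] = int(match.group(1))
    return values

usage = space_usage(SAMPLE)
percent_used = 100 * usage["used"] / usage["capacity"]
print(f"{percent_used:.2f}% used")
```

A check like this makes it easy to spot disks creeping towards the 80% utilization mark mentioned in the rebalancing discussion earlier.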

Example 2:

Use the “vsan debug object” command to find unhealthy vSAN components and their owner host information, as shown below:

[root@esxi-dell-j:~] esxcli vsan debug object list
 Object UUID: be71e758-6b8e-d700-23a0-246e962f48f8
    Version: 5
    Health: healthy
    Owner: esxi-dell-l
    Policy:
       stripeWidth: 1
       cacheReservation: 0
       SCSN: 19
       CSN: 16
       spbmProfileGenerationNumber: 0
       proportionalCapacity: [0, 100]
       spbmProfileId: aa6d5a82-1c88-45da-85d3-3d74b91a5bad
       hostFailuresToTolerate: 1
       spbmProfileName: Virtual SAN Default Storage Policy
       forceProvisioning: 0

   Configuration:

      RAID_1
          Component: be71e758-260c-3201-35e6-246e962f48f8
            Component State: ACTIVE,  Address Space(B): 273804165120 (255.00GB),  
            Disk UUID: 52685839-7a66-65de-f99a-236e79010972,  Disk Name: naa.500a07510f86d6ae:2
            Votes: 1,  Capacity Used(B): 436207616 (0.41GB),  Physical Capacity Used(B): 427819008 (0.40GB),  
            Host Name: esxi-dell-j
          Component: be71e758-d9b0-3301-14b7-246e962f48f8
            Component State: ACTIVE,  Address Space(B): 273804165120 (255.00GB),  
            Disk UUID: 52de35db-5199-b5aa-9b47-ff1b44e385f2,  Disk Name: naa.500a07510f86d6ca:2
            Votes: 1,  Capacity Used(B): 440401920 (0.41GB),  Physical Capacity Used(B): 432013312 (0.40GB),  
            Host Name: esxi-dell-k
       Witness: be71e758-a017-3501-b7a3-246e962f48f8
         Component State: ACTIVE,  Address Space(B): 0 (0.00GB),  
         Disk UUID: 523706de-4339-921a-450d-da93907f7546,  Disk Name: mpx.vmhba1:C0:T1:L0:2
         Votes: 1,  Capacity Used(B): 12582912 (0.01GB),  Physical Capacity Used(B): 4194304 (0.00GB),  
         Host Name: witness-02.rainpole.com

   Type: vmnamespace
    Path: /vmfs/volumes/vsan:522588116e7b43f6-7171459d5cd6a303/
    Group UUID: 00000000-0000-0000-0000-000000000000
    Directory Name: win-2012-2
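If you are hunting for unhealthy components across many objects, a little scripting helps. The Python sketch below pairs each "Component State" with the "Host Name" that follows it and filters out anything that is not ACTIVE. It assumes the field layout shown above, handles the text of one object at a time, and uses a trimmed copy of the output as sample data; the function names are made up.

```python
import re

SAMPLE = """\
Object UUID: be71e758-6b8e-d700-23a0-246e962f48f8
   Health: healthy
   Owner: esxi-dell-l
      RAID_1
         Component: be71e758-260c-3201-35e6-246e962f48f8
           Component State: ACTIVE,  Address Space(B): 273804165120 (255.00GB),
           Host Name: esxi-dell-j
         Component: be71e758-d9b0-3301-14b7-246e962f48f8
           Component State: ACTIVE,  Address Space(B): 273804165120 (255.00GB),
           Host Name: esxi-dell-k
      Witness: be71e758-a017-3501-b7a3-246e962f48f8
        Component State: ACTIVE,  Address Space(B): 0 (0.00GB),
        Host Name: witness-02.rainpole.com
"""

def components(text):
    """Return (state, host) pairs in the order they appear."""
    states = re.findall(r"Component State:\s*(\w+)", text)
    hosts = re.findall(r"Host Name:\s*(\S+)", text)
    return list(zip(states, hosts))

def unhealthy(text):
    """Components whose state is anything other than ACTIVE."""
    return [(state, host) for state, host in components(text) if state != "ACTIVE"]

print(components(SAMPLE))
print(unhealthy(SAMPLE))
```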

So there you have it – vSAN 6.6 in all its glory. While I have only barely skimmed the surface on some of these improvements, I’m sure you will agree it is a pretty impressive release. (For those interested, Duncan also has a great “what’s new” on his blog.)

33 comments
  1. Hi Cormac,

    You say in your post “vSAN introduces a new smart rebuild behavior. If the absent components comes back online, even after 60 minutes, vSAN compares the cost of re-using the old components versus the cost of continuing to resync the new components.”

    But in the “Administering Guide vSan 6.5” it is written that: “If the host rejoins the cluster after 60 minutes and recovery has started, Virtual SAN evaluates whether to continue the recovery or stop it and resynchronize the original components.”

    This feature already existed in 6.5 ?

    Thanks 😉

    • Good spot! 🙂

      When we wrote the book, we were told that this was how it behaved. But it later transpired that it did not. It seems previously we would keep resyncing both copies – the newly created one, and the original one that returned. We then discarded the original one. The behaviour we described is now what we have in 6.6.

  2. Good day sir!

    Two questions:
    1) Will the Primary and Secondary FTT policy changes be available for two node ROBO sites as well?
    2) You mention disk format version 5. Will the upgrade process be similar to the v2 to v3 upgrade process?

    • Hi Matt,

      On 1, no – this is not possible. There are not enough resources (hosts) to implement a secondary policy.
      On 2, it depends which version you are coming from. Updating disk groups from Version 3 to Version 5 is only a metadata update.
      No data movement or evacuation of existing data is required when upgrading from Version 3 to Version 5. If upgrading from an earlier version, then yes, the upgrade process will be similar to those seen with previous versions.

      • Thank you sir!

        We would be moving from 3 to 5, so this is fantastic news. The v2 to v3 upgrade was not fun for ROBO sites.

        For 1, I ask because we overbuilt our nodes and they have a substantial wealth of capacity tier storage compared to our workloads. Being able to apply a policy to objects that would place additional copies on different disks of the same host would be another layer of redundancy that we could easily take advantage of.

  3. Hi Cormac

    Sorry this question is not directly related to this article but I was hoping you could tell me where I can find the list of supported operating systems and their versions to connect to the vSAN iSCSI service? I had no luck finding that information.

    Thanks!

  4. Thanks Cormac! Is there actually also an overview of what is supported using iSCSI? I am thinking of SCSI reservations, UNMAP, online LUN expansion and such features.

    • There is a new networking guide for vSAN in the works, and this should be available at GA. That has a significant section on iSCSI, which should help.

  5. re: dedup – I pointed out in your blog about VSAN 6.2 that a byte check should be done on a SHA1 dedup hit to ensure there wasn’t a collision and resulting data loss. In the time since, Google has published an actual SHA1 collision pair. Does 6.6 do a byte check or is the data validity still left to chance?

    • In vSAN 6.6, our checksum will detect an SHA1 collision so will not return incorrect data.

      We do have some longer term plans to address this fully, and we are aware of it as a concern.
      This could possibly include a move to sha256. But nothing confirmed as yet.

  6. Hi Cormac

    Can you tell me if and when online iSCSI LUN expansions will be supported by vSAN (at least a rough estimation)?

    Thanks for your help!

  7. Hi Cormac,
    Are there any details on what happens if the KMS is not available? Does the KMS need to be available 100% of the time, or is it only used in certain situations (ie. keys retained in cache).

    thanks!

    • I think you would need it in situations where you need to reboot a host (or if the host crashes and comes back up), or if you wish to re-key for any reason. I don’t believe we constantly need to query the KMS.

  8. Hi Cormac,
    Can you confirm that the limitation of 1.6TB does not apply to caching disks but only for storage disk ? The cache size in vSan 6.6 remains 600GB ?
    Thanks.

    • I’m not sure what the 1.6 TB limit refers to, Philippe.

      However, yes, a disk group cache size is 600GB, although having larger sizes is also good as it will extend the lifetime of the cache device.

  9. The vmware website says “Customers can accelerate new hardware adoption with Day 1 support of the latest flash technologies, including solutions like the new Intel Optane 3D XPoint NVMe SSDs. In addition, vSAN now offers larger caching drive options, including 1.6TB flash drives, so that customers can take advantage of the latest and larger capacity flash drives.”
    http://www.vmware.com/products/whats-new-virtual-san.html

    Maybe they talk about new disk Intel Optane …. ?

    • Hmm – yeah, not 100% sure why they call that out. If you refer to the current cache guidance on John Nicholson’s blog post here, for heavy, write sequential workloads on vSAN, there is a recommendation to use 1.6TB total write cache. But it goes on to say that we should use 2 x 800GB cache devices to achieve this. So, I’m not sure what the 1.6TB flash drive for caching refers to in the “what’s new”. Let me ask around, Philippe.

    • OK – I read that text once more, and they seem to be confusing cache and capacity in the same sentence.

      Let me give you my take on it.

      In hybrid configs, this drive (if on the HCL) could be used for the caching tier, where 30% of the drive (500GB) would be used for write buffer and the rest (1.1TB) would be used for read cache. In AF vSAN, since the cache device is only used for write cache, I don’t think we would recommend using a device of this size. We would probably only recommend this size of device for the capacity tier. Make sense?

  10. Hello Cormac,

    Should we understand that because of the 600GB cache limit, we hence have to respect a capacity limit that the host offer in terms of vSan participation.
    i.e: 600gb = 10% –> Storage max = 6TB? per host/disk group?

    Thanks

    Ally
