An overview of the new Virtual SAN 6.2 features
If you were wondering why my blogging has dropped off in recent months, wonder no more. I’ve been fully immersed in the next release of VSAN. Today VMware announced VSAN 6.2, the next version of VMware’s Virtual SAN product. It is almost 2.5 years since we launched the VSAN beta at VMworld 2013, and almost 2 years to the day since we officially GA’ed our first release of VSAN back in March 2014. A lot has happened since then, with 3 distinct releases in that 2 year period (6.0, 6.1 and now 6.2). For me the product has matured significantly in that time, with 3,000 customers and lots of added features. VSAN 6.2 is the most significant release we have had since the initial launch.
The following is by no means a comprehensive list of all of the new VSAN 6.2 features, but these are the major features, along with a few other features that I feel might be of interest to readers. In my opinion, we now have a feature complete product, and a world-class hyper-converged solution for any application. Read on to learn about the new features that we have added to this latest and greatest version of Virtual SAN.
RAID-5/RAID-6 Support (aka erasure coding)
To date, objects deployed on the VSAN datastore could only be deployed with a RAID-1 (mirroring) configuration for availability. If one copy of the data was impacted by a failure, the virtual machine remained available since there was another copy of the data. However this has always meant that you had multiple copies of the data, consuming a lot of capacity. With the introduction of RAID-5/RAID-6 for objects, availability can be achieved with less capacity overhead.
RAID-5 allows objects to tolerate one failure, while RAID-6 allows objects to tolerate a double failure. For RAID-5, you need a minimum of 4 nodes in the cluster; for RAID-6, you need a minimum of 6 nodes in the cluster. RAID-5 requires 33% additional storage for parity (3+1), while RAID-6 requires an additional 50% for the double parity (4+2). This is a big improvement over the 200% and 300% required previously for objects to tolerate 1 and 2 failures respectively.
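To put those numbers in context, here is a quick back-of-the-envelope calculation (plain Python, purely illustrative) comparing the raw capacity consumed by a 100 GB VMDK under each availability scheme described above.

```python
# Illustrative arithmetic only: raw capacity consumed by a 100 GB VMDK
# under each availability scheme described above.

vmdk_gb = 100

schemes = {
    "RAID-1, FTT=1 (2 full copies)":       vmdk_gb * 2,         # 200 GB consumed
    "RAID-1, FTT=2 (3 full copies)":       vmdk_gb * 3,         # 300 GB consumed
    "RAID-5, FTT=1 (3 data + 1 parity)":   vmdk_gb * (4 / 3),   # ~133 GB consumed
    "RAID-6, FTT=2 (4 data + 2 parity)":   vmdk_gb * (6 / 4),   # 150 GB consumed
}

for name, consumed in schemes.items():
    print(f"{name}: {consumed:.0f} GB consumed")
```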
One thing to keep in mind is that, to date, we have gone with a design whereby core vSphere features such as DRS/vMotion and HA do not impact the performance of a virtual machine running on VSAN, should the VM be moved to another host in the VSAN cluster. In other words, we made a conscious decision not to do “data locality” in VSAN (apart from stretched clusters, where it makes perfect sense). This non-reliance on data locality lends itself to erasure coding, where the components of the VMDK must be spread across multiple disks and hosts. Put simply, a VM’s compute and all of a VM’s storage cannot co-exist on the same host with erasure coding; there has to be data access over the network at some point to a remote component of the data stripe. With VSAN, an administrator can continue to run core vSphere features such as DRS/vMotion and HA without impacting the performance of a VM using erasure coding, no matter where the VM’s compute runs in the cluster.
RAID-5/RAID-6 erasure coding is only available on all-flash VSAN.
Space Efficiency via Deduplication and Compression
This is another space-saving technique that I know a lot of customers have been asking for. VSAN 6.2 now supports both deduplication and compression. This feature is enabled cluster wide and is applied to all objects; deduplication and compression are always enabled together on VSAN. The scope of the deduplication and compression is the disk group. Deduplication works on 4KB blocks, using the SHA1 cryptographic hash function. When data is being destaged from the cache layer to the capacity tier in a particular disk group, it is hashed, and VSAN then checks to see if an identical block already exists. If there is a SHA1 match, the block already exists, and there is no need to persist it again in the same disk group.
If the block does not already exist in the disk group, compression is next applied to the block before it is persisted on the capacity tier. The compression algorithm (LZ4) tries to bring the size of the block to 2KB or less. If successful, the compressed block is persisted in the capacity tier. If unsuccessful, the full 4KB block gets persisted.
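A minimal Python sketch of that destage flow may help make the ordering clear. The 4KB block size, SHA1 hashing and 2KB compression threshold come from the description above; everything else (the function and variable names, and zlib standing in for LZ4) is purely illustrative and not how VSAN is actually implemented.

```python
import hashlib
import zlib  # stand-in for LZ4, purely for illustration

BLOCK_SIZE = 4 * 1024          # dedupe granularity described above
COMPRESS_THRESHOLD = 2 * 1024  # compressed block must fit in 2KB or less to be kept

capacity_tier = {}  # hypothetical per-disk-group store: SHA1 digest -> persisted bytes

def destage_block(block: bytes) -> None:
    """Sketch of destaging one 4KB block from the cache tier to the capacity tier."""
    digest = hashlib.sha1(block).hexdigest()

    # Dedupe: if an identical block already exists in this disk group, nothing to persist.
    if digest in capacity_tier:
        return

    # Compression: keep the compressed form only if it fits in 2KB or less.
    compressed = zlib.compress(block)
    if len(compressed) <= COMPRESS_THRESHOLD:
        capacity_tier[digest] = compressed
    else:
        capacity_tier[digest] = block  # otherwise persist the full 4KB block
```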
Deduplication and compression are only available in all-flash VSAN.
Software Checksum
This is another highly anticipated feature. Many customers expressed a desire to ensure that there were no data integrity problems in their storage environments. With the introduction of a new policy-driven checksum mechanism, VSAN customers can decide whether or not to enable software checksum on their virtual machine data on a per-object basis. When the virtual machine writes data, a checksum is added as the data is written to storage. On subsequent reads, the checksum is read and verified once again. If there is an issue with the integrity of the data, the remote copy is fetched in the case of RAID-1 objects. If that copy is correct, it is written back to the replica that had the bad block. In the case of RAID-5/6, VSAN reads the other data/parity in the same stripe to reconstruct the correct data, and writes the good data back.
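The read-verify-repair flow for a RAID-1 object can be sketched roughly as follows. This is only an illustration of the idea, not VSAN’s implementation: the replica stores, function names and the use of zlib.crc32 as a stand-in for the real checksum algorithm are all assumptions of mine.

```python
import zlib

# Hypothetical replica stores: logical block number -> (data, checksum).
replica_a, replica_b = {}, {}

def write_block(lbn: int, data: bytes) -> None:
    """On write, a checksum is stored alongside the data on each replica."""
    checksum = zlib.crc32(data)  # stand-in for the actual checksum algorithm
    replica_a[lbn] = (data, checksum)
    replica_b[lbn] = (data, checksum)

def read_block(lbn: int) -> bytes:
    """On read, verify the checksum; if it fails, fetch the other copy and repair."""
    data, stored = replica_a[lbn]
    if zlib.crc32(data) == stored:
        return data

    # Integrity problem on replica A: fetch the good copy from replica B ...
    good_data, good_checksum = replica_b[lbn]
    # ... and write it back over the replica that had the bad block.
    replica_a[lbn] = (good_data, good_checksum)
    return good_data
```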
Checksum errors are logged in the VMkernel.log file on the ESXi host that had the bad block.
There is also a scrubber which runs regularly on the persisted data to ensure that there is no data decay that has gone undetected.
Checksum is available on both hybrid and all-flash configurations of VSAN.
Performance Service
Many of you who are already familiar with VSAN will be aware of the VSAN Observer utility. While it is an excellent troubleshooting tool, it lacks certain critical features. In particular, it is cumbersome to use and can only be launched via the Ruby vSphere Console (RVC). There are also no historic statistics; it only gives a real-time view of VSAN. Another drawback is that it displays far too much information, including a lot of engineering-level metrics that are of no use to an administrator. However, the main criticism is that it is not built into the vSphere client. Well, in VSAN 6.2, we have a new Performance Service that gives you low-level VSAN information from the web client. The other nice thing is that the stats are stored on VSAN, so you don’t add any additional load on the vCenter Server.
IOPS limit for object
A number of customers have expressed a wish to limit the amount of I/O that a single VM can generate to a VSAN datastore. The main reason for this request is to prevent an erratic VM (or, to be more precise, an erratic application inside a VM) from impacting other VMs running on the same datastore. With the introduction of IOPS Limits, implemented via policies, administrators can limit the number of IOPS that a VM can do. This is a nice quality of service feature for VSAN. Of course, there are other use cases too, such as allocating a certain amount of performance to a VM depending on what the end-user has requested/paid for.
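To illustrate what an IOPS cap means in practice, here is a toy Python sketch of a per-object limiter that admits at most a fixed number of I/Os per one-second window. This is just a conceptual illustration under my own assumptions; it is not how VSAN implements the feature internally.

```python
import time

class IopsLimiter:
    """Toy per-object IOPS cap: admit at most `limit` I/Os per one-second window."""

    def __init__(self, limit: int):
        self.limit = limit
        self.window_start = time.monotonic()
        self.count = 0

    def admit(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 1.0:   # start a new one-second window
            self.window_start, self.count = now, 0
        if self.count < self.limit:          # under the cap: admit the I/O
            self.count += 1
            return True
        return False                         # over the cap: the I/O must wait

# e.g. a policy assigning a 500 IOPS limit to a noisy VM's objects
limiter = IopsLimiter(limit=500)
```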
IPv6
IPv6 is now supported on VSAN 6.2, both for implementations done over L2 and L3. Not much more to add to this – it just works.
Capacity Views
Want to see what is consuming the space on a VSAN datastore? Want to see how much overhead the on-disk format, checksum or dedupe/compression is taking up? What about figuring out how much space is being consumed by the VM home namespace objects, swap objects or the actual virtual machine disk files? How much space is being used by replicas, or by the distributed parity from RAID-5/6 objects? All of this is now included in a new capacity views section in VSAN 6.2.
Problematic Disk Handling Improvements
I wrote about this feature in VSAN 6.1. It is designed to isolate problematic disks and stop them from impacting the cluster as a whole, and it worked by unmounting the disk group containing the problematic device. Improvements have been made in VSAN 6.2: if the underlying issue (such as a transient error) is resolved, this feature will now attempt to remount the disk group. The point to make here is that you should always be using devices with drivers and firmware that have been certified, to avoid issues of this nature. And this is really simple to check via the VSAN Health Check.
Caveat: For those of you using home labs and nested environments where you can very quickly overwhelm the underlying storage with minimal workloads, you should definitely look at disabling this feature.
Deploying Thin Swap Objects
To date, VSAN has always deployed the VM swap object with 100% Object Space Reservation, meaning space is set aside for the swap object whether or not you use it. When you have many VMs with large memory allocations (the swap file is created to be the same size as memory), this can consume a considerable amount of capacity on the VSAN datastore. This is especially true when you consider that the VM may never even swap to disk if it is not under resource pressure. In VSAN 6.2, there is a new advanced configuration parameter, /VSAN/SwapThickProvisionDisabled, which makes swap objects provision as thin rather than 100% fully reserved. This needs to be set on every host in the cluster. A nice space-saving feature.
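Since the parameter is per-host, one way to roll it out is to loop over the hosts in the cluster. The sketch below assumes SSH access to each ESXi host is enabled, and the hostnames are placeholders; the esxcli option path is the one named above.

```python
import subprocess

# Hostnames are placeholders; assumes SSH access to each ESXi host is enabled.
hosts = ["esxi01.lab.local", "esxi02.lab.local", "esxi03.lab.local"]

# Advanced option named above; setting it to 1 provisions swap objects thin.
cmd = "esxcli system settings advanced set -o /VSAN/SwapThickProvisionDisabled -i 1"

for host in hosts:
    # The parameter is per-host, so it must be applied to every host in the cluster.
    subprocess.run(["ssh", f"root@{host}", cmd], check=True)
```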
Stretched Cluster Improvements
I wrote a blog post on the VMware vBlocks site describing the behaviour of virtual machines when HA restarts them in a VSAN 6.1 stretched cluster environment. In a nutshell, there are situations where ghost VMs (that can no longer access their underlying storage) can be left behind on site A, even when they are restarted on site B. The blog post provides a reference to a KB article that helps remedy such situations. We provided a clean-up script with instructions on how to use it. We agree that this was not the most elegant of solutions.
However for VSAN 6.2, the VSAN and HA teams have worked together to provide a solution so that this issue no longer occurs, and ghost VMs left on the failed site are safely eradicated. This is really good to see, and will make the whole process of site failure in a stretched cluster environment much easier to manage for vSphere administrators.
Simplified deployment for use cases – ROBO, Fault Domains, Stretched Clusters
We received feedback in the past that the configuration process for certain use cases was not at all intuitive. The team has worked very hard on this, so it is now far simpler to configure VSAN for certain use cases from the get-go. Features such as Fault Domains, 2-node VSAN and VSAN stretched cluster now have new wizards that make the whole setup process very straightforward. The goal of keeping VSAN “radically simple” is still at the heart of VSAN development.
I’m delighted to see this release. What a great job the team has done in such a short length of time. This 6.2 version includes so many feature requests from customers, taken from meetings at VMworld, during VMUGs and even online via Twitter and other social media platforms. As I said in the introduction, this version of VSAN is feature complete in my opinion. While we can continue to add new data services over time, the economics of running an all-flash VSAN when coupled with services like deduplication, compression and erasure coding make all-flash VSAN an option for any use case.
I believe 2016 is going to be a great year for Virtual SAN. Tune in regularly as I will be doing deeper dives on many of these features over the coming weeks. Go VSAN!
It’s great to see these new features but I’m sad to see that encryption and dedupe are configured cluster-wide. Surely this goes against the whole ethos of VSAN. These should be features made available at the Storage Policy level so that individual VMs and virtual disks can be configured to use them as required.
The way that these two features have been implemented in 6.2 means configuring them for the entire datastore, something that VSAN is supposed to be getting away from. I hope this will be fixed in the following release! Until then, customers will need multiple VSAN clusters – a regular one and a dedupe/encrypted one. I guess the same is true for hybrid and all-flash. Let us have one datastore where these features are selectable via Storage Policies. 🙂
I’m not quite sure why or even how you could configure a per VM dedupe policy? If you don’t consider the whole cluster, how would you determine what objects within the VSAN file system could be deduped? If anything this would be more complex and demanding on the environment. Great article Cormac and congrats on the release!
A very realistic scenario is that certain VMs are so important for a company that the attached storage policy is defined with raid1, high stripe ratio and no compression or dedupe to maximize performance with the least amount of overhead in handling the IO. While the lower tier VMs have a policy with erasure coding and dedupe and compression to maximize those SSD gigabytes.
Certainly not disputing the option to define policy-driven RAID levels at a granular level. Focussing specifically on the query around dedupe though: given that this happens when data is destaged from the caching tier to the capacity tier, in a converged solution such as this, even if you could dedupe on a per-VM basis, other VMs would keep the caching tier busy performing dedupe work, so it wouldn’t necessarily produce the desired effect of improving performance for those that were exempted, were it a possibility. It’s my understanding that you can define dedupe at a disk group level, so if it was a requirement to limit any potential performance impact of dedupe, logical boundaries could be implemented that way.
Great article Cormac! Have there been any studies on the CPU and memory overhead of using erasure coding, dedup and compression?
Is it safe to say that dedup and compression are done inline, near-inline, or something else?
Thanks Tom. Yes – my understanding is that we will have a performance study, but I do not have an ETA.
Since dedupe/compression are done when destaging from cache to capacity, we are calling it nearline. I’ll have a more detailed post on this shortly.
Is there any word on how this new version will be implemented? Will it be an update release or major version release such as 6.5? Is there a time frame?
It’s included with the next release of vSphere, i.e. 6.0U2.
If you are already using VSAN and wish to upgrade, you will be able to upgrade the VC and ESXi in the same way as before, e.g. using VUM or esxcli
The final part, which is VSAN specific, is an upgrade of the on-disk format to get the new features. This is the same mechanism that upgraded the on-disk format when moving from version 5.5 to 6.0.
All of the above can be done as a rolling upgrade with no impact to running VMs.
Hi Cormac,
About that FS 2.0 to 3.0 Upgrade with no impact to running VMs…
We have a 6 node cluster. We tried for more than a day to get this to work (after upgrading vCenter and all 6 hosts to U2) and we always ran into locking issues with running VMs. Only when a VM was powered off was the lock gone. As soon as the VM started, a lock was back and the upgrade refused to work because of these locks. Kind of like a dog chasing its own tail.
We used RVC to establish that all the component IDs that the “upgrade-failed” message talked about (a very long list…) were normal VMDKs. So no AppVolumes stuff or left-overs lying around. It found that all the VMDKs (every VMDK of every VM) were causing a lock issue and refused the update as a consequence.
We never got out of this vicious circle. In the end, we attached a NAS via NFS and storage-migrated the vCenter VM to the NAS, because its own presence as a running VM caused a lock issue (with itself).
Then we powered off all other VMs, which made the locks go away (and did not start them again, to avoid a new lock). Only then did the upgrade run, because nothing was locked anymore.
The upgrade ran almost 48 hours for a 6 node cluster with 4x 600 10k SAS drives each (capacity tier). We were down for the entire weekend.
To answer a question I can see coming from a mile away (don’t want to be rude): we did not open a support ticket. This is a dev cluster and not production. We suffered no data loss, and no actual production impact means “lowest of the lowest priority”, so we effectively won’t get any support, or only from someone with very little expertise. Opening a case for this environment is useless really. I’m not being negative per se. I just know how it works after being in the VMware game for so long.
Sorry to hear that Steven. It looks like there are 2 upgrade issues. The first is associated with objects where an rm -r may have been run against files and folders on the VSAN datastore. This leaves the VMDK objects stranded. Since we won’t ever delete an object automatically, we need admins to either recreate the objects or remove them completely.
There is a second issue, and this seems to be related to CBT – Change Block Tracking. This is not specific to any application (AppVolumes, View, VCD). But if you are backing up or replicating the VMs using CBT, these get locked. We are working out the best way to deal with this automatically (probably KB article plus an attached script to automate the handling). This will mitigate the manual effort that you had to go through. We’ll then get a patch out to take care of this automatically. I’ll provide an update as soon as I know more.
CBT? Hmmm. We used to use CBT with Veeam, until the CBT bug that was in pre-ExpressPatch 4 / U1b. We stopped Veeam from using CBT altogether, but there are still a lot of CBT files lying around.
The first issue was true for one stranded file after a storage-vMotion off the vsanDatastore to a NAS. But that we found and cleared up quickly.
So i’m betting my money on your CBT suggestion.
Any plans to make dedup, erasure coding and compression available for hybrid configurations of VSAN? If not, what’s the technical reason?
btw, great article. Very well written.
Thanks Tim. This is just my opinion why we only introduced these space saving techniques for all-flash (with the caveat that I had no involvement or input into this decision).
a) We’re seeing most traction with all-flash VSAN, so this is where the focus lies
b) These techniques on all-flash make the economics of using all-flash as appealing as hybrid
c) When there is a failure and a rebuild operation with these space saving features, theoretically there could be a considerable amount of I/O amplification when these features are used. While all-flash can handle this very well, hybrid based on spinning disk may not
Hope this helps
Yep, definitely helps. Thanks Cormac!
c) (beside a) and b) from a marketer’s view) makes sense to me, although I hope it’s “theoretically” in the sense of: I/O could bring hybrid down, but nobody knows for now. So I hope to see VSAN 6.3 with those features available for hybrid, after many installations of VSAN 6.2 in production have happened and some “learning” has taken effect 🙂
All the best,
Tim
Reading deduped data from spinning disks also creates LOTS of IOPS as the system reassembles data from blocks spread across the storage pool. Many hybrid arrays dedupe in the flash layer but not on the spinning disks to avoid exacerbating the performance impact of a flash miss.
Happy to see so many new features in VSAN 6.2!
Not happy that dedup and erasure coding are only available for all-flash, since we deliver a lot of hybrid VSANs to our customers.
Any information when vSphere 6.0 Update 2 will be available?
I don’t believe the GA date has been announced yet.
Cormac,
I’m curious as to whether you can enable compression without deduplication?
Thanks.
Nope – both are enabled together in VSAN 6.2
Do IOPS limits only kick in when the system becomes resource constrained or are they a hard limit?
Hard limit Tom – will have a separate post on it soon.
Can you confirm if new hardware will be supported in this release and when the HCL will be publicly available?
I can’t talk about those plans, but there should be some information at the time of, or soon after, GA I would think.
Having problems with updating the on-disk format of a three node hybrid VSAN cluster. Searching for resources on the web, but can only find information on updating to version 2.
Are you experiencing something like this – https://communities.vmware.com/thread/532672 ?
Solution seems to be (still running):
Getting an up-to-date RVC and running
vsan.ondisk_upgrade -a