A quick reference to vSAN content
- vSAN 8.0 U1 Express Storage Architecture book – paper / kindle (released 25 April 2023)
- vSAN 7.0 U3 Deep Dive book – paper / kindle (released 09 May 2022)
- A link to the vSAN 6.7 U1 Deep Dive book
- New Book: VMware vSAN 8.0U1 Express Storage Architecture now available
- Dynamic RWX volumes now supported in TKC in vSphere with Tanzu
- vSAN Data Persistence platform (DPp) Revisited
- Announcement! vSAN Deep Dive book updated for 7.0 U3
- vSAN File Service and K8s PV quota (Video)
- Adding network permissions to K8s PVs backed by vSAN File Share
- vSAN File Service and K8s PVs with implicit quota
- AND and OR rules in storage policies (Video)
- AND and OR rules in storage policies
- vSAN 7.0U2 – What’s new?
- vSAN DPp – MinIO Object Store Supervisor Service [Video]
- vSAN 7.0U1 – Object Format Health Warning after Disk Format v13 upgrade
- vSAN 7.0U1 – File Service SMB support
- vSAN 7.0U1 – Persistent Volume Placement in HCI-Mesh
- vSAN 7.0U1 – Capacity Management
- vSAN 7.0U1 – What’s new?
- vSAN 7 – File Services and Cloud Native Storage Integration (Video)
- vSAN 7 – Using Velero to backup/restore File Services RWX volumes
- vSAN 7 – Read-Write-Many Persistent Volume with File Services
- vSAN 7 – Track vSAN Memory Consumption
- vSAN 7 – Native File Services
- A holiday promo for the vSAN 6.7 U1 Deep Dive book
- vSAN 6.7U3 – New Advanced Options
- vSAN 6.7U3 – What’s new?
- vSAN 6.7U1 – Deep Dive book now available in traditional Chinese
- Celebrating 20,000 #vSAN customers – thank you
- Degraded Device Handling (DDH) Revisited
- vSAN Erasure Coding Failure Handling
- vSAN 6.7U1 – Deep Dive book now available
- vSAN 6.7U1 – Capacity History – Unable to query charts data
- vSAN 6.7U1 – New Advanced Options
- A closer look at EBS-backed (Elastic) vSAN
- Change policy on a vSAN object via RVC
- vSAN 6.7U1 – What’s new?
- vSAN 6.7 – Stretched Cluster and iSCSI
- VMworld 2018 vSAN roundup – Monday, August 27th
- vSAN 6.7 – Stretched Cluster and Horizon View interop update
- Using PowerCLI SPBM cmdlets to create VMs with storage policies
- Why upgrade vSAN? Here is a list of features, release by release
- Minio S3 object store deployed as a set of VMs on vSAN
- vSAN 6.7 – A deeper dive into Fault Tolerant VMs
- vSAN Performance Evaluation Checklist now available
- vSAN 6.7 – What’s new?
- Hyper-Converged Infrastructure (HCI), Sustainability and Green IT
- Which policy changes can trigger a rebuild on vSAN?
- A new vSAN training class – vSAN Production Operations
- Taking snapshots on vSAN when there are failures in the cluster
- A closer look at Scality S3 running on vSAN
- A closer look at Minio S3 running on vSAN
- 2-node vSAN – witness network design considerations
- Supporting Fault Tolerance VMs on vSAN Stretched Cluster
- How many hosts are needed to implement SFTT on vSAN Stretched Cluster?
- New vSAN Stretched Cluster topology now supported
- vSAN 6.6.1 – View 7.1 interop – nice fix
- vSAN 6.6.1 – Some nice new features
- Does enabling encryption on vSAN require an on-disk format change?
- Deploying vSAN with Photon Platform v1.2
- Using tags with Storage Policy Based Management
- vSAN 6.6 – Config Asst incorrectly reports physical NIC warnings with LACP/LAG
- Cloning and Snapshots on vSAN when policy cannot be met
- vSAN and Predictive DRS, Network-aware DRS and Proactive HA
- Why vSAN cannot support cross-witness hosting in stretched clusters
- vSAN 6.6 – What’s new?
- Debunking some behavior “myths” in 3-node vSAN
- Sizing for large VMDKs on vSAN
- 2-node vSAN topology review
- Getting more out of vSAN – webcast series coming to EMEA
- vSphere 6.0 U3 – important update for vSAN
- Another recovery from multiple failures in vSAN stretched cluster
- Understanding recovery from multiple failures in vSAN stretched cluster
- The continued rise of HCI, and especially vSAN
- New management pack for vRealize Operations
- vSAN Stretched Cluster – Partition Behavior Changes
- vSAN 6.5 – Extending an ESXi diagnostic core dump partition
- Storage Challenges with VMware Cloud Native Apps (video)
- Deploying Kubernetes manually on Photon Controller v1.1 and vSAN
- Photon Controller v1.1 and vSAN
- PowerCLI 6.5 Release 1 and vSAN
- vSAN 6.5 – What’s new?
- Docker Volume Driver for vSphere using policies on VSAN (short video)
- My VMworld 2016 presentations on VSAN are now available
- Docker Volume Driver for vSphere on Virtual SAN
- VSAN 6.2 now in the hands-on-labs
- Recovering from a full VSAN datastore
- VSAN 6.2 Part 12 – VSAN 6.1 to 6.2 Upgrade Steps
- VSAN 6.2 Upgrade Fails to Realign Objects
- VSAN 6.2 Part 11 – Support for larger VMDKs with higher FTT
- Component Metadata Health – Locating Problematic Disk
- VSAN 6.2 Part 10 – Problematic Disk Handling Improvements
- VSAN 6.2 Part 9 – Replacing the witness appliance
- VSAN 6.2 Part 8 – Upgrading stretched cluster to 6.2
- VSAN 6.2 Part 7 – Capacity Views
- VSAN 6.2 Part 6 – Performance Service
- VSAN 6.2 Part 5 – New Sparse VM Swap object
- VSAN 6.2 Part 4 – IOPS limit for object
- VSAN 6.2 Part 3 – Software Checksum
- VSAN 6.2 Part 2 – RAID-5 and RAID-6 (Erasure Coding)
- VSAN 6.2 Part 1 – Deduplication and Compression
- An overview of the new Virtual SAN 6.2 features
- VSAN Stretched Cluster – some possible warnings
- VSAN.ClomMaxComponentSize explained
- A new vRealize Log Insight Content Pack for VSAN
- VSAN 6.x – Design and Sizing – Memory Overhead
- VSAN 6.1 – SMP-FT support on Virtual SAN
- VSAN resync behaviour when failed component recovers
- Common VSAN health check issues and resolutions
- Proactive Re-balance not starting
- VSAN 6.1 – DRS and VM/Host affinity rules in VSAN stretched clusters
- VSAN 6.1 – Read Locality in VSAN stretched clusters
- Getting started with HCIbench, a benchmark for hyper-converged infras
- My VMworld 2015 session – VSAN Monitoring and Troubleshooting
- VSAN 6.1 – New Feature: Problematic Disk Handling
- VSAN 6.1 – vSphere HA settings for VSAN stretched cluster
- VSAN 6.1 – Step-by-step deployment of the VSAN witness appliance
- VSAN 6.1 – A closer look at the VSAN witness appliance
- VSAN 6.1 – Supported network topologies for VSAN Stretched Cluster
- My VMworld 2015 session – VSAN Proof-Of-Concept (YouTube video)
- VSAN 6.next beta – A glimpse of the future
- VSAN 6.1 – A brief overview of the new Virtual SAN 6.1
- Handling VSAN traces when ESXi host boots from a flash device
- Is VSAN for you? It’s never been easier to check…
- VSAN Health Check 6.0 Patch 1 Announcement
- SAS expander support on VSAN
- Migrating a VM with snapshots to/from VSAN
- Using NexentaConnect for file shares on VSAN
- Using HyTrust Data Control to encrypt VSAN disks
- vROps Management Pack for VSAN – now in beta
- Heads Up! View 6.1 and All Flash VSAN deployment issues
- VSAN 6.0 Part 10 – 10% cache recommendation for AF-VSAN
- Announcing the Virtual SAN 6.0 Health Check Plugin
- VSAN 6.0 Part 9 – Proactive Re-balancing
- VSAN 6.0 Part 8 – Fault Domains
- VSAN 6.0 Part 7 – Blinking the blinking disk LEDs
- VSAN 6.0 Part 6 – Maintenance Mode behaviour changes
- VSAN 6.0 Part 5 – vsanSparse format snapshots
- VSAN 6.0 Part 4 – All Flash Capacity Tier Considerations
- VSAN 6.0 Part 3 – New Default Datastore Policy
- VSAN 6.0 Part 2 – On-disk format upgrade considerations
- VSAN 6.0 Part 1 – New quorum mechanism
- ESXi 5.5 EP6 is now live. Important patch for VSAN users
- A brief overview of VSAN 6.0 new features and functionality
- VSAN considerations when booting from SD/USB device
- VSAN 5.5 Part 35 – Considerations when changing policy dynamically
- VSAN 5.5 Part 34 – how many disks are needed for stripe width
- VSAN 5.5 Part 33 – Some common misconceptions explained
- VSAN 5.5 Part 32 – Datastore capacity not adding up
- VSAN 5.5 Part 31 – Object compliance and operational status
- VSAN 5.5 Part 30 – Difference between Absent and Degraded
- My VSAN session from the 2014 Nordics VMUG (YouTube)
- Tips for a successful VSAN 5.5 Proof of Concept
- vsan.resync_dashboard only shows VM resyncs, not templates
- VSAN 5.5 and OEM ESXi ISO images
- Heads Up! Incorrect reporting of Outstanding IO
- VSAN 5.5 Part 29 – Cannot complete file creation operation
- VSAN 5.5 Troubleshooting Case Study
- VSAN 5.5 Part 28 – RVC login difficulties
- VSAN 5.5 Part 27 – VM memory snapshot considerations
- VSAN 5.5 Part 26 – Does disk size matter?
- Heads Up! VASA storage providers disconnected – VSAN capabilities missing
- VSAN 5.5 Part 25 – How many hosts do I need to tolerate failures?
- VSAN 5.5 Part 24 – Why is VSAN deploying thick disks?
- VSAN 5.5 Part 23 – Why is my Storage Object Striped?
- VSAN 5.5 Part 22 – Policy Compliance Status
- VSAN 5.5 Part 21 – What is a witness?
- VSAN 5.5 Part 20 – VM Swap and VM Storage Policies
- VSAN 5.5 and vCenter Operations Manager interoperability
- VSAN 5.5 and vSphere Replication interoperability
- VSAN 5.5 and vSphere Data Protection interoperability
- VSAN 5.5 and Horizon View Interoperability
- VSAN 5.5 Part 19 – Common Configuration Gotchas
- VSAN 5.5 Part 18 – VM Home Namespace and VM Storage Policies
- Getting started with Fusion-io and VSAN 5.5
- VSAN 5.5 Announcement Review
- VSAN 5.5 Part 17 – Removing a disk group from a host
- VSAN 5.5 Part 16 – Reclaiming disks for other uses
- VSAN 5.5 Part 15 – Multicast Requirement – Misconfiguration detected
- VSAN 5.5 Part 14 – Host Memory Requirements
- VSAN 5.5 Part 13 – Examining the .vswp object
- VSAN 5.5 Part 12 – SPBM Extensions in RVC
- VSAN 5.5 Part 11 – Shutting down the VSAN Cluster
- VSAN 5.5 Part 10 – Changing VM Storage Policy on-the-fly
- VSAN 5.5 Part 9 – Host Failure Scenarios and vSphere HA interop
- VSAN 5.5 Part 8 – The role of the SSD
- VSAN 5.5 Part 7 – Capabilities and VM Storage Policies
- VSAN 5.5 Part 6 – Manual or Automatic Mode
- VSAN 5.5 Part 5 – The role of VASA
- VSAN 5.5 Part 4 – Understanding Objects and Components
- VSAN 5.5 Part 3 – It is not a Virtual Storage Appliance
- VSAN 5.5 Part 2 – What do you need to get started?
- VSAN 5.5 Part 1 – A first look at VSAN
118 Replies to “vSAN”
Can you give us some detail on calculating disk yield? If I have 3 nodes with 1TB each, will I see 3TB of storage? Does a VM that uses 50GB of storage take up 50GB, 100GB, or 150GB?
There should be a sizing guide going live shortly, but all magnetic disks across all the hosts contribute to the size of the VSAN datastore. The SSDs (or flash devices) do not contribute to capacity. So if you had 1TB of magnetic disk in each of 3 nodes, your VSAN datastore would be 3TB.
The amount of disk consumed by your VM is based primarily on the failures to tolerate (FTT) setting in the VM Storage Policy. An FTT of 1 implies 2 replicas/mirrors of the VMDK, so a 50GB VMDK created on a VM with FTT=1 will consume 100GB. A 50GB VMDK created on a VM with FTT=2 will have 3 replicas/mirrors and therefore consume 150GB. Hope that makes sense. Lots of documentation is coming on this.
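The arithmetic can be sketched as follows (a hypothetical helper for illustration, not a VMware tool; it ignores thin provisioning and witness components):

```python
def raw_capacity_consumed_gb(vmdk_size_gb: int, ftt: int) -> int:
    """With vSAN RAID-1 mirroring, FTT=n keeps n+1 full replicas,
    so raw capacity consumed is (n + 1) x the VMDK size."""
    return vmdk_size_gb * (ftt + 1)

# 50GB VMDK, FTT=1 -> 2 replicas -> 100GB consumed
assert raw_capacity_consumed_gb(50, 1) == 100
# 50GB VMDK, FTT=2 -> 3 replicas -> 150GB consumed
assert raw_capacity_consumed_gb(50, 2) == 150
```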
Thanks Cormac. This is what I assumed, and wanted to check. Look forward to the documentation.
I need to understand the “Note” in the VSAN Part 9 topic:
On the vSphere HA interop:
….”Note however that if VSAN hosts also have access to shared storage, either VMFS or NFS, then these datastores may still be used for vSphere HA heartbeats”
If, for example, all the VSAN hosts also have shared VMFS datastore(s) (say on a FC SAN), can I have two kinds of HA protection: a VM located on the VSAN datastore gets VSAN HA protection, while a VM located on the shared VMFS datastore gets traditional HA protection?
Just to clarify on the whole disk consumption based on the FTT setting…going back to your example of a FTT=1 for a 50GB VM….
Are you saying that it will consume an additional 100GB of space due to the 2 replicas created? Or are you saying that the original VM (VMDK) that is created is counted as one of those replicas?
“therefore a 50GB VMDK created on a VM with an FTT=1 will consume 100GB”
To be completely clear, would it be better to say it
“will consume an extra 100GB in addition to the 50GB VM (VMDK)”?
I’ve done countless days of research over the past ~6 months or so, but every time I hear that, it throws off my understanding of how FTT drives disk consumption.
Thank you in advance for your time, if you choose to respond.
*I read your book BTW, you and Duncan Epping are rockstars in the world of virtualization….really good read. Couldn’t have asked for more.
It means that 2 x 50GB replicas are created for that VMDK James, meaning 100GB in total is consumed on the VSAN datastore (not an additional 100GB). Note however that VMDKs are created as thin provisioned on the VSAN datastore, so the VM won’t consume all of that space immediately, but over time.
Thanks for the kind words on the book – always nice to hear feedback like that.
Thanks for the reply and clarification. So, to make sure I get this right: there will be a single VMDK for the actual VM running in the environment, BUT since VSAN is in use, if your FTT=1 then 100GB will be consumed by the 2 replicas that are created (over time, with thin provisioning).
I think my confusion is in the semantics of how everyone explains it.
Yep – you got it. A single 50GB VMDK, made up of two 50GB mirrors/replicas, each replica sitting on a different disk (and host) but the same datastore, and eventually consuming 100GB in total on the VSAN datastore.
I have a question for you regarding Part 13 in which you refer to “the VM swap file” and the “swap object”. How does the vmx-*.vswp file fit into all this? This file was introduced in 5.0. Does this file belong in the swap object? Is there a second swap object for it? Or does it simply belong to the VM namespace object?
Yes – this is what we are referring to. This is now instantiated as its own object on the VSAN datastore, and does not consume space in the VM namespace object.
A question about the “Virtual SAN 6.0 Design and Sizing Guide”. On page 46 it states ‘For hybrid configurations, this setting defines how much read flash capacity should be reserved for a storage object. It is specified as a percentage of the logical size of the virtual machine disk object.’. So a percentage of the logical size (used storage). The example on page 47 takes the flash read cache reservation as a percentage of the physical space (allocated storage). What is the truth?
These statements are meant to reflect the same thing Stevin. When I say that it is a “percentage of the logical size”, this is not the same as “used storage”.
All VMDKs on VSAN are thin by default. They can be pre-allocated (made thick) through the use of the Object Space Reservation capability.
However, whether you use that or not, you request a VMDK size during provisioning, e.g. 40GB. Now you may only use a portion of this, e.g. 20GB, as it is thin provisioned.
But read cache is based on a percentage of the requested size (the logical size / allocated storage), so 40GB. Hopefully that makes sense.
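In other words (a minimal sketch; the function name is mine, not a VMware API):

```python
def read_cache_reservation_gb(provisioned_gb: float, reservation_pct: float) -> float:
    """Flash Read Cache Reservation is a percentage of the logical
    (provisioned) size of the VMDK, not of the thin-provisioned
    space actually in use."""
    return provisioned_gb * reservation_pct / 100.0

# 40GB VMDK with only 20GB written: a 1% reservation is still
# calculated on the full 40GB -> 0.4GB of read cache reserved.
assert read_cache_reservation_gb(40, 1) == 0.4
```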
Hi Cormac, perfectly clear thanks!
Regarding your book Essential VSAN – excellent book BTW. The book states: “In the initial version of VSAN, there is no proportional share mechanism for this resource when multiple VMs are consuming read cache, so every VM consuming read cache will share it equally.” How should I read this? Will the total flash read cache be divided by the number of VMs consuming VSAN storage, with that being the amount of flash read cache each VM gets? (This would be a problem for read-intensive VMs with more storage than average.)
What about the write cache? Every write has to go through the write cache, I presume? How is the write cache shared between VMs?
I would be very interested to know about read and write cache allocation to VMs when the reservation is set to 0 in VSAN 6.2.
If I copy a large file from C: to D: in my Windows VM, I see very poor transfer rates compared to the same copy on a PC (less than half the speed). The transfer rate drops to zero for up to 7 seconds at periods during the transfer. It’s almost as if the cache allocation has filled up and it is waiting for destaging to complete.
It is extremely difficult to figure this out without getting logs, etc. I would recommend opening a call with support.
However there were some significant bug fixes in the most recent patch – VMware ESXi 6.0, Patch Release ESXi600-201611001 (VMware ESXi 6.0 Patch 04). Are you running this?
Thanks Cormac. We will be applying the latest patch. I also have a job logged
A question about the ratio of SSDs to HDDs. What is the best ratio? From a system-level view, with only one HDD I believe the performance will not be good (your data is gated on one HDD interface), but with around 10 HDDs the SSD could not provide enough cache for them all. I just wonder if there is a perfect ratio?
It is completely dependent on the VMs that you deploy. If you have very I/O intensive VMs, each with large working sets (data in a state of change), then you will need a large SSD:HDD ratio. If you have very low I/O VMs with quite small working sets, you can get away with a smaller SSD:HDD capacity ratio. Since it is difficult to state what is best for every customer, we have used a 10% rule-of-thumb to cover most virtualized application workloads.
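The 10% rule-of-thumb can be expressed as a one-liner (a sketch; I am assuming the input is the anticipated consumed VM capacity before FTT copies, which is how the sizing guidance is usually framed):

```python
def hybrid_flash_cache_gb(anticipated_consumed_gb: float) -> float:
    """Rule of thumb for hybrid vSAN: provision flash cache at roughly
    10% of the anticipated consumed VM capacity (before FTT copies)."""
    return anticipated_consumed_gb * 10 / 100

# 100 VMs each expected to consume 50GB -> 5TB consumed -> ~500GB of cache
assert hybrid_flash_cache_gb(100 * 50) == 500.0
```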
Appreciated, Cormac. I understand that the ratio will determine the performance, and users configure it for their application use case. It provides flexibility to users.
I may not have made it clear: the ratio I mentioned is the number of physical devices, not the capacity.
Or does performance have no relationship to the physical device count ratio, and is it only affected by the SSD:HDD capacity ratio?
This is one of those “it depends” answers, Lyne.
If all of your writes are hitting the cache layer, and all of your reads are also satisfied by the cache layer, and destaging from flash to disk is working well, then 1:1 ratio will work just fine.
If however you have read cache misses that need to be serviced from HDD, or there is a large amount of writes in flash that need to be regularly destaged from flash to HDD, then you will find that a larger ratio, and the use of striping across multiple HDDs for your virtual machine objects can give better performance.
That’s my concern – we’re struggling with the performance difference between 1 SSD : 4 HDDs and 1 SSD : 5 HDDs.
I think if we have a big SSD, there should be less chance of missing the cache.
Even on a cache miss, 4 HDDs versus 5 HDDs shouldn’t make a big difference, right?
Maybe I need to set up an environment and collect some test data. 🙂
I run VSAN with 3 hosts, and have also configured LACP between the servers and the switch.
The SSDs are 512GB Samsung Pro 850s.
After setting it up, I tested copy speed between 2 VMs running on VSAN, and the speed is between 20MB/s and 60MB/s.
What is my problem?
Note that in Smart Storage Administrator I disabled caching on the SAS and SSD disks, then tested, and the speed was very bad.
I then deleted the arrays and recreated them with caching enabled, and again the speed was very bad.
Please open an SR (support request) with GSS (VMware’s Global Support Services) Morteza. They can advise you.
Noticed you’re using the same Samsung consumer grade SSDs that I thought would work. Are you using them as the caching tier or as the capacity drives? In my case, I used them as the caching tier and had all sorts of issues, even down to Permanent Disk Loss errors randomly appearing, requiring a host reboot. I’ve since moved them to the capacity tier and put in some Enterprise SSDs and so far, haven’t had any further issues.
I posted this on the VM/Host affinity groups, but didn’t get a reply. I’m looking at setting up a VSAN stretched cluster. Can you help answer this?
How does VM/Host affinity groups work with fault domains? I’m looking at setting up a VSAN, and setting the fault domain for site A to be site B. As I understand, by doing that, when I set FTT=1, the data will be replicated to site B instead of to another node at site A. This is to cover the case where we lose the entire rack at Site A. The VMs will be able to reboot at Site B off of the replicated data at Site B.
If I were to use VM/Host affinity groups, then wouldn’t I need to replicate to a second node at site A? Would that mean setting FTT=2, and it would replicate to a node at site A, and a node at site B? Maybe VM/Host affinity groups don’t work when using fault domains. Can you help me sort that out?
First, VSAN Stretched Cluster only supports FTT=1. Fault Domains and FTT work together.
If you have a failure on site A, the VM/Host Affinity rules will attempt to restart the VM on the same site, i.e. site A.
If you have a complete site failure (e.g. lost power on site A), the VM/Host affinity rules will then attempt to restart the VM on the remote site, i.e. site B.
You still need to use Fault Domains with Stretched Cluster, but simply as a way of grouping the hosts on each site together.
This should be well documented in the stretched cluster guide. There is also a PoC guide due to be released very soon which will provide you with further detail.
Thanks for your reply.
So if a stretched cluster has FTT=1, then doesn’t that mean it will only replicate data to another node at site B? If it only replicates to another node at site B, and a node at site A goes down, how will VM/HA rules be able to restart the VM on the same site A?
That is why we recommend “should” rules rather than “must” rules for affinity. You can find the full details in the stretched cluster guide here.
Can you say whether removing this Stretched Cluster limitation (FTT=1 only) is on the roadmap? We are looking at implementing it, but would like to have 2 copies on the primary site + 1 on the secondary (or maybe a 2+2 active-active configuration).
We are definitely looking at this, and the plan is to improve upon it. But there are no dates for the feature that I can share with you, I’m afraid.
Thanks for the response, good to hear that you are working on it
This didn’t make it into the 6.2 release? Sometimes those details don’t get advertised at launch, so I’m still hoping.. 😉
About the Health Check plugin: any thoughts on why it triggers the ‘Site Latency Health’ alarm between host and witness at as low as 15ms, when less than or equal to 100ms is the recommended figure? Is there any way to tweak this?
I’m not aware of any way to stop this Andreas, but please check with GSS/Support. They may know of a solution.
Please see this article. This may be fixed in the next patch. https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2146133
Hi Cormac, I read your book BTW – you and Duncan Epping are really good in the world of virtualization. Really good read; couldn’t have asked for more.
I found that some posts have a reply box, but some do not.
I have read the post “VSAN 6.2 Part 1 – Deduplication and Compression” and wanted to leave a reply there, but it seems that there is no space to do so…
How can I leave a reply to that post?
Posts are closed for comments after a certain period of time.
OK, got it.
So I will post my questions about “Deduplication and Compression” here as the last option.
My environment is as following:
1. 3 all-flash ESXi hosts with Deduplication and Compression enabled.
2. Only the PSC, the VCSA and 2 other VMs have been deployed, with less than 1TB used in total.
3. The object space reservation is 0% with the default VSAN storage policy.
But what I see is that:
1. The Deduplication and Compression overhead is 6.5TB.
2. The ‘used-total’ grew to about 2TB after enabling Deduplication and Compression.
Is that normal after enabling the feature? BTW, is there any formula I can use to calculate the expected consumed capacity?
Yes – the overhead is calculated up front. In my experience, Deduplication and Compression overhead is approx. 5%, and checksum overhead is anywhere between 1.22 and 1.25. There are some more details here: http://cormachogan.com/2016/02/25/vsan-6-2-part-7-capacity-views/
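The up-front reservation can be estimated roughly like this (a sketch based on the ~5% figure above; actual overhead varies by release and configuration, and the capacity figure in the example is hypothetical):

```python
def dedup_compression_overhead_gb(raw_capacity_gb: float) -> float:
    """Deduplication/compression metadata overhead is reserved up front,
    at roughly 5% of the total raw capacity of the cluster."""
    return raw_capacity_gb * 5 / 100

# e.g. a hypothetical 130TB of raw capacity would reserve ~6.5TB
assert dedup_compression_overhead_gb(130_000) == 6500.0
```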
Yes, 5% of the total raw capacity is true. I also found the same description in the VMware documentation.
Thank you so much.
vSAN leverages the new vsanSparse snapshot technology. Does this new snapshot technology also reduce the stun time during removal of a large snapshot, compared to traditional “redo log” snapshots? I didn’t find any comments about this in the vSAN snapshot performance white paper.
I think the main difference is the in-memory cache and the granularity that vsanSparse uses – otherwise the techniques are quite similar. However, I am not aware of any study measuring the differences. This might have further useful info – https://storagehub.vmware.com/#!/vmware-vsan/vsansparse-tech-note
I wanted to run a vSAN maintenance scenario by you to see if there are any potential drawbacks, aside from a node failing while performing the maintenance. This is regarding ‘Ensure availability’ and ‘No data migration’ maintenance modes.
A single node in a 4 node vSAN cluster is placed into maintenance mode using the ‘No data migration’ method. Once in maintenance mode, software updates/firmware is applied to the node and it’s unavailable for roughly 30-40 minutes. After the maintenance is completed the node is placed back into production and the administrator immediately moves onto the next node in the cluster to be patched. The admin again uses the same ‘No data migration’ maintenance mode on this node, applies updates for 30-40 minutes and so on. These steps are repeated for the remaining nodes.
vSAN version: 6.2
Hosts in Cluster: 4
Storage Policy on all VMs: FTT=1
Fault Domains: Single FD per host
Disk Configuration: Hybrid
If the admin is performing maintenance this way without waiting for components to re-sync after each 30-40 minute window and is not using ‘Ensure availability’, would there be potential data issues or a chance of VMs becoming unavailable as a result? This is again without a node failing in the cluster during these maintenance windows. I understand this is not the preferred way of doing maintenance, but I was just curious what could happen and if there were any fail-safes when this occurs.
You definitely need to be careful with this approach. First, you might like to increase the CLOM repair delay timeout above the 60 minute default (see KB 2075456). This gives you a bit more leeway in case it takes a bit longer to apply the firmware and reboot the host, and it means that a rebuild won’t start if the maintenance runs over 1 hour.
Now, there may well be some changes that need to be resynced once the host has rebooted. You need to wait for this to complete before starting maintenance on the next host. I like to use RVC commands for this, such as vsan.resync_dashboard (you can also use the UI). Only commence work on the next host when you are sure that all objects are fully synced and active.
Can you please reply to my query: “What will happen when the whole environment goes down and is powered back on again? Do we run some sort of integrity check?”
I have a question about vSAN – could you please explain in more detail:
I have a vSAN cluster with 3 ESXi hosts (1 x 50GB SSD and 1 x 300GB HDD per host). The VM storage policy is: Number of Failures to Tolerate = 1, Number of Disk Stripes per Object = 1.
If I have a VM with a virtual disk size of 400GB, what happens, and how does vSAN store/distribute the VM? It cannot store 2 replicas on 2 hosts because there is only a 300GB HDD per host – is that correct?
Thank you so much.
Correct – you will not be able to provision this VM with that policy, unless you override the policy with a ForceProvision entry. With ForceProvision, the VM will be provisioned as FTT=0, so there will be no protection.
Thanks Cormac for the information. So, if I set Number of Failures to Tolerate = 1 and Number of Disk Stripes per Object = 2, is that OK for this case?
No – disk stripes require unique capacity devices, and you do not have enough devices to accommodate this, as you only have one capacity device per host. At most, you can get FTT=1 with SW=1. These requirements are called out in the product documentation, and in the vSAN deep dive book.
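A rough sanity check for such configurations can be sketched as follows (a simplified model of the RAID-1 placement rules; real vSAN placement logic is considerably more involved, so treat this only as a back-of-the-envelope check):

```python
def placement_feasible(ftt: int, stripe_width: int,
                       hosts: int, capacity_devices_per_host: int) -> bool:
    """Very rough feasibility check: RAID-1 with FTT=n needs 2n+1 hosts
    (n+1 replicas plus n witnesses), and each stripe of each replica
    needs its own capacity device."""
    enough_hosts = hosts >= 2 * ftt + 1
    enough_devices = hosts * capacity_devices_per_host >= (ftt + 1) * stripe_width
    return enough_hosts and enough_devices

# 3 hosts, 1 capacity device each: FTT=1/SW=1 fits, FTT=1/SW=2 does not.
assert placement_feasible(1, 1, 3, 1) is True
assert placement_feasible(1, 2, 3, 1) is False
```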
I’m reading the Stretched Cluster guide and do not follow the component bifurcation in the bandwidth calculation section.
200 virtual machines with 500GB VMDKs (12 components each) using pre-vSAN 6.6 policies would require 4.8Mbps of bandwidth to the Witness:
3 for swap, 3 for VM home space, 6 for vmdks – 12
12 components X 200 VMs – 2,400 components
2Mbps for every 1,000 components is 2.4 X 2Mbps – 4.8Mbps
In this example PFTT=1 and SFTT=0
Component Calculations for VMDK –
SiteA – 500GB – Component0-255GB and Component1-245GB – 2 components
SiteB – 500GB – Component0-255GB and Component1-245GB – 2 components
Either SiteA or SiteB will also have 2 additional Witnesses, one for component0 and the other for component1 – 2 components
Witness site – 1 for component0 and other for component1 – 2components
The above gives us a total of 8 components for a 500GB VMDK – why do I get 2 additional components in the count?
The same 200 virtual machines with 500GB VMDKs, using vSAN 6.6 policy rules for cross-site protection with local mirroring, would require:
3 for swap, 7 for VM home space, 14 for vmdks – 24
24 components X 200 VMs – 4,800 components
2Mbps for every 1000 is 4.8 X 2Mbps – 9.6Mbps
In this example PFTT=1 and SFTT=1
My calculation gives me a total of 7 swap components (is the article not taking SFTT into account?)
Component0 at SiteA – 1C
Component0 at SiteA – 1C – Mirror/SFTT-1
A witness component at SiteA – 1C
Component0 at SiteB – 1C
Component0 at SiteB – 1C – Mirror/SFTT-1
A witness component at SiteB – 1C
A witness at Witness site – 1C
Similarly, I get 7 for VM Home (which is in accordance with the guide). Why is swap 3 in the guide?
I get the number of components for a 500GB vDisk to be 14 –
SiteA – 500GB – Component0-255GB and Component1-245GB – 2 components
SiteA – 500GB – Component0-255GB and Component1-245GB – 2 components – Mirror/SFTT-1
SiteA – 1 Witness for component0 and 1 Witness for Component1. This is because of SFTT. – 2 components
SiteB – 500GB – Component0-255GB and Component1-245GB – 2 components
SiteB – 500GB – Component0-255GB and Component1-245GB – 2 components – Mirror/SFTT-1
SiteB – 1 Witness for component0 and 1 Witness for Component1. This is because of SFTT. – 2 components
Witness site – 1 for component0 and other for component1 – 2 components
Is my understanding correct?
OK – that is a lot of information to take on board. Let me see if we can simplify it a little bit.
Let’s take your first example: “3 for swap, 3 for VM home space, 6 for vmdks – 12”. In this, we are stating 3 for swap and home since this is Stretched Cluster, so RAID-1 mirroring with a witness for swap and home, giving us 3 components, 1 x SiteA, 1xSiteB, 1 x WitnessSite. Thus this is PFTT=1, SFTT=0.
I don’t understand your next example which states “Either SiteA or SiteB will also have 2 additional Witnesses, one for component0 and the other for component1 – 2 components”. Where are you deriving these additional components from? If this is PFTT=1, SFTT=0, then there are no additional witnesses at either site. To the best of my knowledge, the only time we would have witnesses at SiteA or SiteB is when SFTT>0 and we are protecting VMs in the same site as well as across sites.
Are you using https://storagehub.vmware.com/t/vmware-vsan/vsan-stretched-cluster-guide/bandwidth-calculation-5/ as your reference?
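The witness bandwidth rule of thumb used in both calculations (2Mbps per 1,000 components) can be sketched as:

```python
def witness_bandwidth_mbps(num_vms: int, components_per_vm: int) -> float:
    """Stretched cluster sizing rule of thumb: allow 2Mbps of bandwidth
    to the witness site for every 1,000 vSAN components."""
    total_components = num_vms * components_per_vm
    return total_components / 1000 * 2

# 200 VMs x 12 components (pre-6.6 policy, PFTT=1/SFTT=0) = 2,400
# components -> 4.8Mbps to the witness
assert witness_bandwidth_mbps(200, 12) == 4.8

# 200 VMs x 24 components (local protection, PFTT=1/SFTT=1) = 4,800
# components -> 9.6Mbps
assert witness_bandwidth_mbps(200, 24) == 9.6
```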
Thanks for your reply. My bad, I missed the email notification.
I have taken these examples from vSAN Stretched Cluster Guide, Pg. no. 28.
When PFTT=1 and SFTT=0, VMDK1=500GB (Component0=255GB and Component1=245GB)
The witness component will only exist on the Witness site, as no local site protection policy is in place. This is why the first example gives us 6 components for VMDK1.
This I understand now.
When PFTT=1 and SFTT=1 (RAID-1), VMDK1=500GB (Component0=255GB and Component1=245GB)
The guide gets the numbers below –
3 for swap, 7 for VM home space, 14 for vmdks = 24
The number of components for VMDK1 is 14 [5 at SiteA (4 data and 1 witness component, SFTT=1), 5 at SiteB (4 data and 1 witness component, SFTT=1) and 4 at the Witness site].
I understand the component bifurcation for VMDK1, but the example has just 3 components for VM swap, whereas VM Home has 7.
Shouldn’t VM Home and VM swap follow the same PFTT and SFTT, and therefore have the same number of components?
When PFTT=1 and SFTT=1 (erasure coding), VMDK1=500GB (Component0=255GB and Component1=245GB)
The guide gets the numbers below –
3 for swap, 9 for VM home space, 18 for vmdks = 30
For the VMDK I also get 18, and below is my calculation –
4 for Component0 in SiteA – 3 data and 1 parity
4 for Component1 in SiteA – 3 data and 1 parity
4 for Component0 in SiteB – 3 data and 1 parity
4 for Component1 in SiteB – 3 data and 1 parity
2 Witness components for Component0 and Component1 at Witness site. Thus giving us a total of 18 components.
For VM Home I also get 9, and below is my calculation –
4 for Component0 in SiteA – 3 data and 1 parity
4 for Component0 in SiteB – 3 data and 1 parity
1 witness components at Witness site. Thus giving us a total of 9 components.
In this example too, I do not understand why VM swap is 3.
I downloaded the guide from here, and selected the “Export to PDF” option available at the top right.
I see – so the issue is that the number of components reported for VM swap is incorrect. Swap should also use the same policy assigned to the VM, so this looks like an error in the calculation. It may be that the guide is using older calculations: in the past, swap was always FTT=1 (3 components) and did not inherit the VM policy. However, that has changed in more recent vSAN versions, and swap does now indeed use the same policy as the rest of the objects that make up the VM. I’ll inform the document maintainers. Thanks for bringing this to our attention.
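For the simplest case discussed in this thread (PFTT=1, SFTT=0), the component count for a single VMDK can be sketched as follows (a simplified model that ignores stripe width and policy variants):

```python
import math

MAX_COMPONENT_GB = 255  # vSAN splits large objects into components of at most 255GB

def stretched_vmdk_components(size_gb: int) -> int:
    """Component count for one VMDK in a stretched cluster with PFTT=1,
    SFTT=0: a full data replica at each data site, plus one witness
    component per 255GB chunk at the witness site."""
    chunks = math.ceil(size_gb / MAX_COMPONENT_GB)
    site_a = chunks      # data replica at the preferred site
    site_b = chunks      # data replica at the secondary site
    witness = chunks     # witness components at the witness site
    return site_a + site_b + witness

# 500GB VMDK -> 2 chunks (255GB + 245GB) -> 6 components in total,
# matching the first example in this thread
assert stretched_vmdk_components(500) == 6
```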
Hi Cormac, what is your recommendation for the following situation:
We have a 3 Nodes vSAN-Cluster with every node in a separate rack – configured as 3 fault domains.
Now we would like to bring a 4th node into the cluster and must place it in one of the 3 existing racks. So if the wrong rack fails, we lose 50% of our nodes and have a split-brain situation. Regards, Thomas
I don’t see any way of avoiding such a situation if you are only introducing a single node Thomas. Ideally, you would introduce a new node to each FD to maintain availability, but I guess you know this already.
Thx Cormac for your quick answer. I already suspected that there is no solution for this situation. Is there any chance to work around this problem with a witness appliance? Regards, Thomas
I guess you could implement a 2+2+1 vSAN stretched cluster, where 2 data hosts for Fault Domain A are in one rack, 2 data hosts for Fault Domain B are in another rack, and the witness appliance is deployed on another ESXi host in the third rack. There is a bit of work involved in doing something like this, and I’m not sure I would be comfortable converting a standard vSAN cluster already in production to a stretched vSAN. I think further research would be needed here to see if it is even feasible.