A quick reference to vSAN (previously known as Virtual SAN) posts



  • A link to the Essential VSAN book (second edition) that I co-authored with Duncan Epping.
  • A link to the Essential VSAN book (first edition)




  1. Can you give us some detail on calculating disk yield? If I have 3 nodes with 1TB each, will I see 3TB of storage? Does a VM that uses 50GB of storage take up 50GB, 100GB, or 150GB?

    • There should be a sizing guide going live shortly, but all magnetic disks across all the hosts contribute to the size of the VSAN datastore. The SSDs (or flash devices) do not contribute to capacity. So if you had 1TB of magnetic disk in each of the 3 nodes, your VSAN datastore would be 3TB.

      The amount of disk consumed by your VM is based primarily on the failures to tolerate (FTT) setting in the VM Storage Policy. An FTT of 1 implies 2 replicas/mirrors of the VMDK. Therefore a 50GB VMDK created on a VM with an FTT=1 will consume 100GB. A 50GB VMDK created on a VM with an FTT=2 will make 3 replicas/mirrors and therefore consumes 150GB. Hope that makes sense. Lots of documentation coming around this.
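      The arithmetic in the reply above can be sketched as a couple of small Python helpers (illustrative only; the function names are mine, and this covers the RAID-1 mirroring behaviour described in the thread):

```python
def vsan_datastore_capacity_gb(nodes: int, magnetic_gb_per_node: int) -> int:
    """Raw VSAN datastore size: only the magnetic (capacity) disks count;
    the cache-tier SSDs contribute nothing to capacity."""
    return nodes * magnetic_gb_per_node

def vsan_space_consumed_gb(vmdk_size_gb: int, ftt: int) -> int:
    """Raw space a mirrored VMDK consumes: FTT=n keeps n+1 replicas,
    so the footprint is the logical size times (FTT + 1)."""
    return vmdk_size_gb * (ftt + 1)

# Examples from the thread: 3 nodes x 1TB of magnetic disk, 50GB VMDKs
print(vsan_datastore_capacity_gb(3, 1000))  # 3000, i.e. a 3TB datastore
print(vsan_space_consumed_gb(50, 1))        # 100 - two 50GB replicas
print(vsan_space_consumed_gb(50, 2))        # 150 - three 50GB replicas
```

      Note this is the total footprint including the original copy, not additional space on top of it; and because VMDKs are thin provisioned by default, it is consumed over time rather than up front.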

  2. Hi Cormac,

    I need to understand the “Note” in the VSAN Part 9 post:

    On the vSphere HA interop:

    ….”Note however that if VSAN hosts also have access to shared storage, either VMFS or NFS, then these datastores may still be used for vSphere HA heartbeats”

    If, for example, all the VSAN hosts also have shared VMFS datastore(s) (say over FC SAN), then can I have TWO kinds of HA protection: if the VM is located on the VSAN datastore it gets VSAN HA protection, and if the VM is located on the shared VMFS datastore it gets traditional HA protection?


  3. Just to clarify the whole disk consumption based on the FTT setting…going back to your example of FTT=1 for a 50GB VM….

    Are you saying that it will consume an additional 100GB of space due to the 2 replicas created?…or are you saying that the original VM (VMDK) that is created is counted as one of those replicas?

    “therefore a 50GB VMDK created on a VM with an FTT=1 will consume 100GB”

    To be completely clear, would it be better to say

    “will consume an extra 100GB in addition to the 50GB VM (VMDK)”?

    I’ve spent countless days researching this over the past ~6 months or so, but every time I hear that phrasing, it throws off my understanding of FTT vs. disk consumption.

    Thank you in advance for your time, if you choose to respond.

    *I read your book BTW, you and Duncan Epping are rockstars in the world of virtualization….really good read. Couldn’t have asked for more.


    • It means that 2 x 50GB replicas are created for that VMDK, James, meaning 100GB in total is consumed on the VSAN datastore (not an additional 100GB). Note however that VMDKs are created thin provisioned on the VSAN datastore, so they won’t consume all of that space immediately, but over time.

      Thanks for the kind words on the book – always nice to hear feedback like that.

      • Thanks for the reply and clarification…so to make sure I get this right, there will be a single VMDK for the actual VM running in the environment BUT since VSAN is in use, if your FTT=1, then 100GB will be consumed by the 2 replicas that are created (over time with thin provisioning).

        I think my confusion is in the semantics of how everyone explains it.

        • Yep – you got it. A single 50GB VMDK, made up of two 50GB mirrors/replicas, each replica sitting on a different disk (and host) but on the same datastore, and eventually consuming 100GB in total on the VSAN datastore.

  4. I have a question for you regarding Part 13 in which you refer to “the VM swap file” and the “swap object”. How does the vmx-*.vswp file fit into all this? This file was introduced in 5.0. Does this file belong in the swap object? Is there a second swap object for it? Or does it simply belong to the VM namespace object?

    • Yes – this is what we are referring to. This is now instantiated as its own object on the VSAN datastore, and does not consume space in the VM namespace object.

  5. Hi Cormac,
    A question about the “Virtual SAN 6.0 Design and Sizing Guide”. On page 46 it states ‘For hybrid configurations, this setting defines how much read flash capacity should be reserved for a storage object. It is specified as a percentage of the logical size of the virtual machine disk object.’ So a percentage of the logical size (used storage). The example on page 47 takes the flash read cache reservation as a percentage of the physical space (allocated storage). Which is correct?


    • These statements are meant to reflect the same thing Stevin. When I say that it is a “percentage of the logical size”, this is not the same as “used storage”.

      All VMDKs on VSAN are thin by default. They can be pre-allocated (made thick) through the use of the Object Space Reservation capability.

      However, whether you use that or not, you request a VMDK size during provisioning, e.g. 40GB. Now you may only use a portion of this, e.g. 20GB, as it is thin provisioned.

      But Read Cache is based on a % of the requested size (the logical size/allocated storage), so 40GB. Hopefully that makes sense.
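      The distinction in this reply can be shown with a small Python sketch (the function name is mine, for illustration only): the reservation is taken against the provisioned (logical) size, not against the thin-provisioned space actually written.

```python
def flash_read_cache_reservation_gb(logical_vmdk_gb: float,
                                    reservation_pct: float) -> float:
    """Hybrid VSAN read cache reservation: a percentage of the VMDK's
    *logical* (provisioned) size, regardless of how much is written."""
    return logical_vmdk_gb * reservation_pct / 100.0

# 40GB VMDK, only 20GB written, 10% reservation:
# the reservation is based on the 40GB logical size, not the 20GB used.
print(flash_read_cache_reservation_gb(40, 10))  # 4.0
```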


  6. Hi Cormac,
    regarding your book Essential VSAN – excellent book, btw. The book states: “In the initial version of VSAN, there is no proportional share mechanism for this resource when multiple VMs are consuming read cache, so every VM consuming read cache will share it equally.” How must I read this? Will the total flash read cache size be divided by the number of VMs consuming VSAN storage, and that is the amount of flash read cache each VM gets? (This would be a problem for read-intensive VMs with more storage than average.)
    What about the write cache? Every write has to go through the write cache, I presume? How is write cache shared between VMs?

    thanks again.

    • Hi Cormac

      I would be very interested to know about read and write cache allocation to VMs when the reservation is set to 0 for VSAN 6.2.

      If I copy a large file from the C: to the D: drive in my Windows VM, I see very poor transfer rates compared to the same copy on a PC (less than half the speed). The transfer rate drops to zero for up to 7 seconds at periods during the transfer. It’s almost like its cache allocation has filled up and it’s waiting for destaging to complete.


      • Hi Karl,

        It is extremely difficult to figure this out without getting logs, etc. I would recommend opening a call with support.
        However, there were some significant bug fixes in the most recent patch – VMware ESXi 6.0, Patch Release ESXi600-201611001 (VMware ESXi 6.0 Patch 04). Are you running this?

  7. Hi Cormac
    A question about the ratio of SSD to HDD numbers. What’s the best ratio? From a system-level view, with only one HDD I believe performance will not be good (your data will be gated on one HDD interface), but with around 10 HDDs, the SSD couldn’t provide enough cache for them all. Just wondering if there’s a perfect ratio?

    • It is completely dependent on the VMs that you deploy. If you have very I/O intensive VMs each with large working sets (data is in a state of change), then you will need a large SSD:HDD ratio. If you have very low I/O VMs with quite small working sets, you can get away with a smaller SSD:HDD capacity. Since it is difficult to state which is the best for every customer, we have used a 10% rule-of-thumb to cover most virtualized application workloads.
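      As a rough illustration of that 10% rule of thumb (a sketch only; the function name and the example numbers are mine, not from the sizing guide):

```python
def rule_of_thumb_flash_gb(anticipated_consumed_gb: float,
                           pct: float = 10.0) -> float:
    """Flash cache sizing rule of thumb for hybrid VSAN: roughly 10% of
    the anticipated consumed VM storage. I/O-intensive VMs with large
    working sets may need a higher percentage; low-I/O VMs may need less."""
    return anticipated_consumed_gb * pct / 100.0

# e.g. ~2TB of anticipated consumed VM storage -> ~200GB of flash cache
print(rule_of_thumb_flash_gb(2000))  # 200.0
```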

      • Appreciated, Cormac – I understand the ratio will determine the performance, and users configure it for their application case; it provides flexibility to users.
        I may not have made it clear:
        the ratio I mentioned is the physical device count, not the capacity.
        Or does performance have no relationship with the physical device count ratio, and is it only affected by the SSD:HDD capacity ratio?

        • This is one of those “it depends” answers, Lyne.

          If all of your writes are hitting the cache layer, and all of your reads are also satisfied by the cache layer, and destaging from flash to disk is working well, then 1:1 ratio will work just fine.

          If however you have read cache misses that need to be serviced from HDD, or there is a large amount of writes in flash that needs to be regularly destaged from flash to HDD, then you will find that a larger ratio, along with striping your virtual machine objects across multiple HDDs, can give better performance.

  8. Yes, Cormac.
    That’s my concern. We’re struggling with the performance difference between 1 SSD:4 HDD and 1 SSD:5 HDD.
    I think that with a big SSD, there should be less chance of a cache miss.
    And even on a cache miss, 4 HDDs versus 5 HDDs shouldn’t make a big difference, right?
    Maybe I need to set up an environment and collect some test data. 🙂

  9. Hi Cormac,
    I run VSAN with 3 hosts,
    with LACP configured between the servers and the switch,
    and Samsung 850 Pro SSDs of 512GB in size.
    After setup I tested the copy speed between 2 VMs running on VSAN, and the speed is between 20MB/s and 60MB/s.
    What is my problem?
    Note: in Smart Storage Administrator I disabled caching on the SAS and SSD disks, then tested the speed, and the speed was very bad.
    I then deleted the arrays and recreated them with caching enabled, and again the speed was very bad.
    Please help.

    • Hi Morteza,

      I noticed you’re using the same Samsung consumer-grade SSDs that I thought would work. Are you using them as the caching tier or as capacity drives? In my case, I used them as the caching tier and had all sorts of issues, even down to Permanent Disk Loss errors randomly appearing, requiring a host reboot. I’ve since moved them to the capacity tier and put in some enterprise SSDs, and so far haven’t had any further issues.



  10. I posted this on the VM/Host affinity groups, but didn’t get a reply. I’m looking at setting up a VSAN stretched cluster. Can you help answer this?


    How do VM/Host affinity groups work with fault domains? I’m looking at setting up a VSAN and setting the fault domain for site A to be site B. As I understand it, by doing that, when I set FTT=1, the data will be replicated to site B instead of to another node at site A. This is to cover the case where we lose the entire rack at site A. The VMs will be able to reboot at site B off of the replicated data at site B.

    If I were to use VM/Host affinity groups, then wouldn’t I need to replicate to a second node at site A? Would that mean setting FTT=2, and it would replicate to a node at site A, and a node at site B? Maybe VM/Host affinity groups don’t work when using fault domains. Can you help me sort that out?

    • First, VSAN Stretched Cluster only supports FTT=1. Fault Domains and FTT work together.

      If you have a failure on site A, the VM/Host Affinity rules will attempt to restart the VM on the same site, i.e. site A.

      If you have a complete site failure (e.g. lost power on site A), the VM/Host affinity rules will then attempt to restart the VM on the remote site, i.e. site B.

      You still need to use Fault Domains with Stretched Cluster, but simply as a way of grouping the hosts on each site together.

      This should be well documented in the stretched cluster guide. There is also a PoC guide due to be released very soon which will provide you with further detail.

      • Thanks for your reply.

        So if a stretched cluster has FTT=1, doesn’t that mean it will only replicate data to another node at site B? If it only replicates to a node at site B, and a node at site A goes down, how will the VM/Host affinity rules be able to restart the VM on the same site A?

      • Hi,

        Can you say whether removing this limitation of Stretched Cluster (FTT=1 only) is on the roadmap? We are looking at implementing it, but would like to have 2 copies on the primary site + 1 on the secondary (or maybe a 2+2 active-active configuration).

        Thanks, Vjeran

  11. About the Health Check plugin: any thoughts on why it triggers the ‘Site Latency Health’ alarm between a host and the witness at as low as 15ms, when less than or equal to 100ms is the recommended figure? Is there any way to tweak this?

  12. Hi Cormac, I read your book BTW; you and Duncan Epping are really good in the world of virtualization…really good read. Couldn’t have asked for more.

  13. Hi Cormac,

    I found that some posts have a reply box, but some do not.
    I have read the post “VSAN 6.2 Part 1 – Deduplication and Compression” and wanted to leave a reply there, but it seems that there is no space…

    How can I leave a reply to that post?

      • OK, got it.

        So I’ll post my questions about “Deduplication and Compression” here as the last option.

        My environment is as follows:
        1. 3 all-flash ESXi hosts with Dedup and Compression enabled.
        2. Only the PSC, the VCSA and 2 other VMs have been deployed, with less than 1TB in total.
        3. The object space reservation is 0% with the default VSAN storage policy.

        But what I saw is:
        1. The deduplication and compression overhead is 6.5TB.
        2. The ‘used-total’ grew to about 2TB after enabling Dedup and Compression.

        Is that normal behaviour after enabling the feature? BTW, is there any formula I can use to calculate the expected consumed capacity?


  14. Hi Cormac,
    vSAN is leveraging the new vsanSparse snapshot technology. Does this new snapshot technology also reduce the stun time during removal of a large snapshot compared to traditional “redo log” snapshots? I didn’t find any comments about this in the vSAN snapshot performance white paper.

    • I think the main difference is the in-memory cache and the granularity that vsanSparse uses – otherwise the techniques are quite similar. However, I am not aware of any study measuring the differences. This might have further useful info: https://storagehub.vmware.com/#!/vmware-vsan/vsansparse-tech-note

  15. Hi Cormac,

    I wanted to run a vSAN maintenance scenario by you to see if there are any potential drawbacks, aside from a node failing while performing the maintenance. This is regarding ‘Ensure availability’ and ‘No data migration’ maintenance modes.


    A single node in a 4-node vSAN cluster is placed into maintenance mode using the ‘No data migration’ method. Once in maintenance mode, software/firmware updates are applied to the node and it’s unavailable for roughly 30-40 minutes. After the maintenance is completed, the node is placed back into production and the administrator immediately moves on to the next node in the cluster to be patched. The admin again uses the same ‘No data migration’ maintenance mode on this node, applies updates for 30-40 minutes, and so on. These steps are repeated for the remaining nodes.

    Cluster details
    vSAN version: 6.2
    Hosts in Cluster: 4
    Storage Policy on all VMs: FTT=1
    Fault Domains: Single FD per host
    Disk Configuration: Hybrid


    If the admin is performing maintenance this way without waiting for components to re-sync after each 30-40 minute window and is not using ‘Ensure availability’, would there be potential data issues or a chance of VMs becoming unavailable as a result? This is again without a node failing in the cluster during these maintenance windows. I understand this is not the preferred way of doing maintenance, but I was just curious what could happen and if there were any fail-safes when this occurs.

    • You definitely need to be careful with this approach. First, you might like to increase the cmmds repair delay timeout value above the 60 minute default (see KB 2075456). This gives you a bit more leeway in case it takes a bit longer to apply the firmware and reboot the host. It means that rebuilds won’t start if the maintenance runs over 60 minutes.

      Now there may well be some changes that need to be resynced once the host has rebooted. You need to wait for this to complete before starting maintenance on the next host. I like to use RVC commands for this, such as vsan.resync_dashboard (you can also use the UI). Only commence work on the next host when you are sure that all objects are fully synced and active.


