How to contact Cormac Hogan:
My name is Weithenn (vExpert 2012 ~ 2014). I was very pleased to be able to translate your writing with Duncan. The Traditional Chinese version is now on the market in Taiwan.
Available via Tenlong Book Store (http://www.tenlong.com.tw/items/9863473413?item_id=999390)
Available via GOTOP publishing house (http://books.gotop.com.tw/v_ACA020200)
Wonderful news – thanks for your efforts Weithenn
Is this article still relevant Cormac?
Great blog btw!!
Yes – we don’t have any control over the drivers that OEMs place on the images, so we don’t know if they have been certified for VSAN or not. Use the health check plugin to verify that everything is good from an HCL perspective after VSAN has been deployed with the OEM images. However, you may need to install different drivers if there is a mismatch.
RAID controller testing for VSAN HCL
I’ve also opened an SR but haven’t had a lot of success in determining who is responsible for this process. Do you have any suggestions?
I wanted to reply to this article of yours: http://cormachogan.com/2015/05/07/vsphere-6-0-storage-features-part-8-vaai-unmap-changes/
but I believe it is too late to post there now, so I am posting here.
You mentioned that UNMAP is supported from Windows 2008 onwards. But from the Windows blog, https://msdn.microsoft.com/en-us/library/windows/hardware/dn265487%28v=vs.85%29.aspx, and from our testing in the lab, we see that Windows only understands thin provisioning from Windows 2012 onwards. Please correct me if I am wrong.
I understood that this functionality was also in Windows 2008, but it appears I was mistaken. Thanks for highlighting. I’ll adjust the post.
In your article “VSAN considerations when booting from USB/SD”, you mentioned that VSAN traces need at least 500 MB of space. However, the official VMware document (https://www.vmware.com/files/pdf/products/vsan/VMware-TMD-Virtual-SAN-Hardware-Guidance.pdf) explicitly notes that we are not supposed to increase the default size of 300 MB.
The reason I ask is that I constantly have hosts giving out a “ramdisk ‘vsantraces’ is full” error. I’m thinking of increasing its size, but I am hesitant.
Thank you very much for your response.
Is this booting from SD or USB Aditya?
I’m not sure there is a way of modifying the layout on these devices, so please check with GSS to be sure. The issue is that these devices do not get a persistent scratch area.
When booting from a SATADOM, these “appear” like disk devices so have the 4GB scratch partition.
We are looking at improvements in this area. Let me see if I can find more info.
Cormac, apologies for jumping on this thread, but could you raise a question internally? After following the bootstrapping VSAN VMware document, how can I migrate the standard switches created during that process to distributed switches when the VCSA and PSC sit on the VSAN datastore that the bootstrapping process creates? The document says it can be done. We are in the middle of an NHS project to get VSAN and Horizon in place (using vDS), but after following the document we are stuck with standard switches.
Trying to find someone who can validate it Paul. Unfortunately the author of the paper is no longer with VMware.
Hi Cormac, many thanks for taking the time to check. I understand you must be busy and this isn’t your document. So your help is much appreciated.
I’m just toying with the idea of using 1 of the 4 hosts to create a local VMFS datastore to deploy the PSC and VCSA to, then using this VCSA to deploy VSAN on the remaining 3 nodes, and then attempting to Storage vMotion the PSC and VCSA to the 3-host VSAN datastore I created. Once that is done, I would blow away the disk config on the host I used for the temporary VMFS and add this host to the 3-host VSAN cluster as the 4th host.
Have you Storage vMotioned a VCSA over to VSAN in this way in your lab/test?
I haven’t done that Paul, but if all the networking is in place, it should theoretically work. Sorry I can’t be of more help. I’m on the road so I can’t try this out for you either.
Hey Cormac, the svMotion migration of the VCSA & PSC to the VSAN datastore all worked fine as all networking was in place. I’d be interested to see whether a migration from standard to distributed switches can be achieved after the bootstrapping of VSAN has taken place, because logically that would be the next step for folks like us who are in a greenfield environment. Staying with a standard switch config would have made NIOC impossible, so it was quite a show stopper.
Anyhow, the migration to the VSAN datastore all worked out well. Thanks for your communications, Cormac.
The hosts are booting from a USB drive.
I found another article on the web, referring to your article, that explains how to increase this ‘vsantraces’ partition.
But I’ll open a ticket to GSS, just to be sure.
Thank you very much Cormac.
Are you also into VSAN managed through the Ruby console with the new 6.2 features? I found vsan.sizing. in RVC, but there is no documentation on it anywhere…
Let me look …
This vsan.sizing. is “future-proofing”. This RVC extension has not yet been implemented, but we “may” implement something around it going forward.
Do you have any insight on why my ESXi hosts (2 out of 3-node cluster) would be running the vsanTraceReader process at 100%?
Result of ‘zcat vsantraces--2016-05-03T03h27m38s818.gz | /usr/lib/vmware/vsan/bin/vsanTraceReader.py > vsantracerlog.txt’ is here.
Afraid not – please speak to support Steve.
Hello, I love your blog and I am trying to understand the write process of VSAN. But after reading a lot of material I still have some questions about the write path. I understand that VSAN is all about objects and that an object has a maximum size of 250GB. So my question is: is an object tied to a specific disk group, or can an object span multiple disk groups?
Similarly, I understand that writes are aggregated in the SSD (write cache) of a disk group, so will the 1MB chunks be destaged to all disks or only to the disks of that disk group?
In the end, I am trying to understand whether my VM will be written to 1 disk group and replicated to another one, or whether the chunks will be spread everywhere across the cluster.
A1. No, an object is not tied to a specific disk group. It can be made up of multiple components, which can span disk groups, and the components themselves can span different disk/disk groups depending on size.
A2. 1MB chunks are per disk device.
A3. VSAN may decide to split up a component across multiple disks and disk groups, depending on size and available space. However the destage process is local to a disk group, as the cache device only caches blocks for its own disk group.
I’ve had a problem using VVols on 3PAR 7200c storage. Let me explain: I use VMware vCenter to create virtual machines on an HP blade system. Suddenly one of the virtual machines, which had 4 hard disks, lost 2 of them. vCenter only shows 2 .vmdk files, but when I access the 3PAR CLI and use the command showvvolvm I can see that there are actually 4 .vmdk files. I want to recover the 2 lost .vmdk files. Can you help?
I’d recommend opening an SR with both VMware GSS and HP support to recover from this situation John. Let me know how it goes.
Let me first tell you how awesome your blog is. A lot of great posts.
I would like to know if there is a way (even if unsupported) to do RAID 1 across two disk groups on a single-node vSAN?
It would allow me to lose a disk group and still be able to run my VM from the second disk group.
It’s only for lab and testing purposes.
Thanks a lot for your answer.
I’ve been reading your post on VSAN 6.2 Part 10 – Problematic Disk Handling. As per your post, “If VSAN detects excessive write IO latency to a capacity tier disk, the disk will be unmounted. By default, VSAN 6.2 will no longer unmount any disk/diskgroup due to excessive read IO latency, nor will it unmount a caching tier SSD due to excessive write IO latency.”
However, this did not seem to be the case for us last week. Our DB VM on VSAN 6.2 had production downtime due to VSAN unmounting a disk (VSAN marked the disk as absent). Sad to say, VMware GSS was unable to assist us properly or provide an RCA. They said that it was a case of very high I/O latency, which we already knew.
Anyway, I searched around and found your post, which made sense of what happened. In addition, it also seems to be a behavior of Linux systems (marking the LUN as read only). However, based on what you said, VSAN by default shouldn’t have unmounted the disk. What would be your view on this?
I am trying to find the answer to an issue I faced recently.
We initially built a vSAN cluster with 5 hosts, each with 1 disk group of 5 capacity disks, and we enabled compression and deduplication on the vSAN cluster. Later we decided to add 2 more hosts to the vSAN cluster and also add 2 more capacity disks to each disk group. I was able to add the 2 new disk groups to the vSAN cluster, but when I tried to add 2 more capacity disks to each disk group, I got a configuration error stating that compression and deduplication should be disabled before adding disks to a disk group. Once I disabled compression and deduplication on the cluster, which took almost 30 hours, I was able to add the capacity disks to each disk group. After adding the capacity disks, I re-enabled compression and deduplication, and that took another 30 hours. I am wondering whether that restriction will be changed in vSAN 6.5?
No change in 6.5. The reason for this behaviour is outlined here: http://cormachogan.com/2016/02/12/vsan-6-2-part-1-deduplication-and-compression/ in the “Other considerations” section.
I have a question regarding VMFS 5 and 6 in vSphere 6.5.
It concerns me as a lot of new functionality (oh you mighty UNMAP :)) is introduced with VMFS6, and I’m trying to figure out how it could be enabled for “upgraded” datastores. I’m fairly sure that for a VMFS datastore it is just a parameter which allows certain hosts to perform the “crawler” operation – am I correct?
Also, could you elaborate a little on intra-VM space reclamation? Prior to version 6.0 there were a lot of requirements that had to be met for VM-level reclamation to work… none of those exist in vSphere 6.5.
Is it really that simple? Will it be enabled by default for VMs migrated from VMFS5 datastores or an older infrastructure version (i.e. <6.0)?
We will have a 6.5 Core Storage WP out soon. It should answer many of your questions. I’ll announce it here as soon as it is available.
I have a question about the Docker Volume Driver for vSphere. We are currently evaluating it but there is one thing I can’t get to work that feels like it should.
The lab setup is the following (no vcenter here):
-SharedDS1 <= Docker volume Driver Photon1 => Container
-SharedDS1 <= Docker volume Driver Photon2 => Container
-Photon1 and 2 in the same Swarm.
The basics of the setup work fine:
– I create a docker volume with vsphere driver on photon1 for example and it pops up in “docker volume ls” on photon2 <- OK
– I deploy a container on photon1 from a yml file to use the previously created vsphere volume <- OK
– I add a file to the container to test the persistence, kill the container, and it comes back with the file still there <- OK
Now I drain stop Photon1:
– The container starts on Photon2 but the data is back to default, no file <- PROBLEM
Could you point me in the direction of what I might be missing?
Please let me know if you think this matches your issue – http://vmware.github.io/docker-volume-vsphere/documentation/known-issues.html
Otherwise, I strongly recommend filing an issue directly on github for vDVS
Thanks for the quick reply.
Turns out the symlink to the db file stored on a shared datastore doesn’t persist after a reboot of ESXi.
I posted a reply in github : https://github.com/vmware/docker-volume-vsphere/issues/1032
Regarding example #4 in your article: http://cormachogan.com/2015/01/15/vsan-part-34-how-many-disks-are-needed-for-stripe-width/
If host 2 dies, both stripes die. That makes sense. However, copies of all data remain available collectively from hosts 1 and 3 (a, b, c, d). Is it possible that vSAN could in future abstract this further to keep data online and make increasing stripe width more flexible and resilient?
Also, are you aware of any similar writing that covers stripe width and erasure coding? I imagine one could paint oneself into a corner if not careful (e.g. 6-node cluster, erasure coding, FTT=2, SW=12), and be unable to evacuate a host for prolonged maintenance. Or is the assumption that stripe width is performance-driven, and in that case mirroring would be used instead of erasure coding?
Hi Russell, sorry for the late response. I was on vacation.
I’m not sure what the ask is in part 1 of the question.
For part 2, there is indeed a consideration when it comes to using stripe width with erasure coding, just as there is for mirroring. However, it is more to do with the number of components and the distribution of these components. This post shows some of the component numbers that you might end up with when each segment of a RAID-5/6 has a stripe width associated with it – http://cormachogan.com/2017/03/23/sizing-large-vmdks-vsan/ (note that the layout shown here is due to the size of the VMDK and not stripe width, but the outcome is the same).
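To put rough numbers on the component-count point above, here is a minimal Python sketch. It assumes the stripe width applies to every segment of the layout, as described in the reply, with RAID-5 as 3 data + 1 parity segments and RAID-6 as 4 data + 2 parity. The function name is hypothetical, and the model ignores witness components and any extra splitting caused by component size, so treat the result as a lower bound rather than what vSAN will actually place.

```python
def data_components(layout: str, stripe_width: int, ftt: int = 1) -> int:
    """Estimate data components for one object (simplified model).

    Assumption: every segment of the layout is striped stripe_width ways.
    Witnesses and size-driven splitting are deliberately left out.
    """
    if layout == "RAID-1":
        segments = ftt + 1   # one full replica per tolerated failure, plus one
    elif layout == "RAID-5":
        segments = 4         # 3 data + 1 parity
    elif layout == "RAID-6":
        segments = 6         # 4 data + 2 parity
    else:
        raise ValueError(f"unknown layout: {layout}")
    return segments * stripe_width


if __name__ == "__main__":
    # The example from the question: erasure coding, FTT=2 (RAID-6), SW=12
    print(data_components("RAID-6", stripe_width=12))
    # A mirrored object with FTT=1 and SW=2, for comparison
    print(data_components("RAID-1", stripe_width=2, ftt=1))
```

Under these assumptions the FTT=2, SW=12 example needs 72 data components for a single object, which is why the distribution of components becomes the main consideration.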
I stumbled upon your Photon post when searching for this peculiar issue I’m facing in my lab testing vSAN 6.6.1:
I was just setting up the vSAN from scratch (not Photon, just generic brand new vSAN cluster setup with the latest 6.5U1 ESXi host install with the latest vCenter 6.5U1 install on a separate host – all with the latest updates and patches as of today)
I ended up getting the same “vSAN cluster configuration consistency” warning that also listed the issue being each of the 6 hosts showing “invalid request (Correct version of vSAN Health installed?)”
I noticed the following behavior:
– initial setup of the cluster with vSAN without disks claimed had them in multicast mode without any network partition problem (tested multicast traffic was working with tcpdump-uw)
– upon claiming disks, it tried to go into unicast mode and then every single host became partitioned. Unicastagent list came up empty. Cluster complains of being partitioned.
– manually adding the unicastagent entries resolved the partition problem, and that’s how I ended up with the “vSAN cluster configuration consistency”
Wondering if you have better insight to this issue. Thanks!
Hi Guybrush – I have not seen this behaviour in my 6.6.1 setup. Hosts should automatically switch to unicast without you needing to manually add unicast agent entries. I’d strongly recommend engaging technical support to see why you are getting this behaviour.
Hello Cormac, I’ve read this article: https://cormachogan.com/2013/07/08/automating-the-iops-setting-in-the-round-robin-psp/ I’m not a storage expert. Regarding the IOPS setting, does it have to be either 1000 or 1? Can it be a value in the middle? If not, why? And if yes, what criteria should be used to choose it? If you could point me to a document that explains this, that would be great! Many thanks.
Best advice is to speak to the storage array vendor and ask for their recommendation. I don’t believe we (VMware) make any recommendation around this setting.
OK, thanks. I have no specific need, however; I was just trying to learn more about this configured value.
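For readers who want an intuition for what the IOPS value in the round robin question above controls, here is a toy Python simulation. It is not how ESXi implements the policy. It only assumes that the policy sends a fixed number of I/Os down one path before moving to the next, and the function and path names are hypothetical.

```python
from collections import Counter


def simulate_round_robin(num_ios, paths, iops_limit):
    """Toy model: send iops_limit I/Os down a path, then move to the next one.

    Returns (I/Os carried per path, number of path switches).
    """
    counts = Counter({p: 0 for p in paths})
    current = 0            # index of the path currently in use
    sent_on_current = 0    # I/Os sent since the last switch
    switches = 0
    for _ in range(num_ios):
        if sent_on_current == iops_limit:
            current = (current + 1) % len(paths)
            sent_on_current = 0
            switches += 1
        counts[paths[current]] += 1
        sent_on_current += 1
    return dict(counts), switches


if __name__ == "__main__":
    paths = ["path-A", "path-B"]
    # A short burst of 1500 I/Os with three different limits
    for limit in (1000, 100, 1):
        print(limit, simulate_round_robin(1500, paths, limit))
```

In this model any positive value behaves sensibly. A large limit leaves a short burst of 1500 I/Os unevenly split (1000 on one path, 500 on the other), while a small limit balances the burst at the cost of many more path switches. Whether a particular value suits a given array is still a question for the array vendor, as the reply says.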
I am interested in purchasing the book Essential Virtual SAN (2nd edition). Considering it was written over a year ago, is it still worth purchasing?
Well, I’m biased as you might imagine, but I still think you would get a lot out of it. The fundamentals have not changed much, but of course there are additions and enhancements to vSAN since the book was launched that won’t be captured.
Hello Mr. Cormac, I’m not sure if I should ask here, but I need a clarification about an old white paper you posted on the VMware site.
In figure 5 on page 10 you wrote that if I have 2 separate switches I can use the IP hash load balancing technique to achieve faster connections and load balancing.
But do you mean that I enable IP Hash on the virtual switch and put all 4 NICs in it? How can this configuration be handled if there are 2 different etherchannels on the physical switches?
What kind of results in terms of NIC utilization should I expect?
Thank you in advance.
I suspect this reference to IP Hash is outside the scope of etherchannel. If you had etherchannel, you could create LAGs which would do a better job of load balancing. But if you did not have etherchannel, and the NAS array presented datastores on multiple ports, you could try IP Hash and see if you could balance the load across multiple uplinks. Of course, this is very hit-or-miss, and you may not get any load-balancing even after setting up IP-Hash. BTW, there is an updated version of this paper available here – https://storagehub.vmware.com/#!/vsphere-storage/best-practices-for-running-vmware-vsphere-on-network-attached-storage
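To illustrate why IP Hash can be hit-or-miss, here is a small Python sketch. It assumes the commonly described scheme in which the uplink is chosen by XOR-ing the source and destination IP addresses and taking the result modulo the number of uplinks. The real implementation may differ in detail, and the addresses and function name below are made up for illustration.

```python
import ipaddress


def ip_hash_uplink(src: str, dst: str, num_uplinks: int) -> int:
    """Pick an uplink index from a source/destination IPv4 pair.

    Assumed model: (source XOR destination) modulo the number of uplinks.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % num_uplinks


if __name__ == "__main__":
    vmk = "192.168.1.10"  # hypothetical VMkernel port address
    # Hypothetical NAS target ports
    for nas in ("192.168.1.20", "192.168.1.21", "192.168.1.24"):
        print(nas, "-> uplink", ip_hash_uplink(vmk, nas, 4))
```

In this model a given source/destination pair always maps to the same uplink, so a single NAS target address never spreads across NICs. Here the targets ending in .20 and .24 both land on uplink 2 while .21 lands on uplink 3, so whether traffic balances depends entirely on which addresses happen to be in use.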
In the “What’s New” course for 6.5, they speak about VMFS6 and the shared resource pool locks. It is not really explained what it does in the background. Can you please give some explanation?
Hi there – I talked about enhancements to VMFS6 in my VMworld session at VMworld 2017 – you can watch it here: https://cormachogan.com/2017/08/30/vmworld-2017-session-vsphere-6-5-core-storage-now-youtube/
I’ve read your blogs about SanDisk’s FlashSoft solution and I was wondering if you might be able to help me out.. I’m trying to delete the FlashSoft 4.1 host components off of some ESXi hosts, but when I click on Uninstall from the FlashSoft interface on my ESXi Cluster’s configure -> FlashSoft tab, it just says “Remove the configured storage policy from all the VM templates, virtual machine(s) (including its virtual disks and VM Home) in this cluster to uninstall host component.” The problem is I have no Storage policies configured whatsoever for disks, VMs, hosts, etc. so there’s nothing (that I know of) to remove! Was hoping you’d be able to tell me what I’m missing in order to get the host components uninstalled from my cluster. Thanks!
I’m afraid not SAL. I think the FlashSoft team have a new startup called JetStream Software. You might be able to reach out to them via twitter or some other method, and see if they can offer some advice.
Hi Cormac, hoping you can help me understand a couple concepts regarding component placement. I studied this link and MANY others: https://cormachogan.com/2015/01/15/vsan-part-34-how-many-disks-are-needed-for-stripe-width/
1. Considering the automatic 255GB Large Object Limit “chunking process” relative to a 900GB VMDK:
-Do “chunk/stripes” created count toward the Policy-based “Number of stripes per object” setting?
-If using Default Policy of 1, does that make the “effective” stripe width 4?
-If not, is it correct to call them “chunks” as opposed to “stripes”?
-Is the only difference between the two that one can be placed on multiple drives while the other cannot?
-Does vSAN create the number of “chunk” stripes needed for any object larger than 255GB at the time of instantiation or as needed?
-Does vSAN always place “chunks” on different drives/DGs up to the point that the FTT policy cannot be met?
2. Considering “Number of disk stripes per object” policy setting for a 6.6+AF stretched cluster with 48 capacity drives:
-Is it correct to assume that when this setting is part of a policy that sets PFTT=0 and Locality, that replica and stripe placement is affected in kind, leaving 24 drives to work with in one site?
-Is there a formula to derive the maximum number of stripes we can use in one site and still meet SFTT=1/RAID-1? Or, what is the minimum number of drives to satisfy a 12 stripe width? (I re-read your article but I think I’m still missing something in the maths.)
Mike, answers below.
1.1 No, chunks do not count towards the stripe width.
1.2 No, stripe width is still effectively 1. The chunks can be thought of as concatenations, not stripes (if you are familiar with RAID terminology).
1.3 Yes. That is how I refer to them to differentiate them from stripes.
1.4 No. There is another difference. We do not stripe data across chunks like we do with a real stripe. We fill chunks before moving to next chunk.
1.5 No. You can observe this when you create your 900GB VMDK. If it has no space reservation, it will be deployed as a single component, and new components are added as it grows. Objects with a stripe width are instantiated as multiple components immediately.
1.6 No. Chunks can be placed anywhere so they could all end up on the same disk.
2.1 No. Currently a RAID-6 with PFTT=0 can be deployed across both sites, unless you specify site affinity in the policy. If you set site affinity, then what you describe is true.
2.2 Well, it will be 24 physical drives for the mirrored stripe, but I think the witness requirement is dependent on the number of hosts, disk groups and disks. I don’t know how the algorithm works at this scale, but it might be that it can get away with extra votes on some components and not need a witness, as described in this case – https://cormachogan.com/2015/03/13/vsan-6-0-part-1-new-quorum-mechanism/
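The chunk versus stripe distinction in answers 1.1 to 1.6 can be sketched in a few lines of Python. This is only an illustration of the addressing difference: concatenated chunks are filled one after another, whereas striped components take turns. The 255GB limit comes from the question above; the 1MB stripe unit and all function names are assumptions made for the example.

```python
import math

MAX_COMPONENT_GB = 255  # large-object limit quoted in the question


def chunk_count(vmdk_gb: float) -> int:
    """Chunks needed once the VMDK is fully written (they are added as it grows)."""
    return math.ceil(vmdk_gb / MAX_COMPONENT_GB)


def concat_component(offset_gb: float) -> int:
    """Concatenation: chunk 0 is filled completely before chunk 1 is used."""
    return int(offset_gb // MAX_COMPONENT_GB)


def striped_component(offset_mb: int, stripe_width: int, stripe_unit_mb: int = 1) -> int:
    """Striping: consecutive stripe units rotate across the components.

    stripe_unit_mb is an assumed value for illustration.
    """
    return (offset_mb // stripe_unit_mb) % stripe_width


if __name__ == "__main__":
    print(chunk_count(900))                                        # chunks for a 900GB VMDK
    print([concat_component(gb) for gb in (0, 100, 300, 899)])     # which chunk holds each offset
    print([striped_component(mb, stripe_width=4) for mb in range(6)])  # stripe rotation
```

With these assumptions a fully written 900GB VMDK ends up as 4 chunks, but the first 255GB of data all sits in chunk 0. That is why the chunks do not behave like a stripe width of 4.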