I have iSCSI storage and need to find out which paths to a device/LUN/datastore are “active (I/O)” and which are only “active”. With Fibre Channel (and, I think, with hardware iSCSI initiators too) it is not a problem – the source of the path shows the HBA (number, WWN, etc.). However, with the software iSCSI initiator, the source of each path is always the same – the IQN of the software iSCSI initiator. How can I find out which path is using which vmk (number, IP address of the kernel adapter)? Thank you. Best regards, Andreas Wedel
I think this is available in the UI (correct?), but I am guessing you are looking to figure it out from the CLI. I’m not sure if it can be done, but take a look at vmkiscsi-tool. It may be able to give you what you need (not sure). Also be aware that the tool is deprecated; it is still available for compatibility reasons, but there is no guarantee that it will remain in a later version of vSphere. Hope this helps.
Thank you for your fast answer. Unfortunately, I have not found a solution for my problem: neither with the esxcli iscsi commands nor with vmkiscsi-tool.
The problem is as follows: a customer has a configuration with two iSCSI kernel interfaces, each bound to one vmnic. One of these vmnics is 10 Gbit, the other 1 Gbit. I know it is not a happy configuration, but it is what it is. So he has four paths to a LUN on an active/active iSCSI storage (ALUA SATP) and is using the FIXED PSP (the same problem would exist with the MRU PSP, and it is not recommended to use the ROUND-ROBIN PSP). The challenge is to set the right (10 Gbit) path (vmk/vmnic) as preferred, so it is always used while it is up, and the 1 Gbit path is only used if the 10 Gbit path is down. That is only possible with the FIXED PSP, because it has failback, unlike the MRU PSP. The question is which of the paths goes over 10 Gbit and which over 1 Gbit, so that the first one can be set as preferred and made the path with “active (I/O)” status. All four paths show the same IQN as the initiator, so it is not possible to tell from the GUI, and it seems to be the same problem at the CLI. Watching esxtop per vmk (and switching the preferred path if the 10 Gbit link does not carry most of the traffic) isn’t the finest way. Sorry about my English. Thank you. Best regards, Andreas Wedel.
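For reference, a rough sketch of how this can be approached from the ESXi shell (the adapter name, device ID and path name below are placeholders; output fields vary between ESXi versions):
# list the vmknics bound to the software iSCSI adapter, including their IP addresses
esxcli iscsi networkportal list --adapter=vmhba64
# list the iSCSI sessions/connections; the local address of each connection shows which vmk it uses
esxcli iscsi session connection list --adapter=vmhba64
# list the runtime paths for the device, then mark the path that maps to the 10 Gbit vmk as preferred
esxcli storage nmp path list --device=naa.xxxxxxxx
esxcli storage nmp psp fixed deviceconfig set --device=naa.xxxxxxxx --path=vmhba64:C1:T0:L0
Matching the connection’s local IP to the vmk IP, and the connection’s target portal to the path, is the manual correlation step; GSS can confirm the exact mapping for a given build.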
In vSAN it looks like namespace objects don’t automatically expand when they hit the 255GB component size limit. Is there a way to manually grow the namespace object? We are trying to transfer a few hundred GB of ISOs and templates from legacy storage to vSAN.
Is there a reason why you are not using a Content Library? This could sit on top of your vSAN datastore and can be used as your ISO repo. This is what I do in my own lab.
Hello Cormac,
Quick question on mixing different vendor SSDs in an all-flash cluster. I need to add capacity to a two-node all-flash VSAN cluster that’s been running for the past two years. The original 960GB SSDs are EOL, so I will need to source alternatives. Assuming that the ‘new’ 960GB SSDs are on the VSAN HCL, is the configuration supported if I have a mix of SSDs from different vendors in the hosts? Is there anything I need to look out for?
Regards,
Anthony
To the best of my knowledge Anthony, support should not be an issue. This is a common scenario, so we have to support disk groups with a mix and match of devices.
However, you will need to ensure that the new devices are the same class of device as your existing devices. Other than that, it’s always good to check in with GSS in case they have any advice to offer you.
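As a quick sanity check before and after adding the replacement devices, the vendor/model reported by the host can be compared against the HCL entry (the device ID below is a placeholder):
# show vendor, model and SSD flag for a specific device
esxcli storage core device list --device=naa.xxxxxxxx
# show the devices already claimed by vSAN and the disk group each belongs to
esxcli vsan storage list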
Hi Cormac,
From some tests, it would seem that 6.7 does a 4-component setup for objects with PFTT=0 & SFTT=1 (stretched), with one data and one witness component at each site (and one of the data components holding 2 votes, for a total of 5).
Would you please tell me why that setup is used instead of the usual 2 data components + witness (i.e. only one witness)?
Hi Cormac,
Love your blog and writings, learning a lot!
Regarding this article: https://cormachogan.com/2015/02/04/vsphere-6-0-storage-features-part-1-nfs-v4-1/#comments
I’m trying to understand if the risk also exists if you have two clusters:
cluster01: all its hosts mount export_for_vmware with NFSv3
cluster02: all its hosts mount export_for_vmware with NFSv4.1
Both clusters are running different VMs.
Bottom line: is the risk per shared data, or for the whole datastore?
Thanks!
So I spoke to engineering on this, and the risk is per shared data. If there is no shared data across the cluster, then technically, you should be ok.
Now, be aware that even if you strive to deploy VMs and avoid sharing data, vSphere may still do something which breaks this. For example, if this datastore is picked for HA heartbeats, or distributed switches write their metadata here, or SIOC (if/when it gets supported on NFS v4.1), then you could run into problems.
Is there a use case for this configuration? Is there a reason why you would not go ahead and just use two distinct datastores? It would seem to be the less risky approach.
I’m interested in your blog and your writings; they taught me a lot about Kubernetes on vSphere!
One question I’ve been struggling with is the recommended / optimal size of a datastore.
What initial datastore size would you recommend, and how would you specify it in the storage class?
What if a datastore is full? Would you provision a second datastore and create a new storage class with the new datastore specified? That approach would result in more administrative overhead.
I don’t have enough exposure to the operational side of K8s running on top of vSphere yet Lukas, so very difficult for me to say what is the best approach to take at this time. I’ll keep your question in mind as I meet more customers who are implementing K8s on vSphere.
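For illustration only, a minimal StorageClass sketch assuming the vSphere CSI driver, where placement is controlled by an SPBM policy rather than a hard-coded datastore (the class and policy names are placeholders):
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-gold
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "gold-policy"   # SPBM policy that maps to the backing datastore(s)
EOF
If the backing datastore fills up, the datastores behind the policy can be adjusted without publishing a new StorageClass name to every user, though whether that fits a given environment is an operational call.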
In my environment VIC was created with a proxy, so when I try to push images from Docker to Harbor I get an error saying “received unexpected http status: 503 service unavailable”, and when I try to log in to Harbor it doesn’t throw an authentication error; instead it gives “error response from daemon: login attempt to http://<harbor ip> failed with status 404 not found”.
I am able to push images from the Docker client to VIC Harbor successfully in a proxy-less environment, so I am not sure what is being blocked.
Hi Cormac, first of all thanks for all your posts. It seems from VMware support that there is a big problem in VMFS6 which causes random high stun times during snapshot creations and removals. This kind of problem is present even in the latest vSphere 6.7 patch, and at the moment the only workaround is to go back to VMFS5. This scenario applies to most storage vendors, but some are more affected than others; we are using Dell Compellent storage and here the problem is evident. Are you aware of any troubles related to VMFS6 and stun?
I’m not aware of any issue Alex, but I am not working deeply on VMFS so much these days. I would strongly recommend opening a Service Request with GSS (our support organization) to work on a solution.
Hi Cormac, this information is coming directly from VMware support, because we opened a ticket 3 months ago. Dell support is also involved; at the moment they have informed us that Dell Engineering is trying to contact VMware Engineering to find a shared resolution.
I read your article on Stripe Width (SW), which I understood; however my question is –
In a stable (Hybrid, FTT=1, FTM=RAID-1) environment, all writes and the majority of reads would be served by the single disk in the cache tier, so how can a SW of anything greater than 2 (first write going to DG-1 and the mirrored write going to DG-2) help?
Wouldn’t an increased SW ONLY come into the picture during the destaging process? And will it really give me performance gains?
Question-2
Will the cache tiers of both DGs (Hybrid, FTT=1, FTM=RAID-1) holding the hot reads be holding the same hot data?
I ask this because reads are served from both the active and passive replicas; this would only make sense if the cache tiers of the 2 DGs are somewhat identical in terms of the data they hold.
Does vSAN accumulate hot data from both the active and passive replicas in the cache tier?
Hi,
Right now I am doing a lab on vSphere with Tanzu. When I try to prepare my cluster for the workload domain, an error occurs: “Cluster domain-c38 is unhealthy: the server has asked for the client to provide credentials”. An error also occurs in the log file: “Error occurred sending Principal Identity request to NSX: principal identity already created”. When I restart the WCP service on vCenter, the NCP plugin goes down in NSX-T Manager and this error shows: “Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried”. BTW, I am doing my lab manually, without VMware Cloud Builder.
Hi,
I’m trying to run Sonobuoy on my K8s cluster.
The command: sonobuoy --kubeconfig /root/.kube/config images pull
I’m getting the following:
INFO[0000] Pulling image: sonobuoy/sonobuoy:v0.19.0 …
ERRO[0002] failed with following error after 1 retries:
ERRO[0002] Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
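For anyone hitting the same Docker Hub pull limit, one common workaround (assuming the images are pulled through the local Docker daemon) is simply to authenticate first:
# log in with a Docker Hub account so pulls count against the authenticated (higher) rate limit
docker login
# then retry the image pull
sonobuoy --kubeconfig /root/.kube/config images pull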
Hi Cormac,
Interested in Tanzu DR and what the best practices might be for this. I am researching DR to AWS via SRM or DRaaS / Zerto etc. Any thoughts or publications on this?
Hi Cormac,
Looking to know if there is any blueprint for Tanzu DR?
I am currently looking into cloud-based options for on-premises workloads and am curious what the strategy is for Tanzu.
Hi Thomas – I have only ever done this during the Workload Management deployment. You add additional networks when in the process of creating workload networks. I’m not aware of a way to go back and create additional networks after you have completed the workload management deployment. It sounds like a very reasonable feature to include however, so I will bring this to the attention of the product team.
I want to ask you something: is there any way to create a PV using a Kubernetes YAML file with the multi-writer flag of a VMware VMDK file? I want the VMDK file to still be able to attach to another node in case the running node crashes. Thank you a lot.
Hi Nam,
There is no support for multi-writer mode / read-write-many for block volumes with the vSphere CSI driver. It is only available with file volumes e.g. vSAN File Service shares.
However, you should not need it for such a scenario. The PV should get removed from the failing node, and re-attached to the node where the Pod is scheduled / restarted.
In my failover test I shut down the VM suddenly, and I see that the dead VM is still holding the disk. The pod is stuck in Terminating state forever, so the disk can’t be attached to another node and the new pod stays in Pending state. What should I do?
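For reference, a recovery sequence often used in labs for this situation (the pod and attachment names are placeholders; check with support before forcing deletions in production):
# force-delete the pod that is stuck in Terminating on the failed node
kubectl delete pod <pod-name> --grace-period=0 --force
# look for a stale VolumeAttachment still binding the PV to the dead node, and remove it
kubectl get volumeattachments
kubectl delete volumeattachment <attachment-name>
Once the stale attachment is gone, the CSI driver can normally attach the volume to the node where the replacement pod is scheduled.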
Hi Cormac, we have an 8-node vSAN cluster with vSAN File Service enabled. Now we want to reduce to 6 hosts – is there any documentation on how to remove an ESXi host from a cluster with vSAN File Service enabled? When I run the precheck for maintenance mode with full data migration, I get an error that the vSAN File Service agent running on this host will become inaccessible. Regards and thanks a lot for an answer!
Hi Thomas – I don’t think you are able to do anything with the vSAN File Services nodes. These VMs are just a way to run the protocol stack containers, and to the best of my knowledge they cannot be moved or migrated from the host they are deployed onto. The protocol stack container that runs inside a vSAN File Services node VM will move to another vSAN File Services node VM on another ESXi host in the cluster during maintenance mode, but you cannot do anything with the actual VM itself. So what you are seeing is expected, in my opinion.
I’m not sure if there is a procedure to reduce the number of nodes in a vSAN cluster when running vSAN File Services. I’m personally not aware of any way to do it, but it might be worthwhile getting in touch with support in case there is a procedure that I am not aware of. I’m afraid I haven’t been keeping up with vSAN updates too much these days.
Hello Cormac, I have a quick question. When setting up vSAN 7 with a witness appliance, must I set up the vmk1 for the vSAN traffic before creating a distributed switch for the 2-NODE vSAN cluster? So the correct order would be
1. Create 3 x ESXi hosts
2. Install vCenter on the 3rd host which will be the witness
3. install the witness appliance on that host
4. Create a vmk1 for vSAN in addition to the premade vmk0 for management traffic
5. Create the VSAN cluster and add the two vSAN hosts
6. Create a distributed switch that will be applied to only the 2 vSAN hosts
7. Enable vSAN on that cluster with HA and DRS
8. Finish config and make sure I can SSH to hosts and vmkping all IPs from all vmks?
Any help welcome, as I keep getting caught out by gotchas each time and only get a bit further.
Yes – I think that covers it. The only consideration is as follows. In my 2-node setup, I use VLAN 51 for management and VLAN 500 for vSAN. Thus the witness appliance, which is a VM, has 2 VM networks, one to VLAN 51 and the other to VLAN 500. So the ESXi host where the witness appliance is deployed uses VM portgroups. For my physical 2-node cluster, the hosts have 2 x VMkernel networks, again one for mgmt on VLAN 51 and the other for vSAN on VLAN 500.
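If it helps, the VMkernel tagging and connectivity checks in steps 4 and 8 can also be done from the ESXi shell (vmk numbers and IPs below are placeholders):
# tag vmk1 for vSAN traffic on each data node
esxcli vsan network ip add -i vmk1
# confirm which vmknics carry vSAN traffic
esxcli vsan network list
# basic reachability test across the vSAN VLAN
vmkping -I vmk1 <peer-vsan-ip>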
Thanks for that Cormac. I have another quick question. I previously had these 3 servers running HP StoreVirtual VSA. It was set up by an IT company. Now, though, StoreVirtual is end of life and I want to use the servers for VMware vSAN. In the previous setup all three servers had 2 x 2-port fiber cards, and they were all linked for storage traffic via 2 x Aruba fiber switches. The Ethernet ports were used for management, VM Network and vMotion traffic. I have read recently that the servers can be linked using Direct Connect. Would that be possible via the fibers? So, each server connected to the others via the fibers directly, and then those ports configured for vSAN traffic etc.? Or would I still need to use the Aruba switches to connect the fiber cards? I’m not sure if Direct Connect only works with 10Gb Ethernet ports. If I don’t have to use the physical switches but can still use the 10Gb fiber, that would be better and would have fewer points of failure. As always, any info welcome. Thanks Cormac
Direct Connect can certainly be used for 2-node deployments Steve. If you’re already doing vSphere networking through the fibers for management and vMotion, I don’t see why they won’t work with vSAN traffic for direct-connect. And yeah – 10Gb seems to be required, so you should be ok there.
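One related note, offered as a sketch only: in direct-connect 2-node setups, witness traffic is usually separated onto a routed vmk (often the management vmk) so that only vSAN data crosses the direct links. The vmk number below is an assumption:
# on each data node, tag the management vmk to carry witness traffic
esxcli vsan network ip add -i vmk0 -T=witness
# verify the traffic types per vmk
esxcli vsan network list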
Hi Cormac, thanks again. The management, vMotion and VM Network traffic was going through the 4 Ethernet ports on the servers; just the vSAN traffic was going through the fibers via the Arubas. I did see that you may need a crossover cable if you use Ethernet; with fiber, though, there aren’t any crossovers, are there? They all have separate receive and transmit fibers.
It might be possible through editing some of the CSI configuration files, but I don’t know if it is supported. I would recommend opening a case with the GS organisation and asking their advice before changing anything.
Hi Cormac
my name is Weithenn (vExpert 2012 ~ 2014). Very pleased to have been able to translate the book you wrote with Duncan. The Traditional Chinese version is now on the market in Taiwan.
Available via Tenlong Book Store (http://www.tenlong.com.tw/items/9863473413?item_id=999390)
Available via GOTOP publishing house (http://books.gotop.com.tw/v_ACA020200)
Wonderful news – thanks for your efforts Weithenn
Is this article still relevant Cormac?
http://cormachogan.com/2014/11/05/vsan-and-oem-esxi-iso-images/
Great blog btw!!
Yes – we don’t have any control over the drivers that OEMs place on the images, so we don’t know if they have been certified for VSAN or not. Use the health check plugin to verify that everything is good from an HCL perspective after VSAN has been deployed with the OEM images. However, you may need to install different drivers if there is a mismatch.
RAID controller testing for VSAN HCL
https://communities.vmware.com/thread/522127
I’ve also opened an SR but haven’t had a lot of success in determining who is responsible for this process. Do you have any suggestions?
Hi Cormac,
I wanted to reply to this article of yours: http://cormachogan.com/2015/05/07/vsphere-6-0-storage-features-part-8-vaai-unmap-changes/
but I believe it is too late now to post there, so I am posting here.
You mentioned that UNMAP is supported from Windows 2008 and above. But from the Windows blog, https://msdn.microsoft.com/en-us/library/windows/hardware/dn265487%28v=vs.85%29.aspx, and from our testing in the lab, we see that Windows understands thin provisioning only from Windows 2012 onwards. Please correct me if I am wrong.
I understood that this functionality was also in Windows 2008, but it appears I was mistaken. Thanks for highlighting. I’ll adjust the post.
Hi Cormac,
In your article “VSAN considerations when booting from USB/SD”, you mentioned that VSAN traces need at least 500 MB of space. However, the official VMware document (https://www.vmware.com/files/pdf/products/vsan/VMware-TMD-Virtual-SAN-Hardware-Guidance.pdf) explicitly notes that we are not supposed to increase the default size of 300 MB.
The reason I ask is that I constantly have hosts giving the “ramdisk ‘vsantraces’ is full” error. I’m thinking of increasing its size, but I’m hesitant.
Thank you very much for your response.
Is this booting from SD or USB Aditya?
I’m not sure there is a way of modifying the layout on these devices, so please check with GSS to be sure. The issue is that these devices do not get a persistent scratch area.
When booting from a SATADOM, these “appear” like disk devices so have the 4GB scratch partition.
We are looking at improvements in this area. Let me see if I can find more info.
Cormac, apologies for jumping on this thread… but could you raise the question internally as to how, after I follow the bootstrapping VSAN VMware document, I can then migrate the standard switches created during this process to distributed switches when the VCSA / PSC are sat on the vSAN datastore that the bootstrapping process creates? The document says it can be done… we are now stuck in the middle of an NHS project to get vSAN / Horizon in place (using vDS); however, after following the document we are stuck with standard switches…
Trying to find someone who can validate it Paul. Unfortunately the author of the paper is no longer with VMware.
Hi Cormac, many thanks for taking the time to check. I understand you must be busy and this isn’t your document. So your help is much appreciated.
I’m just toying with the idea of using 1 of the 4 hosts to create a local VMFS to deploy the PSC and VCSA to. Then use this VCSA to deploy a vSAN using the remaining 3 nodes, and then attempt to Storage vMotion the PSC and VCSA to the 3-host vSAN datastore I created… then once that is done, blow away the disk config on the host I used to create the temp VMFS and ultimately put this host into the 3-host VSAN cluster to make it the 4th host…
Have you Storage vMotioned a VCSA over to a VSAN datastore in this way in your lab/test?
Thanks again
Paul.
I haven’t done that Paul, but if all the networking is in place, it should theoretically work. Sorry I can’t be of more help. I’m on the road so I can’t try this out for you either.
Hey Cormac, the svMotion migration of the VCSA & PSC to the vSAN datastore all worked fine, as all networking was in place. I’d be interested to see if a solution to the migration from standard to distributed switches can be achieved after the bootstrapping of VSAN has taken place… because logically that would be the next step for folks like us who are in a greenfield environment. Staying with the standard switch config would have made NIOC impossible, so it was quite a showstopper.
Anyhow the migration to the VSAN datastore all worked out well, thanks for your communications Cormac,
The hosts are booting from a USB drive.
I found another article on the web, referring to your article, that explains how to increase this ‘vsantraces’ partition.
But I’ll open a ticket to GSS, just to be sure.
Thank you very much Cormac.
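For anyone else chasing the “ramdisk ‘vsantraces’ is full” error, the current ramdisk sizes and usage can be checked (read-only) before involving GSS:
# list all ramdisks on the host, including vsantraces, with their sizes and usage
esxcli system visorfs ramdisk list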
Hi,
are you also into vSAN managed from the Ruby vSphere Console with the new 6.2 features? I found vsan.sizing. in RVC but no documentation on it anywhere…
Let me look …
This vsan.sizing. is “future-proofing”. This RVC extension has not yet been implemented, but we “may” implement something around it going forward.
Hi Cormac,
Do you have any insight on why my ESXi hosts (2 out of a 3-node cluster) would be running the vsanTraceReader process at 100%?
The result of ‘zcat vsantraces--2016-05-03T03h27m38s818.gz | /usr/lib/vmware/vsan/bin/vsanTraceReader.py > vsantracerlog.txt’ is here.
https://drive.google.com/open?id=0BwUaxgRk7cH_U1Vha1ZYd0xuTk0
Afraid not – please speak to support Steve.
Hello, I love your blog and am trying to understand the write process of vSAN. But after reading a lot of material I still have some questions about the write path. I understand that vSAN is all about objects and that an object has a maximum size of 250GB. So my question is: is an object tied to a specific disk group, or will an object span multiple disk groups?
Similarly, I understand that writes are aggregated in the SSD (write cache) of a disk group, so the question is: will the 1MB chunks be destaged to all disks, or only to the disks of that disk group?
Ultimately I am trying to understand whether my VM will be written to 1 disk group and replicated to another one, or whether the chunks end up spread everywhere across the cluster.
Regards
A1. No, an object is not tied to a specific disk group. It can be made up of multiple components, which can span disk groups, and the components themselves can span different disk/disk groups depending on size.
A2. 1MB chunks are per disk device.
A3. VSAN may decide to split up a component across multiple disks and disk groups, depending on size and available space. However the destage process is local to a disk group, as the cache device only caches blocks for its own disk group.
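To see how this plays out for a specific VM, the component layout can be inspected from RVC (the path below is a placeholder for the VM’s location in the RVC namespace):
vsan.vm_object_info /localhost/<datacenter>/vms/<vm-name>
The output lists each component along with the host and disk it lives on, which makes the disk group placement described above visible.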
Hi Cormac,
I’ve had a problem using VVols on 3PAR 7200c storage, let me explain: I use VMware vCenter to create virtual machines in an HP blade system. Suddenly one of the virtual machines, which had 4 HDs, lost 2 of them. vCenter only shows that there are 2 .vmdk files, but when I access the 3PAR CLI and use the command showvvolvm I can see that there are actually 4 .vmdk files. I want to recover these 2 lost .vmdks, can you help?
I’d recommend opening an SR with both VMware GSS and HP support to recover from this situation John. Let me know how it goes.
Hi Cormac,
Let me first tell you how awesome your blog is. A lot of great posts.
I would like to know if there is a way (even if not supported) to do RAID 1 across two disk groups on a single-node vSAN?
It would allow me to lose a disk group and still be able to run my VM from the second disk group.
It’s only for lab and testing purpose.
Thanks a lot for your answer.
Best Regards,
Jonathan.
Hi Cormac,
I’ve been reading your post regarding VSAN 6.2 Part 10 – Problematic Disk Handling. As per your post, “If VSAN detects excessive write IO latency to a capacity tier disk, the disk will be unmounted. By default, VSAN 6.2 will no longer unmount any disk/diskgroup due to excessive read IO latency, nor will it unmount a caching tier SSD due to excessive write IO latency.”
However, this did not seem to be the case for us last week. Our DB VM on VSAN 6.2 had production downtime due to VSAN unmounting a disk (VSAN marked the disk as absent). Sad to say, VMware GSS was unable to assist us properly or provide an RCA. They said that it was a case of very high I/O latency, which we already knew.
Anyway, I searched around and found your post, which made sense of what happened. In addition, it also seems to be a behaviour of Linux systems (marking the LUN as read-only). Although, based on what you said, VSAN by default shouldn’t have unmounted the disk. What would be your view on this?
Regards,
Albert
Hi Cormac!
I am just trying to find the answer to an issue which I faced recently.
We initially built a vSAN cluster with 5 hosts and 5 capacity disks for each disk group. Each host had 1 disk group, and we enabled compression and deduplication in the vSAN cluster. Later we decided to add 2 more hosts to the vSAN cluster and also add 2 more capacity disks to each disk group. I was able to add the 2 new disk groups to the vSAN cluster, but when I tried to add 2 more capacity disks to each disk group, I got a configuration error stating that compression and deduplication should be disabled before adding disks to a disk group. Once I disabled compression and deduplication in the cluster, which took almost 30 hours, I was able to add the capacity disks to each disk group. After adding the capacity disks, I re-enabled compression and deduplication, and that took another 30 hours. I am wondering if that restriction will be changed in vSAN 6.5?
No change in 6.5. The reason for this behaviour is outlined here: http://cormachogan.com/2016/02/12/vsan-6-2-part-1-deduplication-and-compression/ in the “Other considerations” section.
Hello Cormac!
I have a question regarding VMFS 5 and 6 in vSphere 6.5.
It concerns me, as a lot of new functionality (oh you mighty UNMAP :)) is introduced with VMFS6, and I’m trying to figure out how it could be enabled for “upgraded” datastores. I’m more than sure that for the VMFS datastore it’s just a parameter which allows certain hosts to perform the “crawler” operation – am I correct?
Also, could you elaborate a little on intra-VM space reclamation? Prior to version 6.0 there were a lot of requirements which had to be met in order for VM-level reclamation to work… none of those appear in vSphere 6.5.
Is it really that simple? Will it be enabled by default for VMs migrated from VMFS5 datastores or from an older infrastructure version (i.e. <6.0)?
We will have a 6.5 Core Storage WP out soon. It should answer many of your questions. I’ll announce it here as soon as it is available.
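In the meantime, for reference, the automatic reclamation (“crawler”) setting is visible per VMFS6 datastore (the label is a placeholder):
# show the automatic space reclamation settings for a VMFS6 datastore
esxcli storage vmfs reclaim config get -l <datastore-label>
# adjust the reclamation priority if needed
esxcli storage vmfs reclaim config set -l <datastore-label> --reclaim-priority=low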
Hi Cormac,
I have a question about the Docker Volume Driver for vSphere. We are currently evaluating it but there is one thing I can’t get to work that feels like it should.
The lab setup is the following (no vcenter here):
-SharedDS1 <= Docker volume Driver Photon1 => Container
-SharedDS1 <= Docker volume Driver Photon2 => Container
-Photon1 and 2 in the same Swarm.
The basics of the setup work fine:
– I create a docker volume with vsphere driver on photon1 for example and it pops up in “docker volume ls” on photon2 <- OK
– I deploy a container on photon1 from a yml file to use the previously created vsphere volume <- OK
– I add a file to the container to test the persistence, kill the container, and it comes back with the file still there <- OK
Now I drain stop Photon1:
– The container starts on Photon2 but the data is back to default, no file <- PROBLEM
Could you point me in the direction of what I might be missing?
Thanks!
Please let me know if you think this matches your issue – http://vmware.github.io/docker-volume-vsphere/documentation/known-issues.html
Otherwise, I strongly recommend filing an issue directly on github for vDVS
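For completeness, a typical vDVS volume creation looks like the following (the driver name and options can differ between vDVS releases, so treat this as a sketch):
# create a VMDK-backed volume on the shared datastore, visible to every swarm node on that datastore
docker volume create --driver=vsphere --name=testvol -o size=1gb
docker volume ls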
Thanks for the quick reply.
Turns out the symlink to the db file stored on a shared datastore doesn’t persist after a reboot of ESXi.
I posted a reply in github : https://github.com/vmware/docker-volume-vsphere/issues/1032
Cheers
Hi Cormac,
Regarding example #4 in your article: http://cormachogan.com/2015/01/15/vsan-part-34-how-many-disks-are-needed-for-stripe-width/
If host 2 dies, both stripes die. That makes sense. However, copies of all the data remain available collectively from hosts 1 and 3 (a, b, c, d). Is it possible vSAN would handle this in the future, abstracting this further to keep data online and make increasing stripe width more flexible and resilient?
Also are you aware of any similar writing that covers stripe width and erasure coding? I imagine one could paint oneself into a corner if not careful (ex. 6-node cluster, erasure coding, FTT=2, SW=12), and unable to evacuate a host for prolonged maintenance. Or is the assumption that stripe width is performance-driven, and in that case mirroring would be used instead of erasure coding?
Hi Russell, sorry for the late response. I was on vacation.
I’m not sure what the ask is in part 1 of the question.
For part 2, there is indeed a consideration when it comes to using stripe width with erasure coding, just as there is for mirroring. However it is more to do with the number of components and the distribution of those components. This post shows some of the component counts that you might end up with when each segment of a RAID-5/6 has a stripe width associated with it – http://cormachogan.com/2017/03/23/sizing-large-vmdks-vsan/ (note that the layout shown here is due to the size of the vmdk and not stripe width, but the outcome is the same)
Hi Cormac,
I stumbled upon your Photon post when searching for this peculiar issue I’m facing in my lab testing vSAN 6.6.1:
I was just setting up the vSAN from scratch (not Photon, just generic brand new vSAN cluster setup with the latest 6.5U1 ESXi host install with the latest vCenter 6.5U1 install on a separate host – all with the latest updates and patches as of today)
I ended up getting the same “vSAN cluster configuration consistency” warning that also listed the issue being each of the 6 hosts showing “invalid request (Correct version of vSAN Health installed?)”
I noticed the following behavior:
– initial setup of the cluster with vSAN without disks claimed had them in multicast mode without any network partition problem (tested multicast traffic was working with tcpdump-uw)
– upon claiming disks, it tried to go into unicast mode and then every single host became partitioned. Unicastagent list came up empty. Cluster complains of being partitioned.
– manually adding the unicastagent entries resolved the partition problem, and that’s how I ended up with the “vSAN cluster configuration consistency”
Wondering if you have better insight to this issue. Thanks!
Hi Guybrush – I have not seen this behaviour in my 6.6.1 setup. Hosts should automatically switch to unicast without needing you to manually add unicast agent entries. I’d strongly recommend engaging technical support to see why you are getting this behaviour.
Hello Cormac, I’ve read this article https://cormachogan.com/2013/07/08/automating-the-iops-setting-in-the-round-robin-psp/ and I’m not a storage expert… regarding the IOPS setting, does it have to be either 1000 or 1? Can it be a value in the middle? If not, why? And if yes, what are the criteria to adopt? If you could even point me to a document that explains the answer I’m looking for, that would be great! Many thanks
Best advice is to speak to the storage array vendor and ask for their recommendation. I don’t believe we (VMware) make any recommendation around this setting.
OK thanks. I have no particular need, however; I was just trying to learn more about this configured value.
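For reference, the per-device setting that the linked article automates looks like this (the device ID is a placeholder; the right IOPS value is whatever the array vendor recommends):
# switch paths after every I/O for this device when using the round-robin PSP
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=1
# confirm the current PSP configuration for the device
esxcli storage nmp device list --device=naa.xxxxxxxx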
I am interested in purchasing the book Essential Virtual SAN (2nd edition). Considering it was written over a year ago, is it still worth purchasing?
Well, I’m biased as you might imagine, but I still think you would get a lot out of it. The fundamentals have not changed much, but of course there are additions and enhancements to vSAN since the book was launched that won’t be captured.
Hello Mr. Cormac, I’m not sure if I can ask here, but I need clarification about an old white paper you posted on the VMware site.
https://www.vmware.com/techpapers/2010/best-practices-for-running-vsphere-on-nfs-storage-10096.html
If you take a look at figure 5 on page 10, you wrote that if I have 2 separate switches I can use the IP hash load balancing technique to achieve faster connections and load balancing goals.
But do you mean that in the virtual switch I enable IP Hash and put all 4 NICs in there? How can this configuration be handled if there are 2 different EtherChannels on the physical switches?
What kind of results in terms of NIC utilization should I expect?
Thank you in advance.
Hello Marco,
I suspect this reference to IP Hash is outside the scope of etherchannel. If you had etherchannel, you could create LAGs which would do a better job of load balancing. But if you did not have etherchannel, and the NAS array presented datastores on multiple ports, you could try IP Hash and see if you could balance the load across multiple uplinks. Of course, this is very hit-or-miss, and you may not get any load-balancing even after setting up IP-Hash. BTW, there is an updated version of this paper available here – https://storagehub.vmware.com/#!/vsphere-storage/best-practices-for-running-vmware-vsphere-on-network-attached-storage
In the “What’s New” course for 6.5, they speak about VMFS6 and the shared resource pool locks. It is not really explained what this does in the background. Can you please give some explanation?
Hi there – I talked about enhancements to VMFS6 in my VMworld session at VMworld 2017 – you can watch it here: https://cormachogan.com/2017/08/30/vmworld-2017-session-vsphere-6-5-core-storage-now-youtube/
Hi Cormac,
I’ve read your blogs about SanDisk’s FlashSoft solution and I was wondering if you might be able to help me out. I’m trying to remove the FlashSoft 4.1 host components from some ESXi hosts, but when I click Uninstall in the FlashSoft interface on my ESXi cluster’s Configure -> FlashSoft tab, it just says “Remove the configured storage policy from all the VM templates, virtual machine(s) (including its virtual disks and VM Home) in this cluster to uninstall host component.” The problem is I have no storage policies configured whatsoever for disks, VMs, hosts, etc., so there’s nothing (that I know of) to remove! I was hoping you’d be able to tell me what I’m missing in order to get the host components uninstalled from my cluster. Thanks!
I’m afraid not SAL. I think the FlashSoft team have a new startup called JetStream Software. You might be able to reach out to them via twitter or some other method, and see if they can offer some advice.
Hi Cormac, hoping you can help me understand a couple concepts regarding component placement. I studied this link and MANY others: https://cormachogan.com/2015/01/15/vsan-part-34-how-many-disks-are-needed-for-stripe-width/
1. Considering the automatic 255GB Large Object Limit “chunking process” relative to a 900GB VMDK:
-Do “chunk/stripes” created count toward the Policy-based “Number of stripes per object” setting?
-If using Default Policy of 1, does that make the “effective” stripe width 4?
-If not, is it correct to call them “chunks” as opposed to “stripes”?
-Is the only difference between the two that one can be placed on multiple drives while the other cannot?
-Does vSAN create the number of “chunk” stripes needed for any object larger than 255GB at the time of instantiation or as needed?
-Does vSAN always place “chunks” on different drives/DGs up to the point that the FTT policy cannot be met?
2. Considering “Number of disk stripes per object” policy setting for a 6.6+AF stretched cluster with 48 capacity drives:
-Is it correct to assume that when this setting is part of a policy that sets PFTT=0 and Locality, that replica and stripe placement is affected in kind, leaving 24 drives to work with in one site?
-Is there a formula to derive the maximum number of stripes we can use in one site and still meet SFTT=1/RAID-1? Or, what is the minimum number of drives to satisfy a 12-stripe width? (I re-read your article but I think I’m still missing something in the maths.)
Mike, answers below.
1.1 No, chunks do not count towards the stripe width.
1.2 No, the stripe width is still effectively 1. The chunks can be thought of as concatenations, not stripes (if you are familiar with RAID terminology).
1.3 Yes. That is how I refer to them to differentiate them from stripes.
1.4 No. There is another difference. We do not stripe data across chunks like we do with a real stripe; we fill a chunk before moving on to the next chunk.
1.5 No. You can observe this when you create your 900GB VMDK. If it has no space reservation, it will be deployed as a single component, and new components are added as it grows. Objects with a stripe width are instantiated as multiple components immediately.
1.6 No. Chunks can be placed anywhere so they could all end up on the same disk.
2.1 No. Currently a RAID-6 with PFTT=0 can be deployed across both sites, unless you specify site affinity in the policy. If you set site affinity, then what you describe is true.
2.2 Well, it will be 24 physical drives for the mirrored stripe, but I think the witness requirement is dependent on the number of hosts, disk groups and disks. I don’t know how the algorithm works at this scale, but it might be that it can get away with extra votes on some components and not need a witness, as described in this case – https://cormachogan.com/2015/03/13/vsan-6-0-part-1-new-quorum-mechanism/
Since upgrading our vSAN from 6.5 to 6.7, we have had high latency issues and loss of packets. Has anyone else had issues like this?
I’m not aware of any such issue Tony – I would strongly advise speaking with GSS to root cause this.
Cormac, thanks for the wonderful materials on VSAN you’ve made available to us, it has been a great help to me in my career.
I’ve been trying to put together a VM Build QA report using vROM 6.4 (no, my company is still not automating VM builds, despite my best presentations showing them the benefits…) due to a number of mistakes made by admins building VMs. One thing in this report that I need to confirm is the VSAN Storage Policy. This is the last bit I need and I simply have not been able to find where this piece of information is stashed. Would you happen to know if, in a list view it would be possible to show what VSAN storage policy a VM is in? I’d prefer to do this in vROM just because most people on this team are comfortable with it, and it can be easily made available to others if required.
Thanks again for the great work here!
It is visible on the summary page of each VM in the vSphere client, Jon. I’m not sure how you get it from vROM though.
Hello Cormac, I wondered if you are aware of any issues with vmfs6 and Windows 2012 R2? We have been experiencing issues with particular volumes within Windows, e.g. a D: drive, whereby something (we have no idea what) is modifying the Windows volume partition table. When whatever it is touches the volume, the data within the drive becomes unreadable. We have to contact Microsoft, who then use a tool called dskprobe to modify the beginning and end sector numbers to make the data accessible again. This issue seems to be isolated to servers that have SQL installed, vmfs6, and Windows 2012 R2. We have uninstalled McAfee, created new VMDKs and migrated the data from the old VMDKs to new ones. We are stumped at this point. Being close to VMFS, I wondered if you had heard of anything somewhat related to this. We are on ESXi and vCenter 6.5 U1.
What is the intent with VMFS6 and vSAN? Since you cannot do an in-place upgrade from 5 to 6, and most of us who use vSAN don’t have another storage location to put our prod VMs while the delete-and-recreate of the datastore takes place, what is the team suggesting?
Hi Glenn – vSAN does not use VMFS. It has its own on-disk format. So the changes from VMFS-5 to VMFS-6 are not a concern for vSAN users.
The reason you cannot do an in-place upgrade of VMFS-5 to VMFS-6 is because of the major format changes that were made. The recommendation now is to evacuate at least one datastore by Storage vMotion’ing your VMs to another datastore, reformat the evacuated datastore from 5 to 6, then rinse and repeat this operation across your datastores.
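As a rough illustration of that evacuate-and-reformat cycle from the CLI (a sketch only, assuming the datastore has already been emptied and sits on a single LUN; the label and device name are placeholders, and the vSphere client is the safer way to do this):

# Check the current VMFS version of each mounted datastore.
esxcli storage filesystem list

# Unmount the evacuated VMFS-5 datastore before reformatting it.
esxcli storage filesystem unmount -l OldDatastore

# Recreate the filesystem as VMFS-6 on the same partition (this destroys the old VMFS;
# depending on the build you may need to delete/recreate the datastore in the UI instead).
vmkfstools -C vmfs6 -S OldDatastore /vmfs/devices/disks/naa.xxxxxxxx:1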
Hello,
I have an iSCSI storage array and need to find out which paths to a device/LUN/datastore are “active (I/O)” and which are only “active”. If it were Fibre Channel (or, I think, the same for hardware iSCSI initiators too) it would not be a problem – the source of the path shows the HBA (number, WWN, etc.). However, if we use the software iSCSI initiator, the source of each path is always the same – the IQN of the software iSCSI initiator. How can I find out which path is using which vmk (number, IP address of the kernel adapter)? Thank you. Best regards, Andreas Wedel
I think this is available in the UI (correct?) but I am guessing you are looking to figure it out from the CLI. I’m not sure if it can be done, but take a look at vmkiscsi-tool. This may be able to give you what you need (not sure). Also be aware that the tool is deprecated and only remains available for compatibility reasons, so there is no guarantee that it will still be there in a later version of vSphere. Hope this helps.
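Another avenue worth trying (an assumption on my part rather than something from the reply above) is the esxcli iscsi namespace, which can show the vmknic bindings and the local IP used by each software-iSCSI session; the adapter name below is a placeholder:

# Show which vmknics are bound to the software iSCSI adapter.
esxcli iscsi networkportal list --adapter=vmhba64

# List each session’s connections; the LocalAddress field is the vmk IP used by that path.
esxcli iscsi session connection list --adapter=vmhba64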
Hello,
thank you for your fast answer. Unfortunately, I have not found a solution to my problem: neither with the esxcli iscsi commands nor with vmkiscsi-tool.
The problem is as follows: a customer has a configuration with two iSCSI kernel interfaces, each of them bound to one vmnic. One of these vmnics is 10Gbit, the other 1Gbit. I know it is not an ideal configuration, but it is what it is. So he has four paths to a LUN on an active/active iSCSI array (ALUA SATP) and is using the FIXED PSP (the same problem would exist with the MRU PSP, and it is not recommended to use the ROUND-ROBIN PSP here). The challenge is to set the right (10Gbit) path (vmk/vmnic) as preferred, so it is always used while it is up, and the 1Gbit path is used only if the 10Gbit path is down. This is only possible with the FIXED PSP, because it has failback, unlike the MRU PSP. The question is which of the paths goes over 10Gbit and which over 1Gbit, so that the former can be set as preferred and become the path with “active (I/O)” status. All four paths have the same IQN as initiator, so it is not possible to tell them apart in the GUI, and it seems to be the same problem at the CLI. Watching the vmks in esxtop (and switching the preferred path if the 10Gbit path is not the one carrying the most traffic) isn’t the finest approach. Sorry about my English. Thank you. Best regards, Andreas Wedel.
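Once the path running over the 10Gbit vmk has been identified (for example by matching the LocalAddress of its iSCSI connection to the vmk on the 10Gbit vmnic), the preferred path for the FIXED PSP can be set from the CLI; a sketch, with the device and path names as placeholders:

# List all paths to the device, including which one is currently Active (I/O).
esxcli storage nmp path list -d naa.xxxxxxxx

# Mark the path that runs over the 10Gbit vmk/vmnic as the preferred path for the FIXED PSP.
esxcli storage nmp psp fixed deviceconfig set -d naa.xxxxxxxx -p vmhba64:C1:T0:L0

# Confirm the preferred path setting has been applied.
esxcli storage nmp psp fixed deviceconfig get -d naa.xxxxxxxx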
In vSAN it looks like namespace objects don’t automatically expand when they hit the 255GB component size limit. Is there a way to manually grow the namespace object? We are trying to transfer a few hundred GB of ISOs and templates from legacy storage to vSAN.
I don’t believe so.
Is there a reason why you are not using a Content Library? This could sit on top of your vSAN datastore and can be used as your ISO repo. This is what I do in my own lab.
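For reference, a local content library backed by the vSAN datastore can also be created from the CLI with govc; a sketch, assuming govc is installed and connected to vCenter, with the datastore, library and ISO names below as placeholders (flag behaviour can vary between govc versions):

# Create a local content library that stores its items on the vSAN datastore.
govc library.create -ds vsanDatastore iso-library

# Upload an ISO into the new library.
govc library.import iso-library /tmp/example.iso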
Hello Cormac,
Quick question on mixing different vendor SSDs in an all-flash cluster. I need to add capacity to a two-node all-flash VSAN cluster that’s been running for the past two years. The original 960GB SSDs are EOL, so I will need to source alternatives. Assuming that the ‘new’ 960GB SSDs are on the VSAN HCL, is the configuration supported if I have a mix of SSDs from different vendors in the hosts? Is there anything I need to look out for?
Regards,
Anthony
To the best of my knowledge Anthony, support should not be an issue. This is a common scenario, so we have to support disk groups with a mix and match of devices.
However you will need to ensure that the new devices are the same class of device as your existing devices. Other than that, it’s always good to check in with GSS in case they have any advice to offer you.
Hello,
Is this article for Proactive HA still relevant for vSAN 6.6 and 6.7?
https://cormachogan.com/2017/04/28/vsan-predictive-drs-network-aware-drs-proactive-ha/
Thank you,
Yes it is, John.
Hi Cormac,
From some tests it would seem that 6.7 does a 4-component setup for objects with PFTT=0 & SFTT=1 (stretched), with one data and one witness component at each site (and one of the data components with 2 votes, for a total of 5).
Would you please tell me why that setup is used instead of the usual 2 data + witness (i.e. only one witness)?
Thanks!
I don’t understand why that would be the case – what happens if you also set site locality? Does it then show up as 2 x data + 1 x witness?
Even without site locality, I would not expect to see multiple witnesses associated with the object. I assume it is a RAID-1, yes?
Yes, if you set site locality then a 2+1 is built. Yes, RAID-1. Yes, noteworthy. Amazed…
Please see https://communities.vmware.com/thread/602187
I don’t seem to have access to that thread for some reason.
That is the VMTN VCI area 🙁 If you PM me I can copy the details.
Hi Cormac,
Love your blog and writings, learning a lot!
Regarding this article: https://cormachogan.com/2015/02/04/vsphere-6-0-storage-features-part-1-nfs-v4-1/#comments
I’m trying to understand if the risk also exists if you have two clusters:
cluster01: all its hosts mount export_for_vmware with NFSv3
cluster02: all its hosts mount export_for_vmware with NFSv41
Both clusters are running different VMs.
Bottom line: is the risk per shared data, or does it apply to the whole datastore?
Thanks!
Hi Eyal,
So I spoke to engineering on this, and the risk is per shared data. If there is no shared data across the cluster, then technically, you should be ok.
Now, be aware that even if you strive to deploy VMs and avoid sharing data, vSphere may still do something which breaks this. For example, if this datastore is picked for HA heartbeats, or distributed switches write their metadata here, or SIOC (if/when it gets supported on NFS v4.1) is enabled, then you could run into problems.
Is there a use case for this configuration? Is there a reason why you would not go ahead and just use two distinct datastores? It would seem to be the less risky approach.
Hi Cormac,
Thanks for your answer!
Actually there is no use case, I just wanted to understand better the risk of mixing the two NFS versions 🙂
Hey Cormac,
I’m interested in your blog and your writings; they cover a lot of stuff about Kubernetes on vSphere!
One question I’ve been struggling with is the recommended / optimal size of a datastore.
Which initial size would you recommend for the datastore that is specified in the storage class?
And what if a datastore fills up? Would you provision a second datastore and create a new storage class with the new datastore specified? This approach would result in more administrative overhead.
Thanks and kind regards
I don’t have enough exposure to the operational side of K8s running on top of vSphere yet Lukas, so very difficult for me to say what is the best approach to take at this time. I’ll keep your question in mind as I meet more customers who are implementing K8s on vSphere.
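For what it’s worth, with the vSphere CSI driver a storage class does not carry a size at all; it points at a storage policy, and the policy decides which datastore(s) can satisfy the claim, so “adding a datastore” usually means adding another policy/class pair. A minimal sketch, assuming the csi.vsphere.vmware.com provisioner, with the class and policy names below as placeholders:

# One StorageClass per storage policy; the policy (not the class) selects the datastore(s).
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-gold                       # hypothetical class name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "Gold-Datastore01"    # placeholder policy mapped to the chosen datastore
EOF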
In my environment VIC was created with a proxy, so when I try to push images from docker to Harbor I get an error saying “received unexpected http status: 503 service unavailable”, and when I try to log in to Harbor it doesn’t throw an authentication error; instead it gives this error: “error response from daemon: login attempt to http://harbor ip failed with status 404 not found”.
I am able to push images from the docker client to VIC Harbor successfully in a proxy-less environment, so I am not sure what is being blocked.
I have no idea. Probably best to open a support request for some assistance.
Hi Cormac, first of all thanks for all your posts. It seems from VMware support that there is a big problem in vmfs6 which causes random high stun times during snapshot creation and removal. This kind of problem is present even in the latest vSphere 6.7 patch, and at the moment the only workaround is to go back to vmfs5. This scenario applies to most storage vendors but some are more affected than others; we are using Dell Compellent storage and here the problem is evident. Are you aware of any troubles related to vmfs6 and stun?
I’m not aware of any issue Alex, but I am not working deeply on VMFS so much these days. I would strongly recommend opening a Service Request with GSS (our support organization) to work on a solution.
Hi Cormac, this information is coming directly from VMware support, because we opened a ticket 3 months ago. Dell support is also involved; at the moment they informed us that Dell Engineering is trying to contact VMware Engineering to find a shared resolution.
Hello Cormac
Hope you’re doing well.
I read your article on Stripe Width (SW), which I understood; however my question is –
In a stable (Hybrid, FTT=1, FTM=RAID-1) environment, all writes and the majority of the reads would be served by the single disk in the cache tier, so how can a SW of anything greater than 2 (first write going to DG-1 and mirrored write going to DG-2) help?
Wouldn’t an increased SW ONLY come into the picture during the destaging process? And will it really give me performance gains?
Question-2
Will the cache tiers of both DGs (Hybrid, FTT=1, FTM=RAID-1) that hold the hot reads be holding the same hot data?
I ask this because reads are served by both the active and passive replicas; this would only make sense if the cache tiers of the 2 DGs are somewhat identical in terms of the data they hold.
Does vSAN accumulate hot data from both active and passive replica in the cache tier?
Thanks
Varun
Hi,
Right now I am doing a lab on vSphere with Tanzu. When I try to prepare my cluster for a workload domain, an error occurs: “Cluster domain-c38 is unhealthy: the server has asked for the client to provide credentials”. An error also occurs in the log file: “Error occurred sending Principal Identity request to NSX: principal identity already created”. When I restart the WCP service on vCenter, the NCP plugin goes down in NSX-T Manager and this error shows: “Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried”. BTW I am doing my lab manually, without VMware Cloud Builder.
Can you please help me in this.
Thanks,
Fahim
Hi Fahim, please speak to our support staff for assistance with this.
Hi,
I’m trying to run sonobuoy on my K8s cluster.
The command: sonobuoy --kubeconfig /root/.kube/config images pull
I’m getting the following:
INFO[0000] Pulling image: sonobuoy/sonobuoy:v0.19.0 …
ERRO[0002] failed with following error after 1 retries:
ERRO[0002] Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
Can you help please?
Did you read the docker link?
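For anyone hitting the same toomanyrequests error: the link essentially says to authenticate the docker daemon that performs the pulls, which raises the rate limit. A minimal sketch, with the Docker Hub username as a placeholder:

# Log the local docker daemon in to Docker Hub so pulls use the authenticated rate limit.
docker login -u &lt;dockerhub-username&gt;

# Re-run the image pull once logged in.
sonobuoy --kubeconfig /root/.kube/config images pull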
Hi Cormac,
Interested in Tanzu DR and what the best practices might be for this. I am researching DR to AWS via SRM or DRaaS / Zerto etc. Any thoughts or publications on this?
This is still a work in progress Fintan – we’re looking at various options but nothing that I can discuss right now.
Hi Cormac,
Looking to know if there is any blueprint for Tanzu DR?
I am currently looking into cloud-based options for on-premises workloads and am curious what the strategy is for Tanzu?
Hi Cormac, I’ve successfully deployed Workload Management on vSphere 7 and your blog (https://cormachogan.com/2020/09/25/deploy-ha-proxy-for-vsphere-with-tanzu/) helped me a lot – thanks for this!
At the deployment I defined the primary workload network and also 2 additional workload networks for namespace network isolation – works fine.
Now, after deployment, I need 2 more workload networks and I don’t know how to add them.
Under vcenter/cluster/configure/namespaces I found my networks from the deployment – but there is no “Add Workload Network” button?!
Can you give me a hint?
Regards
Thomas Senkel / DATEV eG / Germany
Hi Thomas – I have only ever done this during the Workload Management deployment. You add additional networks when in the process of creating workload networks. I’m not aware of a way to go back and create additional networks after you have completed the workload management deployment. It sounds like a very reasonable feature to include however, so I will bring this to the attention of the product team.
Hello, my name is Nam from Vietnam, I saw your blog at this link: https://cormachogan.com/2019/06/18/kubernetes-storage-on-vsphere-101-failure-scenarios/
I want to ask you something: is there any way to create a PV using a Kubernetes YAML file with the multi-writer flag on a VMware VMDK file? I want the VMDK file to still be able to attach to another node in case the running node crashes. Thank you a lot.
Hi Nam,
There is no support for multi-writer mode / read-write-many for block volumes with the vSphere CSI driver. It is only available with file volumes e.g. vSAN File Service shares.
However, you should not need it for such a scenario. The PV should get removed from the failing node, and re-attached to the node where the Pod is scheduled / restarted.
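Just to illustrate the file-volume case mentioned above (a sketch, not something needed for the failover scenario itself): a read-write-many claim only works against a class backed by vSAN File Service shares, and the class name below is hypothetical:

# ReadWriteMany is only honoured by the vSphere CSI driver for file volumes
# (vSAN File Service shares); block-backed classes must use ReadWriteOnce.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: vsan-file-sc   # hypothetical class mapped to a vSAN file-share policy
EOF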
In my failover test I shut down the VM suddenly, and I see that the dead VM is still holding the disk and the pod is stuck in Terminating state forever, so the disk can’t be attached to another node and the new pod is in Pending state. What should I do?
Please check https://github.com/kubernetes-sigs/vsphere-csi-driver/issues. Check if you are on the latest vSphere CSI driver. If the problem persists, open a new issue.
[Update]: https://vsphere-csi-driver.sigs.k8s.io/known_issues.html#multi-attach-error-for-rwo-block-volume-when-node-vm-is-shutdown-before-pods-are-evicted-and-volumes-are-detached-from-node-vm seems to describe it
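The known-issue page above essentially boils down to a manual clean-up along these lines (a sketch; the object names are placeholders, and the exact steps should be confirmed against that page for your driver version):

# Force-delete the Pod that is stuck in Terminating on the powered-off node.
kubectl delete pod &lt;pod-name&gt; --grace-period=0 --force

# Find the stale VolumeAttachment that still binds the PV to the dead node ...
kubectl get volumeattachments

# ... and delete it so the volume can be attached where the replacement Pod is scheduled.
kubectl delete volumeattachment &lt;attachment-name&gt;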
Hi Cormac, we have an 8-node vSAN cluster with vSAN File Service enabled. Now we want to reduce it to 6 hosts – is there any documentation on how to remove an ESXi host from a cluster with vSAN File Service enabled? When I run the pre-check for maintenance mode with full data migration, I get an error that the vSAN File Service agent running on this host will become inaccessible. Regards and thanks a lot for an answer!
Hi Thomas – I don’t think you are able to do anything with the vSAN File Services nodes. These VMs are just a way to run the Protocol Stack containers, and to the best of my knowledge, these cannot be moved or migrated from the host they are deployed onto. The Protocol Stack container that runs inside of the vSAN File Services node VM will move to another vSAN File Services node VM on another ESXi host in the cluster during maintenance mode, but you cannot do anything with the actual VM itself. So what you are seeing is expected in my opinion.
I’m not sure if there is a procedure to reduce the number of nodes in a vSAN cluster when running vSAN File Services. I’m personally not aware of any way to do it, but it might be worthwhile getting in touch with support in case there is a procedure that I am not aware of. I’m afraid I haven’t been keeping up with vSAN updates too much these days.
Hi Cormac, thx for the quick answer!
Hello Cormac, I have a quick question. When setting up vSAN 7 with a witness appliance, must I set up vmk1 for the vSAN traffic before creating a distributed switch for the 2-node vSAN cluster? So the correct order would be:
1. Create 3 x ESXi hosts
2. Install vCenter on the 3rd host which will be the witness
3. install the witness appliance on that host
4. Create a vmk1 for vSAN in addition to the premade vmk0 for management traffic
5. Create the VSAN cluster and add the two vSAN hosts
6. Create a distributed switch that will be applied to only the 2 vSAN hosts
7. Enable vSAN on that cluster with HA and DRS
8. Finish config and make sure I can SSH to hosts and vmkping all IPs from all vmks?
Any help welcome as I keep getting caught out by gotchas each time and only get a bit further.
Thanks Cormac
Hi Steve,
Yes – I think that covers it. The only consideration is as follows. In my 2-node setup, I use VLAN 51 for management and VLAN 500 for vSAN. Thus the witness appliance, which is a VM, has 2 VM networks, one on VLAN 51 and the other on VLAN 500. So the ESXi host where the witness appliance is deployed uses VM portgroups. For my physical 2-node cluster, the hosts have 2 x VMkernel networks, again one for mgmt on VLAN 51 and the other for vSAN on VLAN 500.
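On the CLI side, step 4 and the vmkping check in step 8 translate to roughly the following on each data node (a sketch for a standard-switch portgroup; the portgroup name, IP addresses and netmask below are placeholders):

# Create vmk1 on the portgroup carved out for vSAN traffic and give it a static IP.
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=vSAN-PG
esxcli network ip interface ipv4 set -i vmk1 -I 172.16.50.11 -N 255.255.255.0 -t static

# Tag vmk1 for vSAN traffic.
esxcli vsan network ip add -i vmk1

# Verify the other data node (and the witness network) is reachable over the vSAN vmk.
vmkping -I vmk1 172.16.50.12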
Thanks for that Cormac. I have another quick question. I previously had these 3 servers using HP StoreVirtual vSAN. It was set up by an IT company. Now though, StoreVirtual is end of life and I want to use them for VMware vSAN. The previous setup was that all three servers have 2 x 2-port fiber cards and they were all linked for vSAN traffic via 2 x Aruba fiber switches. The Ethernet ports were used for Management, VM Network and vMotion traffic. I have read recently that the servers can be linked using Direct Connect. Would that be possible via the fibers? So, each server connected to the others directly via the fibers, and then those ports configured for vSAN traffic etc.? Or would I still need to use the Aruba switches to connect the fiber cards? Not sure if Direct Connect only works with 10Gb Ethernet ports? If I don’t have to use the physical switch but can still use the 10Gb fiber, that would be better and have fewer points of failure. As always any info welcome. Thanks Cormac
Direct Connect can certainly be used for 2-node deployments Steve. If you’re already doing vSphere networking through the fibers for management and vMotion, I don’t see why they won’t work with vSAN traffic for direct-connect. And yeah – 10Gb seems to be required, so you should be ok there.
Hi Cormac, thanks again. The management, vMotion and VM Network traffic was going through the 4 Ethernet ports on the servers. Just vSAN was going through the fibers via the Arubas. I did see that you might need a cross-over cable if you use Ethernet; with fiber though there aren’t any cross-overs, are there? They all have receive and transmit fibers.
Ah ok – I would probably check with GSS/support then Steve. It’s not something I could make a call on, I’m afraid.
Thanks Cormac. Appreciated. What is GSS? VMware support?
Hi Cormac,
is there any way to modify network permissions to Kubernetes PVs (backed by vSAN File Shares) for already existing PVCs without recreating them?
Regards and thanks a lot
Tom
It might be possible through editing some of the CSI configuration files, but I don’t know if it is supported. I would recommend opening a case with the GSS organisation and asking their advice before changing anything.
Hi Cormac, thanks for your response!