Pretty soon I’ll be heading out on the road to talk at various VMUGs about our first 6 months with VSAN, VMware’s Virtual SAN product. Regular readers will need no introduction to VSAN, and as was mentioned at VMworld this year, we’re gearing up for our next major release. With that in mind, I thought it might be useful to go back over the last 6 months: a look at some successes, some design decisions you might have to make, the available troubleshooting tools, and some common gotchas (all the things that will help you run a successful Proof of Concept – POC – with VSAN), followed by a quick look at some futures.
While doing some testing yesterday in our lab, we noticed that after we had placed a host participating in a VSAN cluster into maintenance mode and chose the option to evacuate the data from the host to the remaining nodes in the cluster, the “Enter Maintenance Mode” task was still sitting at 63% complete even though it seemed that the resynchronization of components was complete. For example, when we used the vsan.resync_dashboard RVC command, there were 0 bytes left to sync:
> vsan.resync_dashboard /localhost/ie-datacenter-01/computers/ie-vsan-01/
2014-11-06 12:07:45 +0000: Querying all VMs on VSAN ...
2014-11-06 12:07:45 +0000: Querying all objects .. from cs-ie-h01 ...
2014-11-06 12:07:45 +0000: Got all the info, computing table ...
+-----------+-----------------+---------------+
| VM/Object | Syncing objects | Bytes to sync |
+-----------+-----------------+---------------+
+-----------+-----------------+---------------+
| Total     | 0               | 0.00 GB       |
+-----------+-----------------+---------------+
Hmm. This was a bit strange, so we decided to check whether all of the components had actually been migrated off the host that we placed in maintenance mode, in this case host cs-ie-h01.
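One way to check this, assuming you still have an RVC session open, is the vsan.disks_stats command, which lists each disk in the cluster along with the number of components it holds per host. The cluster path below is just the one from our lab; substitute your own. A host that has fully evacuated its data should show 0 in the components column for all of its disks:

> vsan.disks_stats /localhost/ie-datacenter-01/computers/ie-vsan-01/

If cs-ie-h01 still shows a non-zero component count, the evacuation has not actually finished, regardless of what the resync dashboard reports.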
There has been a bit of confusion recently over the use of OEM ESXi ISO images with Virtual SAN. These OEM ESXi ISO images allow our partners to pre-package a bunch of their own drivers and software components so that they are available to you immediately on install. While this can be very beneficial for non-VSAN environments, it is not quite so straightforward for VSAN deployments. Drivers associated with VSAN have to go through extra testing, for some very good reasons that I will allude to shortly. The issue really pertains to the drivers shipped with many of these ESXi images; in many cases these are the latest and greatest drivers from the OEM for a given storage controller, and they may not yet be qualified for VSAN (qualified == tested).
I’ve been fortunate enough to receive a bunch of invites to present at various VMware User Group (VMUG) meetings around Europe next month. This year I’ll be presenting a “Virtual SAN (VSAN) troubleshooting and gotchas” type session, so anyone with an interest in VSAN or EVO:RAIL should find this useful. So where will you find me?
Whilst at VMworld 2014, I had the opportunity to catch up with the Nexenta team, who have been working on a very interesting project with VMware’s Virtual SAN (VSAN). The Nexenta Connect for VSAN product, running on top of VSAN, is designed to provide file services, allowing VSAN not only to store your virtual machines but also to provide SMB and NFS shares for those virtual machines. I caught up with Michael Letschin and Gijsbert Janssen van Doorn of the Nexenta team to learn more and get a tech preview of the product.
A quick note to let you know about a recently published KB article describing incorrect Outstanding IO values in VSAN Observer, the tool used for monitoring the performance of VSAN deployments, when using vSphere 5.5U2.
KB 2091979 reports the issue as follows:
Virtual SAN (VSAN) Observer graphs in the “VSAN Client”, “VSAN Disk”, “DOM Owner” or individual VSAN object on the “VM” tab show very high Outstanding I/O (OIO) value that is inconsistent with the actual I/O load.
Here is a sample screenshot from my VSAN environment running vSphere 5.5U2. As you can see the Outstanding IO values are off the scale:
Of course, this behaviour may lead to you “chasing your tail”, so to speak, when monitoring or troubleshooting VSAN, so we are working on getting it resolved asap. Check the KB article regularly for updates regarding a fix. In the meantime, understand that a high Outstanding IO count in VSAN Observer is expected with this issue, and may not be the symptom of any underlying problem.
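For reference, VSAN Observer is launched from RVC. A typical invocation looks like the one below; the cluster path is from my own lab, and the web server listens on port 8010 by default:

> vsan.observer /localhost/ie-datacenter-01/computers/ie-vsan-01/ --run-webserver --force

You can then browse to https://<rvc-host>:8010 to view the graphs, keeping the caveat above about Outstanding IO values in mind until the fix ships.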
There was a very interesting discussion on our internal forums here at VMware over the past week. One of our guys had built out a VSAN cluster, and everything looked good. However on attempting to deploy a virtual machine on the VSAN datastore, he kept hitting an error which reported that it “cannot complete file creation operation”. As I said, everything looked healthy. The cluster formed correctly, there were no network partitions and the network status was normal. So what could be the problem?
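When a cluster looks this healthy but provisioning still fails, a couple of quick first checks can narrow things down (the RVC cluster path here is from my own lab, purely for illustration). From RVC, verify the overall object and cluster state:

> vsan.check_state /localhost/ie-datacenter-01/computers/ie-vsan-01/

And on each ESXi host, confirm that disks have actually been claimed by VSAN, since a cluster can form perfectly well even if some hosts are contributing no storage:

~ # esxcli vsan storage list

If the VSAN datastore shows little or no capacity, that is a strong hint as to why file creation operations are failing.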