In a previous post, I discussed the difference between a component that is marked as ABSENT and a component that is marked as DEGRADED. In this post, I’m going to take this up a level and talk about objects, and how failures in the cluster can change the status of objects. In VSAN, an object is made up of one or more components; for instance, if you have a VM that you wish to tolerate a number of failures, or indeed you wish to stripe a VMDK across multiple disks, then you will certainly have multiple components making up the VSAN object. Read this article for a better understanding of objects and components. Object compliance status and object operation status are two distinct states that an object may have. Let’s look at them in more detail next.
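One way to see how an object is split into components, and the state of each component, is the RVC command vsan.vm_object_info. This is a sketch only; the vCenter hostname and the inventory paths below are placeholders and should be replaced with your own:

```shell
# Connect RVC to vCenter (hostname is a placeholder), then inspect
# the object layout of a VM. Inventory paths are examples only.
rvc administrator@vcenter-hostname

> vsan.vm_object_info /localhost/ie-datacenter-01/vms/my-vm
# The output lists each object (VM namespace, VMDKs, swap) along with
# its RAID tree - e.g. a RAID_1 mirror made up of two components plus
# a witness - and the state of every component (ACTIVE, ABSENT, etc.).
```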
As part of a quick reference proof-of-concept/evaluation guide that I have been working on, it has become very clear that one of the areas that causes the most confusion is what happens when a storage device is either manually removed from a host participating in the Virtual SAN cluster or the device suffers a failure. These are not the same thing from a Virtual SAN perspective.
To explain the different behaviour, it is important to understand that Virtual SAN has two failure states for components: ABSENT and DEGRADED.
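The distinction matters because VSAN treats the two states differently: an ABSENT component (for example, a manually pulled disk) may come back, so VSAN waits out a repair delay (60 minutes by default) before rebuilding it elsewhere, whereas a DEGRADED component (a failed device) is rebuilt immediately. As a quick sketch, the repair delay can be checked on an ESXi host via its advanced setting:

```shell
# Query the CLOM repair delay on an ESXi host. ABSENT components are
# only rebuilt after this many minutes; DEGRADED ones straight away.
esxcli system settings advanced list -o /VSAN/ClomRepairDelay
```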
The folks from the Nordic VMUG team in Denmark were kind enough to record my Virtual SAN (VSAN) session in Copenhagen last week. If you are interested in VSAN, or considering a VSAN evaluation or Proof of Concept, then this might be worth watching. In it, I cover design considerations, troubleshooting tools, monitoring and performance, as well as some common gotchas. I close with some VSAN futures. Enjoy!
Pretty soon I’ll be heading out on the road to talk at various VMUGs about our first 6 months with VSAN, VMware’s Virtual SAN product. Regular readers will need no introduction to VSAN, and as was mentioned at VMworld this year, we’re gearing up for our next major release. With that in mind, I thought it might be useful to look back over the last 6 months: some successes, some design decisions you might have to make, the available troubleshooting tools, and some common gotchas (all the things that will help you have a successful Proof of Concept, or POC, with VSAN), followed by a quick look at some futures.
While doing some testing yesterday in our lab, we noticed that after we had placed a host participating in a VSAN cluster into maintenance mode and chose the option to evacuate the data from the host to the remaining nodes in the cluster, the “Enter Maintenance Mode” task was still sitting at 63% complete even though it seemed that the resynchronization of components was complete. For example, when we used the vsan.resync_dashboard RVC command, there were 0 bytes left to sync:
> vsan.resync_dashboard /localhost/ie-datacenter-01/computers/ie-vsan-01/
2014-11-06 12:07:45 +0000: Querying all VMs on VSAN ...
2014-11-06 12:07:45 +0000: Querying all objects .. from cs-ie-h01 ...
2014-11-06 12:07:45 +0000: Got all the info, computing table ...
+-----------+-----------------+---------------+
| VM/Object | Syncing objects | Bytes to sync |
+-----------+-----------------+---------------+
+-----------+-----------------+---------------+
| Total     | 0               | 0.00 GB       |
+-----------+-----------------+---------------+
Hmm. This was a bit strange, so we decided to check whether all of the components had been migrated off the host that we placed in maintenance mode, in this case host cs-ie-h01.
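A quick way to do that check is the RVC command vsan.disks_stats, which reports the component count on each disk of every host in the cluster. A sketch, using the same lab cluster path as above; substitute your own datacenter and cluster names:

```shell
# List per-disk statistics for all hosts in the cluster. The
# "Num Comp" column shows how many components reside on each disk;
# after a full data evacuation, the disks on cs-ie-h01 should be at 0.
> vsan.disks_stats /localhost/ie-datacenter-01/computers/ie-vsan-01/
```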
There has been a bit of confusion recently over the use of OEM ESXi ISO images and Virtual SAN. These OEM ESXi ISO images allow our partners to pre-package a bunch of their own drivers and software components so that you have them available to you immediately on install. While this can be very beneficial for non-VSAN environments, it is not quite so straightforward for VSAN deployments. Drivers associated with VSAN have to go through extra testing for some very good reasons that I will explain shortly. The issue really pertains to the drivers shipped with many of these ESXi images; in many cases these are the latest and greatest drivers from the OEM for a given storage controller, and they may not yet be qualified for VSAN (qualified == tested).
I’ve been fortunate enough to receive a bunch of invites to present at various VMware User Group (VMUG) meetings around Europe next month. This year I’ll be presenting a “Virtual SAN (VSAN) troubleshooting and gotchas” type session, so anyone with an interest in VSAN or EVO:RAIL should find this useful. So where will you find me?