There is a new snapshot format introduced in VSAN 6.0 called vsanSparse. These replace the traditional vmfsSparse format (redo logs). The vmfsSparse format was used when snapshots of VMs were taken in VSAN 5.5, and are also the format used when a snapshot is taken of a VM residing on traditional VMFS and NFS. The older vmfsSparse format left a lot to be desired when it came to performance and scalability. This KB article from our support team, indicating that no snapshot should be used for more than 72 hours, and snapshot chains should contain no more than 2-3 snapshots, speaks for itself.
Recently I published an article on Virtual Volumes (VVols) where I touched on a comparison between how migrations typically worked with VAAI and how they now work with VVols. In the meantime, I managed to have some really interesting discussions with some of our VVol leads, and I thought it worth sharing here as I haven’t seen this level of detail anywhere else. This is rather a long discussion, as there are a lot of different permutations of migrations that can take place. There are also different states that the virtual machine could be in. We’re solely focused on VVols here, so although different scenarios are offered up, I highlight what scenario we are actually considering.
I have been doing a bunch of stuff around disaster recovery (DR) recently, and my storage of choice at both the production site and the recovery site has been VSAN, VMware Virtual SAN. I have already done a number of tests already with products like vCenter Server, vCenter Operations Manager and NSX, our network virtualization product. Next up was VCO, our vCenter Orchestrator product. I set up vSphere Replication for my vCO servers (I deployed them in a HA configuration) and their associated SQL DB VM on Friday, but when I got in Monday morning, I could not log onto my vCenter. The problem was that my vCenter was running on VSAN (a bit of a chicken and egg type situation), so how do I troubleshoot this situation without my vCenter. And what was the actual problem? Was it a VSAN issue? This is what had to be done to resolve it.
I’ve been working on some Disaster Recovery (DR) scenarios recently with my good pal Paudie. Last month we looked at how we might be able to protect vCenter Operation Manager, by using a vApp construct and also using IP customization. After VMworld, we turned our attention to NSX, and how we might be able to implement a DR solution for NSX. This is still a work in progress, but we did learn some very useful NSX troubleshooting commands that I thought would be worth sharing with you.
I was involved in some conversations recently on how the VAAI UNMAP command behaved, and what were the characteristics which affected its performance. For those of you who do not know, UNMAP is our mechanism for reclaiming dead or stranded space from thinly provisioned VMFS volumes. Prior to this capability, the ESXi host had no way of informing the storage array that the space that was being previously consumed by a particular VM or file is no longer in use. This meant that the array thought that more space was being consumed than was actually the case. UNMAP, part of the vSphere APIs for Array Integration, enables administrators to overcome tho challenge by telling the array that these blocks on a thin provisioned volume are no longer in use and that they can be reclaimed.
I had the pleasure (?) recently of troubleshooting some backup issues on my vSphere Data Protection Advanced (VDPA) setup. To be honest, I had not spent a great deal of time on this product recently, other than a few simple backup and restores. However, in my new role I now have a number of other projects which requires me to understand this product’s functionality a bit more. When things were not going right for me though, I spent a lot of time searching for some log files which might give me some clue as to the nature of my problem. After some assistance from some of the GSS guys based in Cork, we narrowed it down.
For me, the install and configure part went fine. I could also create backups with relative ease. The issue related to running the backups in certain environments, which were failing. So how then could I determine why this was happening?
There are many occasions where the information displayed in the vSphere client is not sufficient to display all relevant information about a particular storage device, or indeed to troubleshoot problems related to a storage device. The purpose of this post is to explain some of the most often used ESXCLI commands that I use when trying to determine storage device information, and to troubleshoot a particular device.