Where to look when troubleshooting VDPA backup issues
I had the pleasure (?) recently of troubleshooting some backup issues on my vSphere Data Protection Advanced (VDPA) setup. To be honest, I had not spent a great deal of time on this product recently, other than a few simple backup and restores. However, in my new role I now have a number of other projects which requires me to understand this product’s functionality a bit more. When things were not going right for me though, I spent a lot of time searching for some log files which might give me some clue as to the nature of my problem. After some assistance from some of the GSS guys based in Cork, we narrowed it down.
For me, the install and configure part went fine. I could also create backups with relative ease. The issue related to running the backups in certain environments, which were failing. So how then could I determine why this was happening?
The clue is the name of the backup job. This forms part of the log file used to capture the interactions between VDPA and vCenter server. This log file can be found in /space/avamarclient on the VDPA appliance. If you list the contents of this directory, you will find that there are a number of var-proxy-x folders. In each of these folders, you will find the various log files for the vCenter/VDPA interactions. Now, it is not easy to determine which var-proxy-x folder a particular backup job’s logs will be placed, so you will have to search for the job name and the time stamp in each. The log file will be <job-name><identifier>-vmimagew.log. From this log file, you will be able to find the entries related to vCenter/VDPA communication, authentication of user ids, virtual machine info (the VM which is being backed up), the various transport modes, snapshot creation and removal and so on. This should be your main place to look for issues when backup jobs are failing.
As an FYI, here are some snippets from one such backup job when things are working correctly.
Log start header:
---------------------------------------------------------------------- START avvcbimage log 2014-02-27 05:36:23 EST -----------------------------------------------------------------------
vCenter login and credentials verified:
Login(https://mia-cg07-vc01:443/sdk) Datacenter: 'mia-cg07-dc01' - connected to 'VirtualCenter' - version: 'VMware vCenter Server 5.5.0 Login test is OK.
Virtual Machine information discovered:
vCenter 'mia-cg07-dc01' is mia-cg07-vc01 VM's host is mia-cg07-esx011.vmwcs.com VM's primary storage location is [datastore1] vm/vm.vmx VM's vCenter ID is vm-102
Virtual Machine storage and file information discovered:
The VM '[datastore1] vm/vm.vmx' has no old snapshots to delete. no snapshots are present on this VM.Logging into datacenter path ... All Datastore Info: datastore:'datastore1 ' capacity=2684354560 free=984612864 datastore:'mia_vnx01_cg07_01 ' capacity=528678895616 free=528014123008 VM Info (1/1): VM:'vm/ ' last modified on '27-Feb-2014 10:36' All VM file Info: File:'vm-ctk.vmdk ' last modified on '27-Feb-2014 10:36' size=66048 File:'vm-flat.vmdk ' last modified on '27-Feb-2014 10:31' size=1073741824 File:'vm.vmdk ' last modified on '27-Feb-2014 10:36' size=528 File:'vm.vmsd ' last modified on '27-Feb-2014 10:31' size=0 File:'vm.vmx ' last modified on '27-Feb-2014 10:36' size=2226 File:'vm.vmxf ' last modified on '27-Feb-2014 10:31' size=263 . . Found 1 disk(s), 0 snapshots, and 0 snapshot files, on the VMs datastor
Run a pre-snapshot script if it exists:
vmAction runBackupScript: () vmAction runBackupScript: script is skipped because it is null [IMG0009] Pre-snapshot script: completed successfully
Snapshot request:
a VM snapshot has been requested Creating snapshot 'VDP-xxxxxxxx', quieceFS=1 Snapshot 'VDP-xxxxxxxxx' creation for VM '[datastore1] local-vm/local-vm.vmx' task still in progress, sleep for 2 secSnapshot 'VDP-xxxxxxxxxx' creation for VM '[datastore1] local-vm/local-vm.vmx' task completed, moref=snapshot-105
Post snapshot script, if it exists, and then some disk and snapshot information:
[IMG0009] Post-snapshot script: completed successfully New method from Snapshot/config: Using '[datastore1] vm/vm.vmdk'. Disk Info: Disk '2000': file(base):'[datastore1] vm/vm.vmdk', backItUp=1 snapshot(base) file:'[datastore1] vm/vm.vmdk' snapshot(curr) file:'[datastore1] vm/vm-000001.vmdk' prior size(KB):0, current size(KB):1048576, match=0 prior change block ID: '' current change block ID:'52 40 59 09 58 fe e9 41-0e a1 8e 5f 93 4d ..' CBTenabled=1, thin=0, independent=0, SCSI=1, diskMode=persistent, backing=DiskFlatVer2..snapshot created:true NOMC:false ChangeBlTrackingAvail:true UsingChBl:true
Is hot-add in use or not? If not, why not?
Connected with nbdssl transport to virtual disk [datastore1] vm/vm.vmdk hotadd transport is recommended for faster backups and restores and less exposure to network routing, firewall and SSL certificate issues. One or more of these items may be preventing use of hotadd transport with this VM: Check that the backup proxy VM(127.0.0.2,vsanvdp.vmwcs.com,vsanvdp) is deployed on an ESX(i) host that has a path to the storage holding the virtual disk being backed up ([datastore1] vm/vm.vmdk). Also check that the proxy is configured to protect this datastore Check that the vmfs holding the backup proxy VM has a total capacity equal to the size of the largest virtual disk being backed up. (Capacity merely has to be present - Free space can be less than this ..) Check that the vmfs holding the backup proxy VM is formatted with a vmfs block size which supports the size of the largest virtual disk being backed up Check that the VM being backed up has no virtual hard disks designated as independent Check that the VM being backed up has no IDE virtual hard disks, usually an artifact of a P2v conversion - use SCSI virtual disks instead Check that the VM being backed up is in the same datacenter as the backup proxy VM. Hotadd transport cannot cross a Datacenter boundary. Check that no VMs, and no vmdk's within the vCenter have the same name as those associated with the VM being backed up. (unusual but possible) Connecting virtual disk [datastore1] vm/vm.vmdk
Do the backup:
Snapshot reported sector count 2,097,152 = 1.00 GB Adding file 'attributes.xml' (907 bytes) Adding file 'virtdisk-descriptor.vmdk' (408 bytes) Virt Disk extent (flat vmdk) size is 1,073,741,824 In-use block offsets for disk 2000: not present changed-block or in-use lists are unavailable, performing full level zero backup VDDK:VixDiskLib: VixDiskLib_Close: Close disk.
Backup completed, delete the snapshot:
Deleting the snapshot 'VDP-xxxxxxx', moref 'snapshot-105'Process 24829 (/usr/local/avamarclient/bin/avtar) finished (code 0: success)
And finally the statistics of the backup job:
Wrap-Up Stats: --------------------------------------------------------- Disk Number 2000: Completed(MB) Perf(MB/min) Duration VDDK-IO: 1024.00 1706.7 00:00:36 Output-IO: 1024.00 61440.0 00:00:01 Total-IO: 1024.00 1498.5 00:00:41 (+00:00:01 idle/loading/preparing) --------------------------------------------------------- Aggregate: Completed(MB) Perf(MB/min) Duration VDDK-IO: 1024.00 1706.7 00:00:36 Output-IO: 1024.00 61440.0 00:00:01 Total-IO: 1024.00 1498.5 00:00:41 ---------------------------------------------------------
And the log completes with an END statement, along with warnings, errors and fatal errors:
----------------------------------------------------------------------- END avvcbimage log 2014-02-27 05:37:49 EST (8 warnings, 0 errors, 0 fatal errors) -----------------------------------------------------------------------
This log file is a definite treasure trove of information when it comes to troubleshooting VDPA backup jobs. Kudos to my GSS pals in Cork for this info.
Is this only for Advanced?
Answered my own question! This does in fact work for the standard version. Thanks for the great post!
Great – I like those sorts of questions 🙂