Where to look when troubleshooting VDPA backup issues

vdpaI had the pleasure (?) recently of troubleshooting some backup issues on my vSphere Data Protection Advanced (VDPA) setup. To be honest, I had not spent a great deal of time on this product recently, other than a few simple backup and restores. However, in my new role I now have a number of other projects which requires me to understand this product’s functionality a bit more. When things were not going right for me though, I spent a lot of time searching for some log files which might give me some clue as to the nature of my problem. After some assistance from some of the GSS guys based in Cork, we narrowed it down.

For me, the install and configure part went fine. I could also create backups with relative ease. The issue related to running the backups in certain environments, which were failing. So how then could I determine why this was happening?

The clue is the name of the backup job. This forms part of the log file used to capture the interactions between VDPA and vCenter server. This log file can be found in /space/avamarclient on the VDPA appliance. If you list the contents of this directory, you will find that there are a number of var-proxy-x folders. In each of these folders, you will find the various log files for the vCenter/VDPA interactions. Now, it is not easy to determine which var-proxy-x folder a particular backup job’s logs will be placed, so you will have to search for the job name and the time stamp in each. The log file will be <job-name><identifier>-vmimagew.log. From this log file, you will be able to find the entries related to vCenter/VDPA communication, authentication of user ids, virtual machine info (the VM which is being backed up), the various transport modes, snapshot creation and removal and so on. This should be your main place to look for issues when backup jobs are failing.

As an FYI, here are some snippets from one such backup job when things are working correctly.

Log start header:

          START avvcbimage log 2014-02-27 05:36:23 EST  

vCenter login and credentials verified:

Login(https://mia-cg07-vc01:443/sdk) Datacenter: 'mia-cg07-dc01' 
- connected to 'VirtualCenter' - version: 'VMware vCenter Server 5.5.0
Login test is OK.

Virtual Machine information discovered:

vCenter 'mia-cg07-dc01' is mia-cg07-vc01
VM's host is mia-cg07-esx011.vmwcs.com
VM's primary storage location is [datastore1] vm/vm.vmx
VM's vCenter ID is vm-102

Virtual Machine storage and file information discovered:

The VM '[datastore1] vm/vm.vmx' has no old snapshots to delete.
no snapshots are present on this VM.Logging into datacenter path ...
All Datastore Info:  
datastore:'datastore1        ' capacity=2684354560    free=984612864  
datastore:'mia_vnx01_cg07_01 ' capacity=528678895616  free=528014123008
VM Info (1/1):  
VM:'vm/               '  last modified on '27-Feb-2014 10:36'
All VM file Info:  
File:'vm-ctk.vmdk   ' last modified on '27-Feb-2014 10:36' size=66048  
File:'vm-flat.vmdk  ' last modified on '27-Feb-2014 10:31' size=1073741824  
File:'vm.vmdk       ' last modified on '27-Feb-2014 10:36' size=528  
File:'vm.vmsd       ' last modified on '27-Feb-2014 10:31' size=0  
File:'vm.vmx        ' last modified on '27-Feb-2014 10:36' size=2226  
File:'vm.vmxf       ' last modified on '27-Feb-2014 10:31' size=263
Found 1 disk(s), 0 snapshots, and 0 snapshot files, on the VMs datastor

Run a pre-snapshot script if it exists:

vmAction runBackupScript: ()
vmAction runBackupScript: script is skipped because it is null
[IMG0009] Pre-snapshot script:  completed successfully

Snapshot request:

a VM snapshot has been requested
Creating snapshot 'VDP-xxxxxxxx',
Snapshot 'VDP-xxxxxxxxx'
    creation for VM '[datastore1] local-vm/local-vm.vmx' task still in
    progress, sleep for 2 secSnapshot 'VDP-xxxxxxxxxx'
    creation for VM '[datastore1] local-vm/local-vm.vmx' task completed,

Post snapshot script, if it exists, and then some disk and snapshot information:

[IMG0009] Post-snapshot script:  completed successfully
New method from Snapshot/config: Using '[datastore1] vm/vm.vmdk'.
Disk Info: 
Disk '2000': file(base):'[datastore1] vm/vm.vmdk', backItUp=1
 snapshot(base) file:'[datastore1] vm/vm.vmdk'
 snapshot(curr) file:'[datastore1] vm/vm-000001.vmdk'
 prior size(KB):0, current size(KB):1048576, match=0
 prior change block ID:  ''
 current change block ID:'52 40 59 09 58 fe e9 41-0e a1 8e 5f 93 4d ..'
 CBTenabled=1, thin=0, independent=0, SCSI=1, diskMode=persistent, 
backing=DiskFlatVer2..snapshot created:true NOMC:false 
ChangeBlTrackingAvail:true UsingChBl:true

Is hot-add in use or not? If not, why not?

Connected with nbdssl transport to virtual disk [datastore1] vm/vm.vmdk
hotadd transport is recommended for faster backups and restores and less
exposure to network routing, firewall and SSL certificate issues.
One or more of these items may be preventing use of hotadd transport with
 this VM: 
Check that the backup proxy VM(,vsanvdp.vmwcs.com,vsanvdp)
 is deployed on an ESX(i) host that has a path to the storage holding 
 the virtual disk being backed up ([datastore1] vm/vm.vmdk). 
Also check that the proxy is configured to protect this datastore 
Check that the vmfs holding the backup proxy VM has a total capacity 
 equal to the size of the largest virtual disk being backed up.
(Capacity merely has to be present - Free space can be less than this ..) 
Check that the vmfs holding the backup proxy VM is formatted with a vmfs
 block size which supports the size of the largest virtual disk being 
 backed up 
Check that the VM being backed up has no virtual hard disks designated 
 as independent 
Check that the VM being backed up has no IDE virtual hard disks, usually
 an artifact of a P2v conversion - use SCSI virtual disks instead 
Check that the VM being backed up is in the same datacenter as the backup
 proxy VM. Hotadd transport cannot cross a Datacenter boundary. 
Check that no VMs, and no vmdk's within the vCenter have the same name
 as those associated with the VM being backed up. (unusual but possible)
Connecting virtual disk [datastore1] vm/vm.vmdk

Do the backup:

Snapshot reported sector count 2,097,152 = 1.00 GB
Adding file 'attributes.xml' (907 bytes)
Adding file 'virtdisk-descriptor.vmdk' (408 bytes)
Virt Disk extent (flat vmdk) size is 1,073,741,824
In-use block offsets for disk 2000: not present
changed-block or in-use lists are unavailable, performing full level 
  zero backup
VDDK:VixDiskLib: VixDiskLib_Close: Close disk.

Backup completed, delete the snapshot:

Deleting the snapshot 'VDP-xxxxxxx', 
moref 'snapshot-105'Process 24829 (/usr/local/avamarclient/bin/avtar) 
finished (code 0: success)

And finally the statistics of the backup job:

Wrap-Up Stats:
Disk Number 2000:                    
             Completed(MB)  Perf(MB/min)  Duration       
VDDK-IO:     1024.00        1706.7        00:00:36       
Output-IO:   1024.00        61440.0       00:00:01       
Total-IO:    1024.00        1498.5        00:00:41 (+00:00:01 idle/loading/preparing)
              Completed(MB)  Perf(MB/min)  Duration       
VDDK-IO:     1024.00        1706.7        00:00:36       
Output-IO:   1024.00        61440.0       00:00:01       
Total-IO:    1024.00        1498.5        00:00:41

And the log completes with an END statement, along with warnings, errors and fatal errors:

END avvcbimage log 2014-02-27 05:37:49 EST  (8 warnings, 0 errors, 
0 fatal errors)

This log file is a definite treasure trove of information when it comes to troubleshooting VDPA backup jobs. Kudos to my GSS pals in Cork for this info.


Comments are closed.