vsan.resync_dashboard only reports VM resyncing, not templates

While doing some testing yesterday in our lab, we placed a host participating in a VSAN cluster into maintenance mode and chose the option to evacuate its data to the remaining nodes in the cluster. The “Enter Maintenance Mode” task was still sitting at 63% complete even though the resynchronization of components appeared to be finished. For example, when we ran the vsan.resync_dashboard RVC command, it reported 0 bytes left to sync:

> vsan.resync_dashboard /localhost/ie-datacenter-01/computers/ie-vsan-01/
2014-11-06 12:07:45 +0000: Querying all VMs on VSAN ...
2014-11-06 12:07:45 +0000: Querying all objects .. from cs-ie-h01 ...
2014-11-06 12:07:45 +0000: Got all the info, computing table ...
+-----------+-----------------+---------------+
| VM/Object | Syncing objects | Bytes to sync |
+-----------+-----------------+---------------+
+-----------+-----------------+---------------+
| Total     | 0               | 0.00 GB       |
+-----------+-----------------+---------------+

Hmm. This was a bit strange, so we decided to check whether all of the components had been migrated off of the host that we placed in maintenance mode, in this case host cs-ie-h01.

We used the vsan.disks_stats command to check:

> vsan.disks_stats ie-vsan-01
2014-11-06 12:12:22 +0000: Fetching VSAN disk info from cs-ie-h04 ...
2014-11-06 12:12:22 +0000: Fetching VSAN disk info from cs-ie-h02 ...
2014-11-06 12:12:22 +0000: Fetching VSAN disk info from cs-ie-h01 ...
2014-11-06 12:12:22 +0000: Fetching VSAN disk info from cs-ie-h03 ...
2014-11-06 12:12:24 +0000: Done fetching VSAN disk infos
+--------------------------------------+-----------+-------+------+-----------+------+----------+--------+
|                                      |           |       | Num  | Capacity  |      |          | Status |
| DisplayName                          | Host      | isSSD | Comp | Total     | Used | Reserved | Health |
+--------------------------------------+-----------+-------+------+-----------+------+----------+--------+
| eui.48f8681115d6416c00247172ce4df168 | cs-ie-h01 | SSD   | 0    | 785.57 GB | 0 %  | 0 %      | OK     |
| naa.600508b1001c79748e8465571b6f4a46 | cs-ie-h01 | MD    | 1    | 136.50 GB | 16 % | 0 %      | OK     |
| naa.600508b1001c2ee9a6446e708105054b | cs-ie-h01 | MD    | 0    | 136.50 GB | 0 %  | 0 %      | OK     |
| naa.600508b1001c388c92e817e43fcd5237 | cs-ie-h01 | MD    | 0    | 136.50 GB | 0 %  | 0 %      | OK     |
| naa.600508b1001ccd5d506e7ed19c40a64c | cs-ie-h01 | MD    | 0    | 136.50 GB | 0 %  | 0 %      | OK     |
| naa.600508b1001c3ea7838c0436dbe6d7a2 | cs-ie-h01 | MD    | 0    | 136.50 GB | 0 %  | 0 %      | OK     |
| naa.600508b1001c16be6e256767284eaf88 | cs-ie-h01 | MD    | 0    | 136.50 GB | 0 %  | 0 %      | OK     |
| naa.600508b1001c64816271482a56a48c3c | cs-ie-h01 | MD    | 0    | 136.50 GB | 0 %  | 0 %      | OK     |
+--------------------------------------+-----------+-------+------+-----------+------+----------+--------+
<>
+--------------------------------------+-----------+-------+------+-----------+------+----------+--------+
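
Scanning a long disk table by eye for a non-zero component count is error-prone, so the check can be scripted. What follows is only a minimal sketch: it assumes the vsan.disks_stats output has been saved as plain text and that the first four columns are the disk name, host, disk type and component count, as in the table above. The function name disks_with_components is made up for this illustration and is not part of RVC.

```python
def disks_with_components(table_text, host):
    """Return (disk, component_count) pairs for disks on `host` that still hold components."""
    leftovers = []
    for line in table_text.splitlines():
        # Data rows look like: | <disk> | <host> | <SSD|MD> | <num comp> | ...
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 4 or not cells[3].isdigit():
            continue  # skip borders, header rows and log lines
        disk, row_host, _kind, count = cells[:4]
        if row_host == host and int(count) > 0:
            leftovers.append((disk, int(count)))
    return leftovers


# A few rows copied from the output above (assumed to be saved as text).
sample = """\
| DisplayName                          | Host      | isSSD | Comp |
| eui.48f8681115d6416c00247172ce4df168 | cs-ie-h01 | SSD   | 0    |
| naa.600508b1001c79748e8465571b6f4a46 | cs-ie-h01 | MD    | 1    |
| naa.600508b1001c2ee9a6446e708105054b | cs-ie-h01 | MD    | 0    |
"""

for disk, count in disks_with_components(sample, "cs-ie-h01"):
    print(disk, "still holds", count, "component(s)")
```

Any disk this prints for the host being evacuated is worth a closer look.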

The magnetic disk (MD) with the NAA ID 600508b1001c79748e8465571b6f4a46 still has one component on it and is 16% used. I used this next command to figure out what was on that disk:

> vsan.disk_object_info naa.600508b1001c79748e8465571b6f4a46 
Physical disk naa.600508b1001c79748e8465571b6f4a46 
  (52191bcb-7ea5-95ff-78af-2b14f72d95e4):
  DOM Object: 8e802154-7ccc-2191-0b4e-001517a69c72 
  (owner: cs-ie-h03.ie.local, policy: hostFailuresToTolerate = 1)
    Context: Can't attribute object to any VM, may be swap?
    Witness: 5f635b54-04f5-8c08-e5f6-0010185def78 
(state: ACTIVE (5), 
      host: cs-ie-h03.ie.local, 
      md: naa.600508b1001ceefc4213ceb9b51c4be4, 
      ssd: eui.d1ef5a5bbe864e27002471febdec3592, 
      usage: 0.0 GB)
    Witness: 5f635b54-a4d2-8a08-9efc-0010185def78 
(state: ACTIVE (5), 
      host: cs-ie-h02.ie.local, 
      md: naa.600508b1001c19335174d82278dee603, 
      ssd: eui.c68e151fed8a4fcf0024712c7cc444fe, 
      usage: 0.0 GB)
    RAID_1
      Component: 5f635b54-14f7-8608-a05a-0010185def78 
         (state: RECONFIGURING (10), 
         host: cs-ie-h04.ie.local, 
         md: naa.600508b1001c4b820b4d80f9f8acfa95, 
         ssd: eui.a15eb52c6f4043b5002471c7886acfaa, 
         dataToSync: 1.79 GB, 
         usage: 21.5 GB)
      Component: ad153854-c4c4-c4d8-b7e0-001f29595f9f 
         (state: ACTIVE (5), 
         host: cs-ie-h03.ie.local, 
         md: naa.600508b1001ceefc4213ceb9b51c4be4, 
         ssd: eui.d1ef5a5bbe864e27002471febdec3592, 
         usage: 21.5 GB)
      Component: 8e802154-cc97-ffc1-4c85-001517a69c72 
         (state: ACTIVE (5), 
         host: cs-ie-h01.ie.local, 
         md: naa.600508b1001c79748e8465571b6f4a46, 
         ssd: eui.48f8681115d6416c00247172ce4df168, 
         usage: 21.5 GB)

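From this output I can clearly see that one component is still in a state of RECONFIGURING, with 1.79 GB of data left to sync. That component appears to be the new copy being built on cs-ie-h04; until it finishes, the copy on cs-ie-h01 has to stay in place, which is why my host has not yet entered maintenance mode.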
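
The same check can be done mechanically when there are many components to read through. The sketch below assumes the vsan.disk_object_info output has been saved as text, that the field labels (state, host, dataToSync) appear as in the output above, and that sizes are printed in GB. The helper names parse_components and still_syncing are hypothetical and exist only for this illustration.

```python
import re

# Patterns match the field labels seen in the vsan.disk_object_info output.
START = re.compile(r"\b(Component|Witness):\s*(\S+)")
STATE = re.compile(r"\bstate:\s*(\w+)")
HOST = re.compile(r"\bhost:\s*([\w.-]+)")
TO_SYNC = re.compile(r"\bdataToSync:\s*([\d.]+)\s*GB")  # assumption: sizes are in GB


def parse_components(text):
    """Collect one record per Component/Witness entry in the saved output."""
    records, current = [], None
    for line in text.splitlines():
        start = START.search(line)
        if start:
            current = {"kind": start.group(1), "uuid": start.group(2),
                       "state": None, "host": None, "to_sync_gb": 0.0}
            records.append(current)
        if current is None:
            continue  # header lines before the first entry
        state = STATE.search(line)
        if state:
            current["state"] = state.group(1)
        host = HOST.search(line)
        if host:
            current["host"] = host.group(1)
        to_sync = TO_SYNC.search(line)
        if to_sync:
            current["to_sync_gb"] = float(to_sync.group(1))
    return records


def still_syncing(records):
    """Entries that are not ACTIVE or that still report data to sync."""
    return [r for r in records if r["state"] != "ACTIVE" or r["to_sync_gb"] > 0]


# Two components taken from the output above, trimmed for brevity.
SAMPLE = """\
    RAID_1
      Component: 5f635b54-14f7-8608-a05a-0010185def78
         (state: RECONFIGURING (10),
         host: cs-ie-h04.ie.local,
         dataToSync: 1.79 GB,
         usage: 21.5 GB)
      Component: 8e802154-cc97-ffc1-4c85-001517a69c72
         (state: ACTIVE (5),
         host: cs-ie-h01.ie.local,
         usage: 21.5 GB)
"""

for r in still_syncing(parse_components(SAMPLE)):
    print(f"{r['uuid']} on {r['host']}: {r['state']}, {r['to_sync_gb']:.2f} GB to sync")
```

An empty result suggests nothing on that disk is still being rebuilt elsewhere; a non-empty one explains a stalled maintenance mode task.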

So that raises the question: why does this not show up in vsan.resync_dashboard? The reason is that this object does not belong to a virtual machine; it belongs to a template. I confirmed this with one final command, passing it the DOM object UUID from the previous output:

> vsan.object_info /localhost/ie-datacenter-01/computers/ie-vsan-01/ 8e802154-7ccc-2191-0b4e-001517a69c72
DOM Object: 8e802154-7ccc-2191-0b4e-001517a69c72 
(owner: cs-ie-h03.ie.local, policy: hostFailuresToTolerate = 1)
  Witness: 3f675b54-3043-9fc5-bdb9-0010185def78 
    (state: ACTIVE (5), 
    host: cs-ie-h02.ie.local, 
    md: naa.600508b1001c19335174d82278dee603, 
    ssd: eui.c68e151fed8a4fcf0024712c7cc444fe, 
    usage: 0.0 GB)
  RAID_1
    Component: 5f635b54-14f7-8608-a05a-0010185def78 
       (state: ACTIVE (5), 
       host: cs-ie-h04.ie.local, 
       md: naa.600508b1001c4b820b4d80f9f8acfa95, 
       ssd: eui.a15eb52c6f4043b5002471c7886acfaa, 
       usage: 21.5 GB)
    Component: ad153854-c4c4-c4d8-b7e0-001f29595f9f 
       (state: ACTIVE (5), 
       host: cs-ie-h03.ie.local, 
       md: naa.600508b1001ceefc4213ceb9b51c4be4, 
       ssd: eui.d1ef5a5bbe864e27002471febdec3592, 
       usage: 21.5 GB)
  Extended attributes:
    Address space: 53687091200B (50.00 GB)
    Object class: vdisk
    Object path: /vmfs/volumes/vsan:52dc5a95d04bcbb9-9d90f486c2f14d1d/
             89802154-5c14-e025-333b-001517a69c72/ie-ora-01-clone.vmdk

Once I saw the object path reference in the last line of this output, I knew that this was one of my templates and not an actual VM.
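
Because the object path can wrap across lines in the console, pulling it out by hand is fiddly. Here is a small sketch that extracts it from saved vsan.object_info output, assuming the "Object path:" label and the wrapping shown above. The function name object_path is invented for this example, and the path on its own does not say whether the file belongs to a template; the resulting file name still has to be matched against your inventory.

```python
import posixpath


def object_path(text):
    """Return the 'Object path' value, rejoining a value wrapped over several lines."""
    parts = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Object path:"):
            parts = [stripped[len("Object path:"):].strip()]
        elif parts is not None:
            # A blank line or another "Key: value" line ends the wrapped path.
            if not stripped or ": " in stripped or stripped.endswith(":"):
                break
            parts.append(stripped)
    return "".join(parts) if parts else None


# Extended attributes copied from the output above.
SAMPLE = """\
  Extended attributes:
    Address space: 53687091200B (50.00 GB)
    Object class: vdisk
    Object path: /vmfs/volumes/vsan:52dc5a95d04bcbb9-9d90f486c2f14d1d/
             89802154-5c14-e025-333b-001517a69c72/ie-ora-01-clone.vmdk
"""

path = object_path(SAMPLE)
print(posixpath.basename(path))  # the VMDK file name to look up in the inventory
```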

Therefore, if you enter maintenance mode and choose to evacuate the data from the host in question, vsan.resync_dashboard might report 0 bytes left to sync while other objects, such as VM templates, are still reconfiguring. The commands above should help you determine whether this is the case.

We are getting a KB article created to help make this behaviour a little easier to understand should you encounter it. I’ll also provide an update when we have a solution to this issue so that the resync dashboard command also reports on template migrations, and not just VMs.