vSAN 6.6 Config Assist incorrectly reports Physical NIC warning with LACP/LAG

This is a very short post to bring an issue to your attention, one that a number of folks have pinged me about this week. With vSAN 6.6, there is a new feature called Configuration Assistant. As the name implies, it tries to highlight possible configuration issues with your vSAN infrastructure. A number of these checks are related to network configuration. One of them verifies that the vSAN network is highly available by checking that it has 2 or more physical NICs. For example, let me show you my setup. Here is my vSAN vmkernel port, and as you can see, it has two physical NICs/two uplinks (vmnic0 and vmnic1):

Now if I check the Configuration Assistant – Physical NICs test, I see that it is all green/passes the test:

OK – that works fine. However, we made a little oversight when it comes to LACP/Link Aggregation and the use of LAGs (Link Aggregation Groups). A LAG can contain multiple physical NICs. However, the LAG itself then gets added to/associated with a single uplink (e.g. vmnic0). So when the vSAN vmkernel port is using a LAG, it is only using one uplink, even though that single uplink can contain multiple physical NICs. In this situation, the Configuration Assistant reports the warning “Portgroup only has 1 active physical NIC”, as shown in the following screenshot:

This is obviously incorrect: the test is counting uplinks rather than physical NICs. The bottom line is that we are aware of the issue, and if you have a LAG configured for the vSAN network, you can safely ignore this warning in Configuration Assistant. We are working on a solution.
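
To make the uplink versus physical NIC distinction concrete, here is a minimal Python sketch. To be clear, this is not the actual Configuration Assistant code. The Uplink class, the function names and the "lag1" label are all hypothetical, and I am simply assuming that a redundancy check wants 2 or more of whatever it counts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Uplink:
    """One active teaming entry on a portgroup (hypothetical model)."""
    name: str
    # Physical NICs behind this entry: one for a plain uplink, several for a LAG.
    pnics: List[str] = field(default_factory=list)


def count_active_uplinks(active: List[Uplink]) -> int:
    """Count teaming entries only; a LAG counts as one, whatever it contains."""
    return len(active)


def count_physical_nics(active: List[Uplink]) -> int:
    """Count the distinct physical NICs behind all active entries."""
    return len({nic for uplink in active for nic in uplink.pnics})


def has_redundancy(count: int) -> bool:
    """Assumed rule: redundancy means 2 or more."""
    return count >= 2


# Two plain uplinks, one physical NIC each (like my first setup).
plain = [Uplink("Uplink 1", ["vmnic0"]), Uplink("Uplink 2", ["vmnic1"])]
# A single LAG entry that contains both physical NICs.
lagged = [Uplink("lag1", ["vmnic0", "vmnic1"])]

for label, active in (("plain", plain), ("lag", lagged)):
    by_uplink = has_redundancy(count_active_uplinks(active))
    by_pnic = has_redundancy(count_physical_nics(active))
    print(f"{label}: redundant by uplink count={by_uplink}, by pNIC count={by_pnic}")
```

With the plain setup both counts agree. With the LAG, counting uplinks gives 1 (hence the warning), while counting the physical NICs inside the LAG gives 2.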

Before finishing, I’d like to take the opportunity to point you to our new vSAN Network Design Guide, which has lots of good information around LACP, LAG and many other network considerations for vSAN. Please check it out. You can download it as a PDF for your reference. Feedback is always welcome.

5 comments
  1. Cormac, although not related to this, I deployed my first vSAN 6.6 2-node cluster out of the 75 that I need to deploy. The Configuration Assistant as well as the Health Check report Network Configuration warnings for the MTU large packet test between the nodes and the witness. Sometimes it is node 1 to witness, other times node 2 to witness, other times witness to node 1, etc. Every component, physical and virtual, on both sides is set to 9000 MTU. I changed everything to 1500 to test and the results are the same for the large packet test.

    Could this also be a case of 6.6 reporting warnings erroneously, like the issue you described in this article? Is VMware aware of other similar cases?

    Thanks,

    • Is it the vSAN check or the vMotion check? There is a known issue with the vMotion check as per the 6.6 release notes: http://pubs.vmware.com/Release_Notes/en/vsan/66/vmware-virtual-san-66-release-notes.html

      vMotion network connectivity test incorrectly reports ping failures
      The vMotion network connectivity test (Cluster > Monitor > vSAN > Health > Network) reports ping failures if the vMotion stack is used for vMotion. The vMotion network connectivity (ping) check only supports vmknics that use the default network stack. The check fails for vmknics using the vMotion network stack. These reports do not indicate a connectivity problem.

      Workaround: Configure the vmknic to use the default network stack. You can disable the vMotion ping check using RVC commands. For example: vsan.health.silent_health_check_configure -a vmotionpingsmall
