Building a simple ESXi host overlay network with NSX-T
I’ve recently begun to look at NSX-T. My long-term goal is to use it to enable me to build multiple Kubernetes clusters using PKS, the Pivotal Container Service. The hope is then to look at some cool storage-related items with Kubernetes. But first things first. Kudos to both Sam McGeown and William Lam for their excellent blogs on NSX-T. However, I’m coming at this as a newbie, and I’m not using a nested environment, but rather a 4-node physical environment in my lab. I am also not separating my cluster into management and production; instead, I am using the 4 nodes to host my NSX-T Manager, NSX-T Controller and eventually my NSX-T Edge. This will also be the cluster where I will deploy PKS. So my environment is a little different to Sam’s and William’s. This first blog will just look at getting an overlay network deployed across my 4 ESXi hosts. Later on, I’ll look at some more complex networking items.
NSX-T Deployment
I am not going to cover the deployment steps for the Manager and Controller. You can refer to the actual NSX-T documentation here or pop over to Sam’s blog here for the steps. The Manager and Controller (and the Edge) are available as OVAs and can be downloaded from here. I’m just going to test with a single controller as this is simply a learning exercise. For production, you would obviously follow NSX-T best practices and guidelines.
To create an overlay network, you will need to use some spare physical NICs on your ESXi hosts. My 4 hosts are configured identically as follows:
So you can see that I have a free 10Gb link (vmnic1) and a spare 1Gb link (vmnic2) on each host. We’ll be coming back to this later.
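If you are not sure which vmnics are free on your own hosts, a quick way to check from the ESXi shell (a simple sketch, assuming SSH access to the host) is to list the physical NICs and compare them with the uplinks already claimed by your existing vSwitches:

# list all physical NICs on the host (name, driver, link state, speed)
esxcli network nic list
# list the existing vSwitches and their uplinks; any vmnic not shown as an uplink is a candidate
esxcfg-vswitch -l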
Host / Cluster Configuration – Create Fabric Nodes
At this point, I will assume that you have successfully deployed the NSX-T Manager and Controller(s) and you are ready to configure your ESXi hosts as NSX-T Fabric Nodes. A fabric node is a node that has been registered with the NSX-T management plane and has NSX-T modules installed. For an ESXi host to be part of the NSX-T overlay, it must first be added to the NSX-T fabric. When you first login, the NSX-T Manager Dashboard will look something like this:
To configure your ESXi hosts as Fabric Nodes via the NSX-T Manager, navigate to the Fabric : Compute Managers section of the UI, and click on +ADD to add your vCenter. You populate the usual information, as you would expect.
This process also looks for a thumbprint. If this is not populated, you’ll get a prompt about an invalid thumbprint. Simply click Yes and your vCenter will be added.
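If you would rather supply the thumbprint yourself instead of accepting the prompt, one way to retrieve the SHA-256 fingerprint of the vCenter certificate is with openssl (a quick sketch; substitute your own vCenter FQDN):

# retrieve the SHA-256 fingerprint of the vCenter certificate
echo | openssl s_client -connect vcsa-06.rainpole.com:443 2>/dev/null | openssl x509 -noout -fingerprint -sha256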
The next step is to push out the necessary NSX-T components to the hosts. In the NSX-T Manager, navigate to Fabric : Nodes, select Hosts and change the Managed by setting from None: Standalone Hosts to the vCenter server you just added, in my case vcsa-06.rainpole.com. Expand the name of the cluster, and it should show that your hosts are not in a prepared state and are not connected to the NSX Controller.
Now click on the CONFIGURE CLUSTER link on the left-hand side of the Hosts view. This will pop up an option to Automatically install NSX. Set that to Enabled. Leave Automatically Create Transport Node set to Disabled, as this is a more complex step which we will do manually later. Click on Save and this will push out the necessary VIBs to install the NSX-T modules on the ESXi hosts.
This will change the Deployment status from Not Prepared to NSX Install in Progress. At this point, you can SSH onto your ESXi hosts, and if you run ps -cJ | grep -i NSX | more, you should be able to see tasks installing the VIBs for NSX-T. When this step completes, your ESXi hosts will now be NSX-T Fabric Nodes. Here is a Hosts view when the install task has completed:
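Once the install has finished, you can also confirm from the ESXi shell that the NSX-T VIBs are actually present on each host:

# list installed VIBs and filter for the NSX-T components
esxcli software vib list | grep -i nsx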
And if we take a look at the NSX Manager Dashboard, we now see that we have our hosts in the Fabric.
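The same information is available from the NSX-T Manager REST API. As a rough sketch (nsxt-mgr.rainpole.com is a placeholder for your own NSX-T Manager address, and you will be prompted for the admin password), listing the fabric nodes should return one entry per prepared host:

# list all fabric nodes registered with the NSX-T management plane
curl -k -u admin https://nsxt-mgr.rainpole.com/api/v1/fabric/nodes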
We can now move onto the next steps to create our overlay network.
Transport Nodes and Transport Zones
One of the steps that we skipped above was the automatic creation of transport nodes. So what is a transport node and a transport zone? A transport node could be an ESXi host, another hypervisor such as KVM, or an NSX Edge that is going to participate in an overlay network. A transport zone defines the potential reach of transport nodes. The easiest way to explain it is that if two ESXi hosts that are configured as transport nodes participate in the same overlay transport zone, then VMs on these different hosts using the overlay network can communicate with each other. This is achieved by controlling which nodes are connected to the internal switch (hostswitch) which we shall create shortly. Hope that makes sense. Let’s do the steps and maybe it will become clearer.
Create a Transport Zone
This is pretty simple. What we do in this step is give the transport zone a name, and give the internal switch/hostswitch (N-VDS) a name. Leave the N-VDS mode as Standard and Traffic Type as Overlay.
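For those who prefer the API, the same transport zone can be created with a single REST call. This is only a sketch of what I believe the UI does under the covers: the transport zone name is an example, the N-VDS name matches the hostswitch name used later in this post, and nsxt-mgr.rainpole.com is again a placeholder for your NSX-T Manager.

# create an overlay transport zone with an associated N-VDS (hostswitch) name
curl -k -u admin -X POST https://nsxt-mgr.rainpole.com/api/v1/transport-zones \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "tz-overlay",
        "host_switch_name": "host-N-VDS",
        "transport_type": "OVERLAY"
      }'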
Now whatever nodes we add to this transport zone will be on the same network overlay. However, before we add the Transport Nodes to the Transport Zone, we need to define two additional items. The first is an Uplink Profile and the second is an IP Pool that we are going to use for IP assignments for the overlay network.
Create an Uplink Profile
To create an uplink profile, navigate to Fabric : Profiles and select Uplink Profiles. Click on +ADD to begin a new profile. You will need to provide a name, a teaming policy, an identifier for the uplink to use for the overlay network, and if the physical NIC that you plan on using is on a VLAN, you will also need to provide a Transport VLAN. The last point is interesting. The uplinks that I plan to use on my hosts are unused (vmnic1 above), but they are on a trunked network. One of the VLANs is VLAN ID 50. This is the VLAN that I plan on using for my overlay network. Thus, I have to add it as my Transport VLAN. Note that the Active Uplinks entry is simply an identifier here – it does not yet mean that I will be using vmnic1. I will map that later when I create the transport nodes. Note also that I only have a single network, so Teaming Policy does not really mean anything. Obviously, if you had additional uplinks free, then you can populate the Standby Uplinks and the Teaming Policy accordingly.
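For reference, here is roughly what the equivalent uplink profile looks like as an API call. Treat it as a sketch rather than the exact payload the UI generates: the profile name is an example, while the active uplink identifier (vmnic1) and the transport VLAN (50) are the values described above.

# create an uplink profile: failover-order teaming, one active uplink, transport VLAN 50
curl -k -u admin -X POST https://nsxt-mgr.rainpole.com/api/v1/host-switch-profiles \
  -H 'Content-Type: application/json' \
  -d '{
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": "uplink-profile-overlay",
        "teaming": {
          "policy": "FAILOVER_ORDER",
          "active_list": [ { "uplink_name": "vmnic1", "uplink_type": "PNIC" } ]
        },
        "transport_vlan": 50
      }'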
Create an IP Pool
This IP pool will contain a range of IP addresses that will be assigned to the Tunnel Endpoints (TEPs). TEPs are used on the overlay network to identify the transport nodes/ESXi hosts. Navigate to Inventory : Groups and select IP Pools. Again, click on the +ADD link to create a new IP Pool. Populate accordingly. My IP Pool is shown below.
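For completeness, here is a sketch of the same pool created via the API. The CIDR matches the TEP subnet you will see later on the hosts, but treat the exact allocation range as an example and use your own values.

# create an IP pool for the Tunnel Endpoint (TEP) addresses
curl -k -u admin -X POST https://nsxt-mgr.rainpole.com/api/v1/pools/ip-pools \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "tep-ip-pool",
        "subnets": [ {
          "cidr": "192.168.190.0/24",
          "allocation_ranges": [ { "start": "192.168.190.1", "end": "192.168.190.10" } ]
        } ]
      }'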
Create Transport Nodes and put them into the Transport Zone
Ok – at this point everything is ready. Our ESXi hosts have been added to the fabric, we’ve created our overlay transport zone, we’ve created an uplink profile, and we’ve created an IP pool for our TEPs. Let’s now go ahead and create our Transport Nodes, at the same time placing them into our overlay Transport Zone.
The first part of the Transport Node wizard is the General tab. Here we add a name for the node, select the node from the list of nodes, and add the transport zone that we created earlier by selecting it in the Available list and clicking on the middle arrow to move it to the Selected list. It should look something like this.
Next click on the N-VDS tab. Here is where we populate much of what we created previously. Because we already added the Transport Zone in the General tab, the wizard knows which internal switch/hostswitch we need to use, so this appears automatically in the drop-down list of N-VDS Names. We select the uplink profile that we created earlier, and set the IP Assignment to our IP Pool. The final piece to populate is the physical NIC that we plan to use for the overlay. In my case, this is the unused physical NIC vmnic1. I need to select this from all of the physical NICs shown in the drop-down. The N-VDS tab will now look something like this:
The right-most field above (which also says vmnic1) is simply the tag I used when I created the Uplink Profile earlier. You will now need to repeat this step for all of your hosts. When this is completed, your list of Transport Nodes should look something like this:
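As a quick sanity check, the transport nodes can also be listed via the API; each host should appear with the N-VDS name and transport zone we configured above (again, nsxt-mgr.rainpole.com is a placeholder for your NSX-T Manager):

# list all transport nodes known to the NSX-T Manager
curl -k -u admin https://nsxt-mgr.rainpole.com/api/v1/transport-nodes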
Verifying vCenter changes
The first thing you should observe is that there is a new NSX internal switch/hostswitch, called host-N-VDS in this case, visible in the physical adapters view of vCenter. Here is what I see in my environment.
Verifying ESXi changes
SSH onto one of your ESXi hosts that is now a Transport Node. If you run a CLI command such as esxcfg-vswitch -l, you should now see that there is a new switch, the NSX internal switch, which allows all the VMs on the hosts that are part of the transport zone to communicate over the overlay network. Again, in my case this switch is called host-N-VDS.
[root@esxi-dell-g:~] esxcfg-vswitch -l
Switch Name         Num Ports   Used Ports  Configured Ports  MTU   Uplinks
vSwitch0            7936        10          128               1500  vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network 2          50       0           vmnic3
  VM Network            51       5           vmnic3
  vmotion               51       1           vmnic3
  Management Network    51       1           vmnic3

Switch Name         Num Ports   Used Ports  Configured Ports  MTU   Uplinks
vSwitch1            7936        4           128               1500  vmnic0

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  iscsi-vlan-60         60       0           vmnic0
  vlan-50               50       0           vmnic0
  FT+vSAN               30       1           vmnic0

Switch Name         Num Ports   Used Ports  Uplinks
host-N-VDS          7936        6           vmnic1
We also see new VMkernel interfaces plumbed up. One of these is the TEP (Tunnel Endpoint). Use esxcfg-vmknic -l to display them:
[root@esxi-dell-g:~] esxcfg-vmknic -l
Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
vmk0 Management Network IPv4 10.27.51.7 255.255.255.0 10.27.51.255 24:6e:96:2f:48:55 1500 65535 true STATIC defaultTcpipStack
vmk0 Management Network IPv6 fe80::266e:96ff:fe2f:4855 64 24:6e:96:2f:48:55 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk1 vmotion IPv4 10.27.51.54 255.255.255.0 10.27.51.255 00:50:56:66:14:63 1500 65535 true STATIC defaultTcpipStack
vmk1 vmotion IPv6 fe80::250:56ff:fe66:1463 64 00:50:56:66:14:63 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk2 FT+vSAN IPv4 10.10.0.7 255.255.255.0 10.10.0.255 00:50:56:61:6a:14 1500 65535 true STATIC defaultTcpipStack
vmk2 FT+vSAN IPv6 fe80::250:56ff:fe61:6a14 64 00:50:56:61:6a:14 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk10 10 IPv4 192.168.190.3 255.255.255.0 192.168.190.255 00:50:56:63:11:11 1600 65535 true STATIC vxlan
vmk10 10 IPv6 fe80::250:56ff:fe63:1111 64 00:50:56:63:11:11 1600 65535 true STATIC, PREFERRED vxlan
vmk50 c955ece1-9d8e-4a27-a434-1318da8c8732 IPv4 169.254.1.1 255.255.0.0 169.254.255.255 00:50:56:69:cd:e5 1500 65535 true STATIC hyperbus
vmk50 c955ece1-9d8e-4a27-a434-1318da8c8732 IPv6 fe80::250:56ff:fe69:cde5 64 00:50:56:69:cd:e5 1500 65535 true STATIC, PREFERRED hyperbus
Note that the IP address assigned to the TEP has come from the IP Pool that we created earlier. And, finally, to verify that we can communicate over the overlay, we have to use a special option to the vmkping command, ++netstack=vxlan. The objective is to ping the TEPs on the other Transport Nodes/ESXi hosts.
[root@esxi-dell-g:~] vmkping ++netstack=vxlan 192.168.190.1
PING 192.168.190.1 (192.168.190.1): 56 data bytes
64 bytes from 192.168.190.1: icmp_seq=0 ttl=64 time=0.293 ms
64 bytes from 192.168.190.1: icmp_seq=1 ttl=64 time=0.187 ms
64 bytes from 192.168.190.1: icmp_seq=2 ttl=64 time=0.175 ms
--- 192.168.190.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.175/0.218/0.293 ms
[root@esxi-dell-g:~] vmkping ++netstack=vxlan 192.168.190.2
PING 192.168.190.2 (192.168.190.2): 56 data bytes
64 bytes from 192.168.190.2: icmp_seq=0 ttl=64 time=0.243 ms
64 bytes from 192.168.190.2: icmp_seq=1 ttl=64 time=0.148 ms
64 bytes from 192.168.190.2: icmp_seq=2 ttl=64 time=0.167 ms
--- 192.168.190.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.148/0.186/0.243 ms
[root@esxi-dell-g:~]
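One additional check that is worth doing, since the overlay adds encapsulation overhead, is to confirm that full-sized frames make it between the TEPs without fragmentation. A simple sketch, assuming the 1600 MTU on the TEP VMkernel interface shown above (1600 minus 28 bytes of IP and ICMP headers gives a 1572-byte payload):

# ping a remote TEP with the don't-fragment bit set (-d) and a payload sized to the MTU (-s)
vmkping ++netstack=vxlan -d -s 1572 192.168.190.1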
Excellent. So there we have it. Our overlay network is up and running. Now I need to try to do something more useful, especially if I want to do some more interesting work with PKS. Watch this space while I figure out what to do with an NSX Edge to provide routing capabilities, and all that good stuff.