In an earlier post, I described the witness appliance in some detail. The witness appliance is VMware's recommended way of creating a witness host, and ideally customers should avoid building their own bespoke appliances for this purpose. Also note that the witness appliance is not a general-purpose ESXi VM: it does not store or run nested VMs, and it has no VM or HA/DRS related functions. It simply functions as a VSAN witness and plays no other role. In this post, I will take you through step-by-step instructions on how to deploy a witness appliance for either a VSAN stretched cluster deployment or a VSAN 2-node ROBO-type deployment. Unfortunately this entails a lot of screenshots, so apologies in advance for that. However I did want this post to cover all the angles. There are 6 steps to getting the appliance deployed and configured successfully.
1. Physical ESXi network setup
Let’s begin by taking a look at the underlying physical ESXi host that we wish to deploy the witness appliance onto. You will have to ensure that the physical ESXi host on the witness site is able to reach both the management network and the VSAN networks of the data sites. In my configuration, this physical ESXi host on the witness site has an uplink connection to a trunk port which can access both the management network and the VSAN network, for simplicity's sake. There is a portgroup created on the physical ESXi host which uses this uplink.
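If you prefer the command line, the portgroup can also be created on the physical host with esxcli. Treat this as a sketch only: the vSwitch name (vSwitch0), the portgroup name (witness-pg) and the use of VLAN 4095 (which trunks all VLANs to a portgroup on a standard vSwitch, matching the trunked uplink) are illustrative assumptions; substitute your own values.

# Create a portgroup on the existing standard vSwitch (names are illustrative)
esxcli network vswitch standard portgroup add --portgroup-name=witness-pg --vswitch-name=vSwitch0

# VLAN 4095 trunks all VLANs to the portgroup, matching the trunked uplink
esxcli network vswitch standard portgroup set --portgroup-name=witness-pg --vlan-id=4095

# Verify the portgroup is present
esxcli network vswitch standard portgroup list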
2. Deploy the OVA
In the interest of completeness, all of the OVA deployment steps are shown here (consider this screenshot hell part 1 if you wish). If you do not need to step through this and are comfortable with OVA deployments, please skip ahead to step 3, the management network configuration section. To begin the deployment, select a host or cluster and then choose “Deploy OVF Template…”. In this example, it is being deployed directly to a host:
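If you would rather script the deployment than click through the wizard, ovftool can do the same job. Treat the following as a rough sketch: the OVA filename, VM name, datastore, network and host address are all placeholders, and the available OVF options should be checked against the OVA itself.

# Deploy the witness OVA directly to a host (all names/addresses are illustrative)
ovftool --acceptAllEulas --powerOn \
  --name=vsan-witness-01 \
  --datastore=datastore1 \
  --network=witness-pg \
  VMware-VirtualSAN-Witness.ova \
  vi://root@physical-esxi.lab.local/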
3. Management network configuration
When the OVA has been deployed, open a console to it. Since it is ESXi running in a VM, what you will see is the DCUI, hopefully familiar to you from managing physical ESXi servers. Login to the DCUI and configure the appliance so that it is reachable on the network. Provide the root password that you provided during the OVA deployment:
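Alternatively, once you have console or SSH access to the appliance, the same management network settings can be applied from the ESXi shell. The addresses, DNS server and hostname below are placeholders only.

# Assign a static IPv4 address to the management VMkernel port vmk0
esxcli network ip interface ipv4 set -i vmk0 -t static -I 192.168.1.50 -N 255.255.255.0

# Set the default gateway for the management network
esxcli network ip route ipv4 add --gateway 192.168.1.1 --network default

# Add a DNS server and set the hostname
esxcli network ip dns server add --server 192.168.1.10
esxcli system hostname set --fqdn witness-01.lab.local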
4. Add the witness appliance to vCenter (as an ESXi host)
Once the management network has been configured on the witness appliance through the DCUI, it can now be added to the vCenter server as an ESXi host. This is no different to adding a physical ESXi host to vCenter. The first step is to add the hostname:
5. Set up the VSAN network
The fifth step is to configure the VSAN network. The appliance comes pre-packaged with two pre-defined vSwitches, along with their portgroups and VMkernel ports. Remember from my previous blog post on the witness appliance that the second network interface has been configured such that the vmnic MAC address inside the nested ESXi host matches the network adapter MAC address of the witness appliance (the outer MAC matches the inner MAC, so to speak). This means that we do not need to use promiscuous mode.
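You can verify this MAC matching for yourself. From the shell of the witness appliance, list the NICs that the nested ESXi sees and compare the MAC addresses against the witness VM's network adapters shown in the web client:

# The MAC addresses reported here should match those of the
# witness appliance VM's network adapters in the web client
esxcli network nic list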
There is no work necessary on vSwitch0/vmk0, the management network – this was configured via the DCUI in step 3 earlier. The focus of this step is to enable VSAN traffic on the VMkernel interface vmk1, which is on the witness portgroup on the witnessSwitch vSwitch.
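In the web client this is simply a matter of editing vmk1 and ticking the Virtual SAN traffic service, but it can also be done from the ESXi shell. The IP address below is a placeholder for an address on the witness site's VSAN network.

# Give vmk1 an address on the witness site's VSAN network (placeholder values)
esxcli network ip interface ipv4 set -i vmk1 -t static -I 192.168.150.50 -N 255.255.255.0

# Tag vmk1 for Virtual SAN traffic
esxcli vsan network ipv4 add -i vmk1

# Confirm that vmk1 is now carrying VSAN traffic
esxcli vsan network list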
6. Add static routes as required
There is one final step, and that is ensuring that the VSAN network on the witness appliance can reach the VSAN network(s) of the data sites (and vice-versa). In ESXi, there is only one default gateway, typically associated with the management network. The storage networks, including the VSAN traffic networks, are normally isolated from the management network, meaning that there is no route to the VSAN network via the default gateway of the management network. Since the recommendation is to route (L3 network) between the data sites and the witness site, the VSAN traffic on the data sites will not be able to reach the VSAN network on the witness, as it would be sent out on the default gateway. So how do you overcome this?
In the current version of VSAN, the solution is to use static routes. Add a static route on each of the ESXi hosts on each data site to reach the VSAN network on the witness site. Similarly, add static routes on the witness host so that it can reach the VSAN network(s) on the data sites. Now when the ESXi hosts on the data sites need to reach the VSAN network on the witness host, they will not use the default gateway, but whatever path has been configured via the static route. The same will be true when the witness needs to reach the data sites.
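Concretely, on each data site host you add a route to the witness VSAN network via a gateway on the local VSAN subnet, and on the witness host you add the reverse. The subnets, gateways and interface names below are examples only; use the addressing of your own environment.

# On each data site ESXi host: route to the witness site's VSAN network
esxcli network ip route ipv4 add --gateway 172.3.0.1 --network 192.168.150.0/24

# On the witness host: route back to the data sites' VSAN network(s)
esxcli network ip route ipv4 add --gateway 192.168.150.1 --network 172.3.0.0/24

# Verify the routing table, then test reachability over the VSAN VMkernel port
# (run from a data site host; interface name and target IP are examples)
esxcli network ip route ipv4 list
vmkping -I vmk1 192.168.150.50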
While static routes are not the most elegant solution, they are the supported way of implementing the witness at the moment. Plans are underway to make this simpler.
Conclusion
At this point you should have a fully functional witness host. You can now go ahead and create your VSAN stretched cluster or VSAN ROBO solution. Please use the health check plugin to verify that all the health checks pass when the configuration is complete. It can also help you to locate a misconfiguration if the cluster does not form correctly. We will shortly be releasing a VSAN Stretched Cluster Guide, which will describe the network topologies and configuration in more detail. It contains deployment instructions for the VSAN witness appliance, including all the configuration steps. If you are considering a VSAN stretched cluster deployment (or indeed a ROBO deployment), you should most definitely review this document before starting the deployment.