Getting started with VMware Cloud Foundation (VCF)

After returning from the holidays, one of the items at the top of my agenda was to become more familiar with VMware Cloud Foundation (VCF). For those of you who are not familiar with VCF, it is basically the ‘easy button’ for deploying the full vSphere stack of products, including virtual storage (vSAN) and virtual networking (NSX), as well as monitoring and logging products such as vRealize Operations, vRealize Log Insight and so on. However, it is so much more, because once VCF is stood up, it becomes the building block for deploying what could be termed the application suite from VMware, such as Horizon View and PKS, and indeed additional virtual infrastructure environments that can be consumed by different tenants of your organization.

In this post, which I am planning to be the first of a number of posts on VCF version 3.9, I am going to go through the initial deployment of the management domain. However, before we get into that, we need to spend a little time understanding some of the basic architectural components of VCF.

Cloud Builder

The Cloud Builder is a virtual appliance provided by VMware, and is used to deploy and configure the VCF management domain, and once that task is completed, it transfers control over to the SDDC Manager.

SDDC Manager

Once the management domain has been configured by Cloud Builder, control and management passes to the SDDC Manager. From here, a multitude of tasks can be carried out, including the creation of workload domains (WLDs): Virtual Infrastructure WLDs, Horizon WLDs and PKS WLDs in the 3.9 version of VCF. You can also commission new ESXi hosts for use in WLDs, manage Composable Infrastructure from both Dell (MX series) and HPE (Synergy series), and deploy the vRealize Suite (i.e. vRealize Operations and vRealize Automation).

Management Domain

A management domain is used to host the infrastructure components needed to instantiate, manage, and monitor the Cloud Foundation infrastructure. The management domain is automatically created using the Cloud Builder appliance when it is initially configured.

A management domain requires 4 ESXi nodes. It also only leverages vSAN for storage. Thus vSAN is the required principal storage for any VCF management domain.

Workload Domains

Workload Domains provide an integrated selection of compute, storage and network resources for business workloads to run in. These are a logical abstraction of resources, provisioned automatically by SDDC (Software Defined Data Center) Manager, and administered and patched independently. VCF provides a fully automated process for creating, extending, and deleting workload domains using SDDC Manager. Hosts and clusters may also be removed from a workload domain to reduce its size.

Workload domains require a minimum of 3 ESXi hosts and can leverage vSAN storage (default and preferred), but they can also consume secondary storage in the form of NFS 3.x datastores (since VCF 3.5) and Fibre Channel storage (since VCF 3.9).

A user will have one primary/principal storage but could have additional secondary storage. This is a topic for another day.

Cloud Builder in action

I am going to deploy VCF 3.9 in this example. We start with the cloud builder. The cloud builder is a single use appliance provided by VMware. The deployment is not much different from many other vSphere appliances. Simply deploy the OVA, populate the appropriate fields and once the deployment completes, connect to the cloud builder URL to continue the deployment. Once logged in using the credentials provided during the appliance roll-out, admins are presented with a Pre-bringup checklist. All of these tasks need to be validated before continuing.

For those considering deploying VCF, please spend a lot of time validating this checklist. It will save you a lot of time and effort later on in the deployment process, as it is not easy to modify the configuration after the roll-out has commenced. In many cases, you may need to clean up the parts that have already been deployed and start over. Therefore spending some cycles ensuring this checklist is valid for your environment will be beneficial in the long run.

Next is the EULA. You simply need to agree to this to continue the deployment.

And now we come to the guts of the cloud builder. Admins are provided with a sample configuration file/parameter sheet which must be populated with information pertinent to the Software Defined Data Center (SDDC) that they wish to deploy.

The components of the SDDC that require information to be populated in the configuration are:

  • ESXi hosts
  • vCenter Server
  • vSAN
  • NSX
  • SDDC Manager
  • vRealize Log Insight

The information that needs to be populated includes license information, host and appliance passwords, networking information (VLAN, IP addresses, Gateways, MTUs), as well as SSH RSA Key Fingerprints and SSL Thumbprints (SHA1) for the ESXi hosts.
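
Before uploading the sheet, it can be worth sanity-checking the networking values offline. The following is a minimal sketch only, not part of the cloud builder: the function name, the argument layout, and the accepted VLAN (0-4094) and MTU (1500-9000) ranges are assumptions made for illustration, and the addresses in the usage note are made up.

```python
import ipaddress


def check_network(name, vlan, cidr, gateway, mtu, host_ips):
    """Return a list of problems found in one (hypothetical) network entry."""
    problems = []
    # strict=True rejects a CIDR that has host bits set, e.g. 172.16.13.5/24.
    net = ipaddress.ip_network(cidr, strict=True)

    # Assumed valid range for a tagged or untagged VLAN ID.
    if not 0 <= vlan <= 4094:
        problems.append(f"{name}: VLAN {vlan} is outside 0-4094")

    # Assumed bounds: standard 1500 up to jumbo frames at 9000.
    if not 1500 <= mtu <= 9000:
        problems.append(f"{name}: MTU {mtu} is outside 1500-9000")

    if ipaddress.ip_address(gateway) not in net:
        problems.append(f"{name}: gateway {gateway} is not in {net}")

    seen = set()
    for ip in host_ips:
        addr = ipaddress.ip_address(ip)
        if addr not in net:
            problems.append(f"{name}: {ip} is not in {net}")
        if addr in seen:
            problems.append(f"{name}: {ip} is listed more than once")
        seen.add(addr)
    return problems
```

For example, `check_network("vSAN", 1613, "172.16.13.0/24", "172.16.13.253", 9000, ["172.16.13.101", "172.16.13.102"])` returns an empty list. Note that this only catches mistakes within the sheet itself; it cannot tell you whether an address is already in use elsewhere on the network.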

NSX requires additional information for the NSX Manager and the 3 NSX Controllers that are deployed. vRealize Log Insight is deployed as a 3 node cluster, so it requires IP addresses and hostnames as well as an additional Load-Balancer IP address and hostname. Here are a few sample pages from the configuration parameters that need to be populated. DNS resolution of hostnames is also key. Below is the management workloads page where licenses for the various components are populated.
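
On the DNS point, a quick script can confirm that every hostname in the sheet resolves to the address you intend to give it. This is a rough sketch under assumptions: checking both forward and reverse lookups is a choice made here, `check_dns` and its arguments are hypothetical names, and the resolver functions are passed in as parameters so the logic can be exercised without a live DNS server.

```python
import socket


def check_dns(entries,
              forward=socket.gethostbyname,
              reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """entries maps FQDN -> expected IP. Returns a list of problems."""
    problems = []
    for fqdn, expected_ip in entries.items():
        # Forward lookup: the name must resolve to the planned address.
        try:
            resolved = forward(fqdn)
        except OSError as exc:
            problems.append(f"{fqdn}: forward lookup failed ({exc})")
            continue
        if resolved != expected_ip:
            problems.append(f"{fqdn}: resolves to {resolved}, expected {expected_ip}")

        # Reverse lookup: the planned address must map back to the name.
        try:
            name = reverse(expected_ip)
        except OSError as exc:
            problems.append(f"{expected_ip}: reverse lookup failed ({exc})")
            continue
        if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
            problems.append(f"{expected_ip}: maps back to {name}, expected {fqdn}")
    return problems
```

Calling `check_dns({"some-host.example.local": "10.0.0.1"})` with the default resolvers uses whatever DNS servers the machine running the script is configured with, so run it from a system that uses the same DNS servers as the cloud builder.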

Another page in the configuration relates to the ESXi hosts that make up the management domain. You need to also include VLAN information and IP addresses for the vMotion network and vSAN network, as you can see below.

The fingerprint and thumbprint from each ESXi host are also required. I’ll show you some tips on how to capture those shortly. This may seem like a lot of information, but the spreadsheet is actually quite straightforward to populate. Once the parameter sheet is filled, simply upload it to the cloud builder:

Once the parameter file has been uploaded, it will be validated by the cloud builder. Since I was deploying this in my lab, the validation reported a number of warnings. For production deployments, all of these should be examined and addressed so that the status reports Success in all cases.

Once the parameter file validation has completed, we are now ready to bring up the SDDC. This is when the real activity of automatically deploying a full stack Software Defined Data Center begins:

What is very neat is that every step of the way, you are shown exactly what activity is taking place. Here is the very start of the bring-up from my lab environment.

For more detailed information, users can SSH onto the cloud builder virtual appliance and tail the bring-up log as follows:

chogan@chogan-a01 ~ % ssh admin@cloud-builder-01
admin@10.27.51.134's password:*****
Last login: Thu Jan 9 11:44:01 2020 from 10.30.1.189
admin@vcf-cb-01 [ ~ ]$

admin@vcf-cb-01 [ ~ ]$ ls -lt /var/log/vmware/vcf/bringup
total 11772
-rw-r--r-- 1 vcf_bringup vcf 6426077 Jan 9 15:17 vcf-bringup-debug.log
-rw-r--r-- 1 vcf_bringup vcf 1731087 Jan 9 15:17 vcf-bringup.log
.
.
admin@vcf-cb-01 [ ~ ]$ tail -f /var/log/vmware/vcf/bringup/vcf-bringup.2020-01-08.0.log

For even more detail, refer to the bring-up debug log, vcf-bringup-debug.log, in the same directory.

Now, as you can well imagine, there are a lot of steps to standing up an SDDC. The screenshot above is only a small snippet of everything that is deployed and configured. But the whole point is that this is now completely automated, saving you a whole bunch of time, effort and in a lot of cases, pain. I was very pleasantly surprised with how quickly I was able to roll out an SDDC on my 4 lab ESXi nodes, once I gained some level of familiarity with cloud builder and how to populate the parameter sheet.

Some useful tips

(a) Fingerprints and Thumbprints

When filling out the parameter sheet, you are asked to provide both the SSH Fingerprint and SSL Thumbprint. While this information is available in the DCUI of each ESXi host, it is a PITA to go around to each console to retrieve this info. My good pal William has details on how to get the SSL thumbprint on his site here, but what about the SSH fingerprint?

This command (taken from William) when run on the ESXi host will give you the SSL Thumbprint:

[root@esx-dell-p:~] openssl x509 -in /etc/vmware/ssl/rui.crt -fingerprint -sha1 -noout
SHA1 Fingerprint=49:43:58:81:C0:20:FA:B6:20:43:8D:44:CC:D7:ED:B7:8D:48:92:E7
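
If you would rather not log in to every host, the same thumbprint can be computed remotely, since it is just the SHA-1 hash of the DER-encoded certificate. The sketch below uses only the Python standard library. The function names are made up for illustration, and it assumes the certificate a host presents on port 443 is the same rui.crt read by the command above.

```python
import hashlib
import ssl


def sha1_thumbprint(pem_cert):
    """Return the upper-case, colon-separated SHA-1 thumbprint of a PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_thumbprint(host, port=443):
    """Fetch the certificate presented by host:port (no validation) and hash it."""
    # With no CA bundle supplied, get_server_certificate does not validate
    # the certificate, so self-signed host certificates are accepted.
    pem = ssl.get_server_certificate((host, port))
    return sha1_thumbprint(pem)
```

Calling `fetch_thumbprint()` with the management IP address of each host in a loop should give the values in the same format as the openssl output, ready to paste into the parameter sheet.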

When run from a host that has ssh-keyscan, this command will give you the SSH Fingerprint:

chogan@chogan-a01 ~ % ssh-keyscan [ip address of host] 2>/dev/null | ssh-keygen -lf - | awk '{print $2}'
SHA256:ZleOMi6B3gSn43JEXrD6hfCJCgh1FPICngzYiTXykIk
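
To collect the fingerprints for all hosts in one go, the pipeline above can be wrapped in a small script. This is a sketch rather than a tested tool: the function names are hypothetical, and it passes `-t rsa` to ssh-keyscan on the assumption that only the RSA key is wanted, since the sheet asks for SSH RSA key fingerprints. The SHA256 fingerprint is the base64-encoded SHA-256 hash of the decoded key blob, with the trailing padding removed.

```python
import base64
import hashlib
import subprocess


def ssh_fingerprint(keyscan_line):
    """Turn one 'host keytype base64blob' line into (keytype, 'SHA256:...')."""
    _host, keytype, blob = keyscan_line.split()[:3]
    digest = hashlib.sha256(base64.b64decode(blob)).digest()
    return keytype, "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def rsa_fingerprints(hosts):
    """Run ssh-keyscan against each host and collect its RSA key fingerprint."""
    result = {}
    for host in hosts:
        out = subprocess.run(["ssh-keyscan", "-t", "rsa", host],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            # Skip blank lines and any comment lines in the output.
            if line and not line.startswith("#"):
                result[host] = ssh_fingerprint(line)[1]
    return result
```

`rsa_fingerprints()` takes a list of host names or IP addresses and returns a dictionary keyed by host. Hosts that do not respond are simply missing from the result.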

(b) No eligible capacity disks found

In this example, the bringup failed because it said there were no capacity disks available for vSAN. The real cause was that one of the hosts had a mix of flash devices, not just the two types expected for the cache tier and the capacity tier. The answer was in KB 54793: when configuring an all-flash host for use with VMware Cloud Foundation for Service Providers, ensure that there are no more than two types or sizes of flash drive. I was able to work around this by simply formatting the additional flash devices (not required for vSAN) with VMFS-6. Retrying the bringup picked up where it left off, which is a nice feature. Kudos to my buddy Paudie for that tip.
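
If you already have an inventory of the flash devices in each host, the "no more than two types or sizes" rule is easy to check up front. The sketch below is illustrative only: the inventory layout (host name mapped to a list of model and size pairs) and the function name are assumptions, and gathering the inventory itself is left out.

```python
from collections import Counter


def hosts_with_mixed_flash(inventory, max_kinds=2):
    """inventory maps host -> list of (model, size_gb), one entry per flash device.

    Returns the hosts that have more than max_kinds distinct device kinds,
    along with a count of each kind found.
    """
    flagged = {}
    for host, devices in inventory.items():
        kinds = Counter(devices)
        if len(kinds) > max_kinds:
            flagged[host] = dict(kinds)
    return flagged
```

Any host that shows up in the result has extra device types that would need to be removed or, as in my case, claimed by something other than vSAN before the bringup.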

(c) Ensure you have the correct licenses

While this is unlikely to happen in customer environments, it might be an issue for those of you with home labs. I noticed an issue with licenses when my hosts failed to join the distributed switch:

I logged into the vSphere client, and immediately saw warnings about my vCenter and host licenses mismatching. I provided a new Standard vCenter license (previously it was only Essentials), reconnected my disconnected hosts to vCenter and hit RETRY. This time the ESXi hosts were successfully added to the distributed switch.

The only other issues I hit were (a) selecting the wrong VLAN for my vSAN network and (b) having a duplicate IP address on my network which I had assigned to the Log Insight Load Balancer (both of these were my own fault). Fortunately, I was able to fix the VLAN issue by logging into the management vCenter and correcting the VLAN ID on the distributed portgroup for the vSAN network. I also tracked down the duplicate IP (a K8s service) and took it off the network, allowing the Log Insight Load Balancer configuration to complete.

This is a good lesson on getting everything right up front. These two were easy enough to address, but in general there is no way to modify the configuration after deployment. If you use the wrong IP address for something, you will probably have to delete everything that was created and start again. So the lesson is, double-check and treble-check your parameter file before doing a bringup.

SDDC successfully deployed – woot!

All going well, and if no further issues are encountered, you will now have your full SDDC up and running. The cloud builder will provide you with a link to the SDDC Manager and you are now ready to start creating workload domains. It should look something like the screenshot below.

Here is a look at the SDDC Manager. We’ll delve into this in more detail in a future post:

And here is the SDDC that was deployed, as seen from the vSphere client. We can see the 4 ESXi nodes, the NSX Manager and 3 controllers (Node entries). We see the vCenter with 2 x External PSCs (Platform Services Controllers), a 3 node Log Insight Cluster and the SDDC Manager. The cloud builder (vcf-cb-01) can now be removed as it has served its purpose.

There are some other steps that we should now do. One of these is the roll-out of some additional vRealize products such as vRealize Operations and vRealize Automation. Then we can take a look at the Workload Domains, and roll out workload domains for other virtual infrastructures, Horizon View or even PKS. I’ll try to get to those in some future posts.

In the meantime, if you wish to learn more about VCF, check out these links. This is the official product page on VMware.com and this is the FAQ. The VCF 3.9 Release Notes are here. Thanks for reading this far.

12 Replies to “Getting started with VMware Cloud Foundation (VCF)”

  1. Cormac, you are absolutely right that the Fingerprint & Thumbprint are a PITA to obtain and although William Lam’s CLI commands will gather the fingerprint I still found all this cumbersome. I would like to suggest another method. If you submit the validation w/o the correct Fingerprint/Thumbprint the validator will return what it finds on the 4 Nodes specified for the MGMT Domain. Since it is reading it directly from each node, you can then copy and paste from the validation screen into your spreadsheet. I know it is not elegant but it does work.

  2. Hey Cormac, it would be great if you could give version 3.9.1 a try and share some thoughts on the AVN which was introduced.
    But the article is a good summary of what VCF does.
    Cheers Ian

    1. Thanks Ian.

      Unfortunately, I do not have access to enough networking equipment to try the 3.9.1 version – seems like it needs a number of upstream routers that are cross connected for availability purposes. It therefore limits what we can do in a lab environment.
