Getting started with HCIbench, the benchmark for hyper-converged infrastructure
This week I had the opportunity to roll out the HCIbench tool on one of my all-flash VSAN clusters (much kudos to my friends over at Micron for the loan of a bunch of flash devices for our lab). HCIbench is a tool developed internally at VMware to make deploying a benchmark for hyper-converged infrastructure (HCI) systems quite simple. In particular, we wanted something that customers could use on Virtual SAN (VSAN). It’s an excellent tool for those of you looking to run a performance test on hyper-converged infrastructure, hence the name HCIbench.
Please note that this blog post is not about discussing the results, as these will vary from environment to environment due to the open nature of VSAN’s HCL. This blog is more of a primer to assist the reader in getting started with HCIbench.
Step 1 – Deploy the OVA
To get started, you deploy a single HCIbench appliance called Auto-Perf-Tool. There’s nothing special about the appliance itself. It comes as an OVA, and if you’ve deployed an OVA appliance before, then this is no different. You provide the usual information, such as where to place the appliance. For those of you wishing to test VSAN performance, you’ll most likely be deploying this appliance to a VSAN datastore.
Step 2 – Point a browser at the appliance, and add vSphere environment info
The next step is to open the console and populate some information on the appliance, such as a root password and some network details. This is even easier if you use DHCP. When this information is provided, the appliance completes its boot process. At this point, you open a browser, point it to the IP address of the appliance on port 8080, and you are presented with a template/form to populate. The first section of the form looks for information such as the vCenter server and credentials, data center, cluster name, network, datastore, etc. Note that in the current version, the VM Network must be on a standard vSwitch; you cannot use a distributed switch (DVS) portgroup. The network defaults to “VM Network” and the datastore defaults to “VSAN datastore” if these are not provided:
[Update: 21-Oct-2015] Ensure that the Datastore Name field is populated in the most recent appliance. Although it is shown as not being required, in the latest release we support multi-datastore deployments, so this field must be specified, even if it is the VSAN datastore that is being tested. If you do not add this, the benchmark will fail with “A required parameter is NULL, please re-check your configuration file !”
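Because the Datastore Name field is easy to miss, a quick pre-flight check of your form values can catch it before the run fails. This is a minimal, illustrative sketch only: the field names (vc, vc_username, dc_name, cluster_name, datastore_name) are hypothetical stand-ins for the form fields, not necessarily the keys HCIbench writes to its configuration file.

```python
# Hypothetical field names standing in for the HCIbench form fields;
# the real keys in perf-conf.yaml may differ.
REQUIRED_FIELDS = ("vc", "vc_username", "dc_name", "cluster_name", "datastore_name")


def is_blank(value) -> bool:
    """True for None, whitespace-only strings, and empty lists/tuples."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_fields(form: dict) -> list:
    """Return the required fields that are absent or blank.

    The datastore name is treated as required even though the form does not
    mark it that way, since leaving it empty makes the run fail with
    "A required parameter is NULL".
    """
    return [field for field in REQUIRED_FIELDS if is_blank(form.get(field))]


form = {"vc": "vc01", "vc_username": "admin", "dc_name": "DC",
        "cluster_name": "VSAN-Cluster", "datastore_name": ""}
print(missing_fields(form))  # ['datastore_name']
```

A list is accepted for the datastore field because the newer releases allow several datastores.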
Step 3 – Add host and benchmark VM info
The next section is about the hosts, and the VMs that are going to run the benchmark. You add a list of ESXi hosts (the hosts that are participating in the VSAN Cluster), one line at a time, and then supply information about the VM workload, including number of VMs you wish to deploy, number of disks and size of disks. In this example, I have 4 hosts so I will deploy 8 VMs, each with 4 disks, and each disk 10GB in size. These VMs will be distributed across all hosts in the cluster, leveraging the distributed nature of VSAN’s compute and storage.
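To see what that example configuration adds up to, here is a small sketch of the arithmetic. It assumes a simple round-robin placement across the hosts (an assumption that matches the 2-VMs-per-host result described later, not HCIbench's actual placement code), and the worker names vdbench-1 … vdbench-8 are hypothetical.

```python
from collections import Counter


def place_vms(hosts, vm_count):
    """Assign VMs to hosts round-robin (assumed placement policy)."""
    return {f"vdbench-{i + 1}": hosts[i % len(hosts)] for i in range(vm_count)}


def data_disk_capacity_gb(vm_count, disks_per_vm, disk_size_gb):
    """Total logical size of the benchmark data disks (OS disks excluded)."""
    return vm_count * disks_per_vm * disk_size_gb


hosts = ["esx01", "esx02", "esx03", "esx04"]   # hypothetical host names
placement = place_vms(hosts, 8)
per_host = Counter(placement.values())
print(dict(per_host))                    # {'esx01': 2, 'esx02': 2, 'esx03': 2, 'esx04': 2}
print(data_disk_capacity_gb(8, 4, 10))  # 320
```

The 320 GB figure is the logical size the workers address; the raw capacity consumed on VSAN will be higher, because objects are mirrored under the default policy of one failure to tolerate.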
Step 4 – Download and add vdbench zip file, and add parameter file
Once this is done, users need to provide access to the vdbench tool. Due to licensing issues, we are not allowed to distribute the vdbench benchmarking tool, so it needs to be downloaded from Oracle if you do not have it already. A link is provided to the Oracle website to download the vdbench zip file, but you will need an account on Oracle’s site to access it. Once the vdbench zip file has been downloaded locally, you must then upload it to the appliance. The next part of the setup is to generate a vdbench parameter file, which holds information such as I/O size, read/write ratio, and whether the I/O should be random or sequential in nature. You should also state how long you want the test to run (for example, 3600 seconds = 1 hour), as well as whether you want to dd the storage first (initialize it). Finally, decide if you want the benchmark VMs cleaned up once the test completes. Save the configuration. To make sure that everything is OK, run the validate test. This verifies that all the configuration parameters are correct, and states whether it is OK to start the test.
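HCIbench generates the vdbench parameter file for you, but it helps to know what those options map to in vdbench terms. As a rough sketch, assuming the data disks appear inside each worker VM as /dev/sdb, /dev/sdc and so on (an assumption for illustration), a minimal vdbench file with one storage definition (sd) per disk, one workload definition (wd) and one run definition (rd) could be built like this. The file HCIbench actually produces may differ and can carry extra settings, for example for initializing the disks.

```python
def vdbench_params(devices, xfersize="4k", rdpct=70, seekpct=100,
                   elapsed=3600, interval=30):
    """Build a minimal vdbench parameter file as a string.

    devices  -- raw data disks inside the worker VM (assumed device names)
    xfersize -- I/O size
    rdpct    -- read percentage (70 means 70% reads, 30% writes)
    seekpct  -- 100 for fully random I/O, 0 for sequential
    elapsed  -- run length in seconds (3600 = 1 hour)
    interval -- reporting interval in seconds
    """
    lines = []
    # One storage definition per data disk, opened with O_DIRECT to bypass the guest page cache
    for i, dev in enumerate(devices, start=1):
        lines.append(f"sd=sd{i},lun={dev},openflags=o_direct")
    # One workload spread across every storage definition (sd* wildcard)
    lines.append(f"wd=wd1,sd=sd*,xfersize={xfersize},rdpct={rdpct},seekpct={seekpct}")
    # One run driving the workload at the maximum achievable I/O rate
    lines.append(f"rd=run1,wd=wd1,iorate=max,elapsed={elapsed},interval={interval}")
    return "\n".join(lines) + "\n"


print(vdbench_params(["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]))
```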
Step 5 – Monitoring the workloads
Click on the Test button to start the benchmark. The tool next deploys a bunch of VMs as per the configuration, each of which will run an instance of vdbench.
In my example, I had a 4-node cluster, and I selected 8 VMs to roll out. This deploys 2 VMs per host in a distributed manner. In the screenshot to the left, you can see the original benchmark tool called Auto-Perf-Tool, and 8 additional VMs rolled out for the purpose of the test, each named vdbench–. Once the VMs have been rolled out and are generating I/O, each of them can be examined for further information. For example, you can check that they have the appropriate number of disks as per the configuration, and have been deployed on the correct VM network. I also find it useful to select one of the VMs and open the Monitor > Performance view. In the Advanced view, I select the virtual disks and modify the “chart options” to select the read and write rate values. I can then see the amount of I/O that is in flight from vdbench. In this particular set, I chose the reads and writes per second for each of the disks. This shows that vdbench is doing what it is supposed to do:
While the test is running, you will see the following displayed in the browser:
And when the test is complete, the following will be displayed:
You can now click on the results button, and navigate via the browser to where the results are stored. There is a text file for each VM which contains a lot of information regarding IOPS, Latency and Throughput information. Here is an example of such a results output taken from my environment:
However, you can also navigate further along to what is essentially a VSAN Observer collection. Click on the stats.html file to display a VSAN Observer view of the cluster for the period of time that the test was running:
Note: The current version of the HCIbench appliance needs to reach out to the internet to get various fonts and CSS files needed to render the VSAN Observer graphs. The same principle holds for VSAN Observer when run from vCenter Server. If there is no path to the outside world, the VSAN Observer graphs captured by HCIbench will not render properly. This requirement is addressed in an upcoming HCIbench appliance, which will include all of the components needed to render the VSAN Observer graphs.
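Each per-workload summary file (the *-res.txt file) reports four aggregate figures: VMs, IOPS, TPUT and LAT. As several readers found in the comments, an all-zero summary points to a problem such as worker VMs that never ran, or per-VM results that were not picked up. Here is a minimal Python sketch for spotting that case, assuming the summary layout quoted in those comments; it is not part of HCIbench itself.

```python
import re

# Matches lines such as "IOPS = 1234.56 IO/s" or "VMs = 8"
METRIC = re.compile(r"^\s*(VMs|IOPS|TPUT|LAT)\s*=\s*([0-9.]+)")


def parse_summary(text):
    """Parse an HCIbench *-res.txt summary into a dict of floats (plus the datastore name)."""
    summary = {}
    for line in text.splitlines():
        if line.startswith("Datastore:"):
            summary["Datastore"] = line.split(":", 1)[1].strip()
            continue
        match = METRIC.match(line)
        if match:
            summary[match.group(1)] = float(match.group(2))
    return summary


def looks_empty(summary):
    """True when every metric is zero or missing, i.e. nothing was aggregated."""
    return all(summary.get(k, 0.0) == 0.0 for k in ("VMs", "IOPS", "TPUT", "LAT"))


sample = """Datastore: vsanDatastore
VMs = 0
IOPS = 0.00 IO/s
TPUT = 0.00 MB/s
LAT = 0.00 ms
============================="""
print(looks_empty(parse_summary(sample)))  # True
```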
If things are not going right for some reason, there are 4 places to check.
- Was the vdbench zip file uploaded to the appliance successfully? It should be found in /opt/output/vdbench-source. If something isn’t correct, you can always delete it, refresh the browser and upload a new version.
- Has the vdbench parameter file been created correctly? It should be located in /opt/automation/vdbench-param-files. The name varies based on which configuration options are chosen. If it doesn’t look correct, you can always delete it and generate a new one.
- Has the complete configuration file, including vCenter and host information, been created correctly? It should be located in /opt/automation/conf and is called perf-conf.yaml. If it doesn’t look correct, you can delete it and create a new one.
- Finally, the logs of the performance test runs are located in /opt/automation/logs. If the tests are misbehaving and you cannot see why from the messages in the browser, this is a good place to look.
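If you would rather script the four checks above, here is a minimal sketch in Python. It assumes a Python 3 interpreter is available wherever you run it (I have not confirmed that one ships on the appliance), and the root parameter lets you point it at a copy of the appliance file system. It only checks that each location exists and is non-empty, not that the contents are correct.

```python
import os

# The four troubleshooting locations, relative to the file system root.
# "dir_nonempty" means the directory must exist and contain something.
CHECKS = [
    ("vdbench zip uploaded", "opt/output/vdbench-source", "dir_nonempty"),
    ("vdbench parameter file generated", "opt/automation/vdbench-param-files", "dir_nonempty"),
    ("configuration file present", "opt/automation/conf/perf-conf.yaml", "file"),
    ("test logs present", "opt/automation/logs", "dir_nonempty"),
]


def check_appliance(root="/"):
    """Return (description, path, ok) for each troubleshooting location."""
    results = []
    for description, rel_path, kind in CHECKS:
        path = os.path.join(root, rel_path)
        if kind == "file":
            ok = os.path.isfile(path)
        else:
            ok = os.path.isdir(path) and len(os.listdir(path)) > 0
        results.append((description, path, ok))
    return results


if __name__ == "__main__":
    for description, path, ok in check_appliance():
        print(f"[{'OK' if ok else 'CHECK'}] {description}: {path}")
```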
Where do I get the bits?
- [Update: 8/23/16] Here is a link to the latest HCIbench appliance OVA.
- The user guide can be downloaded by clicking here.
Happy benchmarking!
Hi Cormac,
I’m just wondering if this tool can be used to compare different HCI solutions? Having a look at the User Guide you linked to, it seems very VSAN-specific (uses VSAN Observer, asks for VSAN cluster info, etc.). It would be great to have a cross-vendor HCI testing tool that customers can spin up during POCs.
My understanding is that it can be used for other HCI platforms and storage types. The request for a cluster is vSphere-specific – it’s just that in the case of VSAN, it is also the VSAN cluster. But this could be a generic cluster which does not have VSAN enabled. It also produces the vdbench output, so that can be used as a way to compare different solutions.
I have a hybrid array, and I am trying to use one of its volumes as a VSAN datastore. Do I need to check whether the array (its controller) supports pass-through?
I suspect that is the reason why I am seeing my array LUNs as ineligible disks. I made sure my LUN is formatted as VMDK.
Any devices used for the VSAN datastore must be local. You will not be able to use LUNs/Volumes from an external array.
@Cormac, thanks. Are you aware of any similar tool which acts as a wrapper around VDI bench, but can still be used to run I/O to non-VSAN disks such as iSCSI VMDK datastores?
There is the I/O Analyzer fling – https://labs.vmware.com/flings/io-analyzer
Hi Cormac, thanks. Do you have any pointers to a medium-scale VDI VSCSI trace that I can replay and test on the IO Analyzer? Thanks in advance.
I don’t have anything like that – sorry.
This looks like an excellent tool, but I don’t see the VMs deploy. My validation completes successfully, but my log for *-vm-deploy.log returns:
Expected Datastore but got Datacenter at “”
I have a distributed switch, so I’m not deploying to the host directly. However, I do see the perf-test-vms folder being created and deleted successfully. Any tips on what might be wrong?
You cannot use a DVS with this version.
Can you create a VM network on a standard vSwitch on each of the hosts and try again?
Hi Ken,
We just had the fix for that, please re-download the ova and try it again.
Thanks,
Chen
Hi,
I need to test an EMC VNX storage system through a VMware environment. I have three different workload profiles.
Is it possible within the vdbench parameter file to define these three concurrent workload profiles, or do I need to deploy three HCIbench appliances, each with a different vdbench parameter file (reflecting each workload profile)?
Is it possible to perform testing using the same VMware hosts (let’s say 4 VMware hosts, with 10 VMs for each HCIbench)?
Is it possible to define multiple datastores where testing will be performed?
Danail,
Please see my answers inline.
Is it possible within the vdbench parameter file to define these three concurrent workload profiles, or do I need to deploy three HCIbench appliances, each with a different vdbench parameter file (reflecting each workload profile)?
C.W. => HCIBench can handle multiple vdbench param files, and will test them one by one if you select “USE ALL” under “select a vdbench parameter file”.
Is it possible to perform testing using the same VMware hosts (let’s say 4 VMware hosts, with 10 VMs for each HCIbench)?
C.W. => If I understand your question correctly, you are asking whether HCIBench can be deployed on the cluster that will be tested. The answer is yes.
Is it possible to define multiple datastores where testing will be performed?
C.W. => Yes, our latest version supports multi-datastore deployment, but the VMs must be split evenly (e.g., if you specify 3 datastores, the number of VMs must be 3*N).
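That even-split rule is easy to check before you fill in the form. A minimal sketch, assuming only the rule stated above; the datastore names are hypothetical.

```python
def split_vms(vm_count, datastores):
    """Split the VMs evenly across datastores, as multi-datastore mode requires."""
    if not datastores:
        raise ValueError("at least one datastore name is required")
    if vm_count % len(datastores) != 0:
        raise ValueError(
            f"{vm_count} VMs cannot be split evenly across {len(datastores)} datastores; "
            f"use a multiple of {len(datastores)}"
        )
    per_ds = vm_count // len(datastores)
    return {ds: per_ds for ds in datastores}


print(split_vms(9, ["ds1", "ds2", "ds3"]))  # {'ds1': 3, 'ds2': 3, 'ds3': 3}
```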
Hi,
I just upgraded my previous version of the HCIBench appliance to the latest one, and now I can’t validate any configuration anymore. It tells me that a required parameter is null and that I have to check my configuration file. All mandatory fields are filled in correctly, but validation is not possible. When I try to run the test, it lasts a few seconds before it tells me that it has finished, but without any results.
@Chen Wei: Any workaround for this? I need this tool working because I am currently validating a vSAN 6.1 proof of concept to show my customer the real value of the product.
Check the log files highlighted at the end of the post. It should tell you what the problem is hopefully. Did you use any double quotes or other special characters anywhere in any of the fields? If so, remove them.
Hi Cormac,
Thx for ur reply. I did not use any double quotes or other special characters in any of the fields.
I found this in the all-in-one-testing.log :
/opt/automation/lib/deploy-vms.rb:29:in `’: undefined method `count’ for nil:NilClass (NoMethodError)
/opt/automation/lib/vdbench-io-test.rb:22:in `’: undefined method `each’ for nil:NilClass (NoMethodError)
A Ruby-related error?
Check the configuration files, both the yaml one and the vdbench one. Make sure that all the entries look correct.
perf-conf.yaml checked. No errors in it for me. What is the vdbench configuration file that ur talking about? Is it in /opt/automation/conf?
Not quite – they are in /opt/automation/vdbench-param-files. See the troubleshooting section above. If you still don’t have a solution, email VSANperformance@vmware.com (this help email is in the guide iirc).
Thx Cormac. I’ll send an email to vsanperformance. I’ll keep you updated on the issue.
vsanperformance@vmware.com is not a valid email address. Got a delivery error message on it. Another contact? Thx Cormac
It looks like it is case sensitive – try VSANperformance@vmware.com instead
Hi Cormac,
I’ve deployed the HCIBench tool and everything seems to be working: validation completes successfully, 10 VMs are deployed, etc. However, when it deploys the VMs they do not boot into an OS; they just sit there waiting on PXE boot, and in the web GUI the progress bar does not get past “deployment started”. Is the OS not on the 8GB drive that’s deployed as part of the OVF?
Yes – that doesn’t sound right. Please email VSANperformance@vmware.com for assistance.
Could you try to redo the test? That might have happened because the deployment was interrupted.
Thanks for the prompt response guys. I’ve redeployed the tool and I’m getting further than before. The test now runs but completes after 10 or so seconds even though I have set the test to run for 6000 seconds. When I look at the results file VMs, IOPS, TPUT & LAT are 0.
Did you see the vms being deployed?
Could you check the logs in /opt/automation/logs?
Let me know if you need help troubleshooting by sending an email to VSANperformance@vmware.com.
Thanks,
Chen
Hi Chen,
Yes it’s all working now, it was a problem with the DHCP server.
Hi,
I have a test lab with 3 identical ESXi 5.5 U2a hosts: A, B and C.
The Auto-Perf-Tool works fine on hosts A and B but not on host C.
I analyzed the problem in depth by checking inside the VM using a bootable GParted ISO. I found that when the rvc-perf-vm deploys the vdbench VM on server C, the first VMDK disk exists but it is empty (no partition, no OS).
The consequence is that the VM is not bootable.
I looked at the logs but didn’t find any useful information:
[root@rvc-perf-vm logs]# cat all-in-one-testing.log
[root@rvc-perf-vm logs]# cat host-dcesx30.sanita.vi-vm-deploy.log
2015-10-28 15:53:47 -0700: Creating 1 VMs…
2015-10-28 15:53:47 -0700: Creating 1 VMs in batch 0…
networks: VM Network-1284 = vmlaboratoriovm
vdbench-1446072826-serverC-storage-0-1
DEBUG: Timeout: 300
Iteration 1: Trying to get host’s IP address …
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 1042M 0 81638 0 0 1295 0 9d 18h 0:01:03 9d 18h 1040
curl: (23) Failed writing body (282 != 16384)
Iteration 1: Trying to access nfcLease.info.entity …
HttpNfcLeaseComplete succeeded
Adding 10 disks
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-1
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-2
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-3
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-4
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-5
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-6
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-8
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-9
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-10
ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
Added device disk-1000-11
Powering on VMs …
PowerOnVM vdbench-1446072826-serverC-storage-0-1: success
Waiting for VMs to boot …
[root@rvc-perf-vm logs]# cat test-status.log
Deployment Started.
Can you help me?
Hi, I solved it the old-fashioned way: I rebooted the hosts without changing anything, and now it works.
Great tools!
Hello,
I have deployed the HCIbench OVA to an EVO RAIL cluster and I’m experiencing some issues when it starts the deployment of the guest VMs.
The problem is with the naming convention used for the vdbench guest VMs. When it starts the deployment of the VMs, it appends the name of the datastore to the guest VM names. However, in an EVO:RAIL deployment, the VSAN datastore has a long name (e.g., MARVIN-Virtual-SAN-Cluster-f990015d-8dc2-4869-ab88-1b04f3d3773f). Because of the long name, it fails with the error message “is invalid or exceeds the maximum number of characters permitted.”
Here is one example of a VM name and error when deploying to an EVO:RAIL VSAN datastore: “‘vdbench-vc-1446661119-MARVIN-Virtual-SAN-Datastore-f990015d-8dc2-4869-ab88-1b…’ is invalid or exceeds the maximum number of characters permitted.”
Thanks,
Jose
I will let the developers know.
Jose,
Is it possible to give the VSAN datastore a shorter name?
If not, please log in to the HCIBench console and modify the file:
/opt/automation/lib/deploy-vms.rb
Please find vdbench-vc-#{time_var}-#{datastore} and change it to vdbench-#{datastore}.
The #{datastore} pattern is used to identify which datastore the VMs are on, to support multi-datastore deployments.
Feel free to contact me directly by email if you have any further issues: VSANperformance@vmware.com
Thanks,
Chen
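To see why that edit helps, here is a small sketch comparing the two name templates for Jose's datastore. It assumes vCenter's 80-character limit on VM names and a trailing -N index as in the vdbench-vc-TIMESTAMP-DATASTORE_NAME-[1…n] pattern mentioned later in this thread; neither the limit nor the suffix is taken from HCIbench's code, and vm_name is a hypothetical helper.

```python
MAX_VM_NAME = 80  # assumed vCenter limit on VM name length


def vm_name(datastore, index, timestamp=None):
    """Build a worker-VM name using either the original or the shortened template."""
    if timestamp is None:
        return f"vdbench-{datastore}-{index}"                 # shortened template
    return f"vdbench-vc-{timestamp}-{datastore}-{index}"      # original template


def too_long(name, limit=MAX_VM_NAME):
    return len(name) > limit


ds = "MARVIN-Virtual-SAN-Datastore-f990015d-8dc2-4869-ab88-1b04f3d3773f"
long_name = vm_name(ds, 1, timestamp=1446661119)
short_name = vm_name(ds, 1)
print(len(long_name), too_long(long_name))    # 89 True
print(len(short_name), too_long(short_name))  # 75 False
```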
Thanks Chen and Cormac!
This is a great tool!
I thought about changing the datastore name, but didn’t want to change the EVO:RAIL default setup.
Changing the file “deploy-vms.rb” did the trick.
I’ll let you know if I find any other issue.
Jose
Chen
I tried to send you an email directly to VSANperformance@vmware.com but the message is being rejected.
After the change, VMs were deployed correctly and workload executed, but the results show 0 in all the counters. Is there any reason why the results are 0 ?
/results/results20151104101655/vdb-10vmdk-10ws-4k-70rdpct-0randompct-1446758335-res.txt
Datastore: MARVIN-Virtual-SAN-Datastore-f990015d-8dc2-4869-ab88-1b04f3d3773f
VMs = 0
IOPS = 0.00 IO/s
TPUT = 0.00 MB/s
LAT = 0.00 ms
=============================
Thanks,
Jose
This was an issue with one of the earlier releases. Are you using the most recent version from the link provided Jose?
Jose,
Could you verify whether there are vdbench results files in the directory “/results/results20151104101655/vdb-10vmdk-10ws-4k-70rdpct-0randompct-1446758335”? If they are there, the test finished successfully; could you let me know the names of the results files?
Thanks,
Chen
Hi Chen,
Yes, the results files for each VM are there and they have the vdbench results. However, the *-res.txt file for every workload has “0” for the metrics.
Here are the paths:
/results/results20151107181133
(this is the file that has 0 as results)
results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193-res.txt
results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193/
here one of the VM’s output:
results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193/vdbench-1446948711-1.txt
Thanks,
Jose
Just to add some info here: I ran a test on a non-VSAN datastore, and the -res.txt file has the correct total values for each workload.
Apparently the problem is when running workloads in a VSAN datastore.
Hi Jose,
The vdbench result file for each individual VM should be named “vdbench-vc-TIMESTAMP-DATASTORE_NAME-[1…n].txt”. Since the files you have don’t contain the DATASTORE_NAME, the calculation script was not able to find them.
my suggestion:
1. In the file /opt/automation/lib/deploy-vms.rb, make sure #{datastore} appears in the name you modified last time.
2. Modify the file name in /opt/output/results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193/ to vdbench-DATASTORE_NAME-[1…n].txt, and run
“/opt/automation/vdb-process-long.sh /opt/output/results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193 DATASTORE_NAME”, if the result doesn’t look right, use /opt/automation/vdb-process-short.sh instead.
Let me know if you need further assistance.
thanks,
Chen
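The reason Jose's summary showed zeros is that the per-VM files did not contain the datastore name, so they were not picked up. A sketch of that kind of lookup, assuming only the naming pattern Chen gives (vdbench-vc-TIMESTAMP-DATASTORE_NAME-N.txt, or vdbench-DATASTORE_NAME-N.txt after the rename); per_vm_results is a hypothetical helper that models the idea and is not the logic of vdb-process-long.sh.

```python
import re


def per_vm_results(filenames, datastore):
    """Select per-VM vdbench result files belonging to one datastore.

    A file matches when its name is vdbench-, optionally followed by other
    dash-separated parts (e.g. vc-TIMESTAMP-), then the datastore name,
    then -N.txt.
    """
    pattern = re.compile(rf"^vdbench-(?:.*-)?{re.escape(datastore)}-\d+\.txt$")
    return sorted(f for f in filenames if pattern.match(f))


before = ["vdbench-1446948711-1.txt", "vdbench-1446948711-2.txt"]
after = ["vdbench-vsanDatastore-1.txt", "vdbench-vsanDatastore-2.txt"]
print(per_vm_results(before, "vsanDatastore"))  # []
print(per_vm_results(after, "vsanDatastore"))   # ['vdbench-vsanDatastore-1.txt', 'vdbench-vsanDatastore-2.txt']
```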
I have a couple of questions; it would be great if you could help me with them.
Can we change the disk format of the vdbench test VMs? As of now it is thick provisioned lazy zeroed; how can we make it thin provisioned or thick provisioned eager zeroed?
How can we change the CPU and memory of the vdbench test VMs? Do we have to edit /opt/output/vm-template/perf-ubuntu-vdbench.ovf?
Can we change the disk format of the vdbench test VMs? As of now it is thick provisioned lazy zeroed; how can we make it thin provisioned or thick provisioned eager zeroed?
Yes, you can use thin provisioning by commenting out line 871 of /root/rvc/lib/rvc/modules/vsantest/perf.rb.
How can we change the CPU and memory of the vdbench test VMs? Do we have to edit /opt/output/vm-template/perf-ubuntu-vdbench.ovf?
Yes, you have to modify that file to change the CPU and RAM.
Thanks,
Chen
Thanks Chen,
It works.
Regards
~Anil
Is it possible to change the storage policy being used for the worker VMs?
We’re using VSAN 5.5 currently and the worker VMs don’t pick up any storage policy when they are deployed so they have the default settings (n+1, stripe width of 1 etc). We’d like to compare different settings such as stripe width so this would be a useful feature if possible.
Cheers
Andy
I’m not sure if there is an easy way to do this. One way is to modify the default policy before each test, and then set it back afterwards. Unfortunately, in VSAN 5.5 this can only be done via esxcli iirc. KB 2073795 has information on how to do this.
Thanks Cormac. I thought it would be along those lines but hadn’t seen that KB before so this will help me out if the test results mean we want to change the default policy.
Doesn’t look to be a huge job to modify the default policy before each test though.
Thanks again.
Hi Everyone,
I have successfully deployed the HCIbench appliance. The test validation succeeds. However, when I start the test, I get one of two results.
1. The test shows Completed immediately. Then I see the worker VMs deploy and sit at an “OS not found” screen.
2. The test starts. “Deployment started” appears and progress halts at around 40% complete. The worker VMs are deployed from the template and power on, but no further progress is made on the test (I cancel after an hour or so).
The documentation and this website reference VSANperformance@vmware.com as a contact for help with the tool, but any emails to that address bounce.
I think the VM network on which the VMs are deployed needs DHCP
Hello,
I’m trying to use HCIbench and I’ve followed the documentation. Everything seems to be configured correctly, but when I click “test,” it creates all of the VMs, powers them on for a few seconds, powers them off, and then deletes them. The results look something like this:
Datastore: vsanDatastore
VMs = 0
IOPS = 0.00 IO/s
TPUT = 0.00 MB/s
LAT = 0.00 ms
=============================
Any thoughts on what might be going on? Thank you.
Check through the log files highlighted in the article Vincent. It should give you a bit of a clue as to why this is failing. You also need to have DHCP configured for the vdbench VMs to deploy – is this available?