Getting started with HCIbench, the benchmark for hyper-converged infrastructure

This week I had the opportunity to roll out the HCIbench tool on one of my all-flash VSAN clusters (much kudos to my friends over at Micron for the loan of a bunch of flash devices for our lab). HCIbench is a tool developed internally at VMware to make the deployment of a benchmark tool for hyper-converged infrastructure (HCI) systems quite simple. In particular, we wanted something that customers could use on Virtual SAN (VSAN). It’s an excellent tool for those of you looking to do a performance test on hyper-converged infrastructures, thus the name HCIbench.

Please note that this blog post is not about discussing the results, as these will vary from environment to environment due to the open nature of VSAN’s HCL. This blog is more of a primer to assist the reader in getting started with HCIbench.

Step 1 – Deploy the OVA

To get started, you deploy a single HCIbench appliance called Auto-Perf-Tool. There’s nothing special about the appliance itself. It comes as an OVA, and if you’ve deployed an OVA appliance before, then this is no different. You provide the usual information, such as where to place the appliance. For those of you wishing to test VSAN performance, you will most likely be deploying this appliance to a VSAN datastore.

Step 2 – Point a browser at the appliance, and add vSphere environment info

The next step is to open the console and populate some information on the appliance, such as a root password and some network details; this is even easier if you use DHCP. When this information is provided, the appliance completes its boot process. At this point, open a browser and point it at the IP address of the appliance on port 8080. You are now presented with a template/form to populate. The first section of the form asks for information such as the vCenter server and credentials, data center, cluster name, network, datastore, etc. Note that in the current version, the VM Network must be on a standard vSwitch; you cannot use a distributed switch (DVS) portgroup at this time. If these are not provided, the network defaults to “VM Network” and the datastore defaults to “VSAN datastore”.


[Update: 21-Oct-2015] Ensure that the Datastore Name field is populated in the most recent appliance. Although it is shown as not being required, the latest release supports multi-datastore deployments, so this field must be specified, even if it is the VSAN datastore that is being tested. If you do not add this, the benchmark will fail with “A required parameter is NULL, please re-check your configuration file !”
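To make the datastore requirement concrete, here is a minimal Python sketch of the kind of pre-flight check you might run against your own settings before clicking Validate. The field names (vcenter, username, datacenter, cluster, datastores, network) are hypothetical and are not the actual keys HCIbench uses in perf-conf.yaml; the sketch simply applies the “VM Network” default described above and refuses an empty datastore list, as per the update.

```python
from typing import Dict, List

# Hypothetical field names; the real HCIbench form / perf-conf.yaml keys may differ.
REQUIRED_FIELDS = ("vcenter", "username", "datacenter", "cluster", "datastores")
DEFAULT_NETWORK = "VM Network"  # default named in the article


def validate_config(conf: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the settings look usable.

    Mutates conf to fill in the default network when none is given.
    """
    if not conf.get("network"):
        conf["network"] = DEFAULT_NETWORK
    problems = []
    for field in REQUIRED_FIELDS:
        # An empty string or empty list counts as missing, which is how an
        # unfilled Datastore Name field would show up.
        if not conf.get(field):
            problems.append(f"{field} is required")
    return problems


if __name__ == "__main__":
    settings = {
        "vcenter": "vcenter.lab.local",  # example values only
        "username": "administrator@vsphere.local",
        "datacenter": "Datacenter",
        "cluster": "VSAN-Cluster",
        "datastores": [],  # left empty, as in the failure described above
    }
    print(validate_config(settings))  # ['datastores is required']
    print(settings["network"])        # VM Network
```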

Step 3 – Add host and benchmark VM info

The next section is about the hosts and the VMs that are going to run the benchmark. You add a list of ESXi hosts (the hosts that are participating in the VSAN cluster), one line at a time, and then supply information about the VM workload, including the number of VMs you wish to deploy, the number of disks per VM and the size of each disk. In this example, I have 4 hosts, so I will deploy 8 VMs, each with 4 disks, and each disk 10GB in size. These VMs will be distributed across all hosts in the cluster, leveraging the distributed nature of VSAN’s compute and storage.
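The placement arithmetic is simple enough to check by hand, but a small Python sketch makes it explicit. It assumes a simple round-robin placement (VM i goes to host i mod the number of hosts); HCIbench’s actual placement logic may differ. The capacity figure is the logical size of the benchmark disks only, before any VSAN replica overhead from the storage policy.

```python
from collections import Counter
from typing import List, Tuple


def place_vms(hosts: List[str], num_vms: int) -> List[str]:
    """Assumed round-robin placement: VM i lands on host i % len(hosts)."""
    if not hosts:
        raise ValueError("need at least one host")
    return [hosts[i % len(hosts)] for i in range(num_vms)]


def benchmark_footprint(num_vms: int, disks_per_vm: int, disk_gb: int) -> Tuple[int, int]:
    """Return (total data disks, total logical GB) before any VSAN replica overhead."""
    disks = num_vms * disks_per_vm
    return disks, disks * disk_gb


if __name__ == "__main__":
    hosts = ["esx-01", "esx-02", "esx-03", "esx-04"]  # hypothetical host names
    placement = place_vms(hosts, 8)
    print(Counter(placement))              # two VMs per host
    print(benchmark_footprint(8, 4, 10))   # (32, 320)
```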

Step 4 – Download and add vdbench zip file, and add parameter file

Once this is done, you need to provide access to the vdbench tool. Due to licensing issues, we are not allowed to distribute the vdbench benchmarking tool, so it needs to be downloaded from Oracle if you do not have it already. A link to the Oracle website is provided to download the vdbench zip file, but you will need an account on Oracle’s site to access it. Once the vdbench zip file has been downloaded locally, you must then upload it to the appliance. The next part of the setup is to generate a vdbench parameter file, which contains information such as I/O size, R/W ratio and whether the I/O should be random or sequential in nature. You should also state how long you want the test to run (3600 seconds = 1 hour below), as well as whether you want to dd the storage first (initialize it). Finally, decide whether you want the benchmark VMs cleaned up once the test completes. Save the configuration. To make sure that everything is OK, run the validate test. This verifies that all the configuration parameters are correct, and states whether it is OK to start the test.
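For readers curious about what such a parameter file contains, the Python sketch below emits a minimal vdbench parameter file using vdbench’s standard sd (storage definition), wd (workload definition) and rd (run definition) statements. This is not the generator HCIbench uses, and the device names are an assumption: the sketch takes /dev/sda to be the guest OS disk inside the benchmark VM, with the data disks starting at /dev/sdb. In vdbench, seekpct=100 requests fully random I/O and seekpct=0 fully sequential I/O.

```python
import string


def vdbench_params(num_disks: int, xfersize: str, read_pct: int,
                   random_pct: int, elapsed_s: int) -> str:
    """Build a minimal vdbench parameter file as text.

    Assumes /dev/sda is the guest OS disk, so data disks are /dev/sdb onwards.
    """
    if not 1 <= num_disks <= 25:
        raise ValueError("this sketch only names /dev/sdb through /dev/sdz")
    if not (0 <= read_pct <= 100 and 0 <= random_pct <= 100):
        raise ValueError("percentages must be between 0 and 100")
    lines = []
    # One storage definition (sd) per raw data disk, opened with O_DIRECT
    # so the guest page cache does not absorb the I/O.
    for i in range(num_disks):
        device = "/dev/sd" + string.ascii_lowercase[i + 1]
        lines.append(f"sd=sd{i + 1},lun={device},openflags=o_direct")
    # One workload definition (wd) across all sds: block size, read
    # percentage, and seekpct (100 = fully random, 0 = fully sequential).
    lines.append(f"wd=wd1,sd=sd*,xfersize={xfersize},"
                 f"rdpct={read_pct},seekpct={random_pct}")
    # One run definition (rd): unthrottled I/O rate for elapsed_s seconds,
    # reporting every second.
    lines.append(f"rd=run1,wd=wd1,iorate=max,elapsed={elapsed_s},interval=1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 4 disks and 3600 seconds as in the example above; the 4k / 70% read /
    # 100% random mix is an arbitrary illustration.
    print(vdbench_params(4, "4k", 70, 100, 3600), end="")
```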

Step 5 – Monitoring the workloads

Click on the Test button to start the benchmark. The tool next deploys a bunch of VMs as per the configuration, each of which will run an instance of vdbench.

In my example, I had a 4 node cluster, and I selected 8 VMs to roll out. This deploys 2 VMs per host in a distributed manner. In the screenshot to the left, you can see the original benchmark tool called Auto-Perf-Tool, and 8 additional VMs rolled out for the purpose of the test, each named vdbench–. Once the VMs have been rolled out and are generating I/O, each of them can be examined for further information. For example, you can check that they have the appropriate number of disks as per the configuration, and that they have been deployed on the correct VM network. I also find it useful to select one of the VMs and open the Monitor > Performance view. In the Advanced view, I select the virtual disks and modify the “chart options” to select the read and write rate values. I can then see the amount of I/O that is in flight from vdbench. In this particular set, I chose the reads and writes per second for each of the disks. This shows that vdbench is doing what it is supposed to do.

While the test is running, you will see the following displayed in the browser:

And when the test is complete, the following will be displayed:

Step 6 – Examine the results

You can now click on the Results button and navigate via the browser to where the results are stored. There is a text file for each VM which contains a lot of information regarding IOPS, latency and throughput. Here is an example of such a results output taken from my environment:

However, you can also navigate further along to what is essentially a VSAN Observer collection. Click on the stats.html file to display a VSAN Observer view of the cluster for the period of time that the test was running:

Note: The current version of the HCIbench appliance needs to reach out to the internet to get various fonts and CSS files needed to render the VSAN Observer graphs. The same holds for VSAN Observer when run from vCenter Server. If there is no path to the outside world, the VSAN Observer graphs captured by HCIbench will not render properly. This requirement is addressed in an upcoming HCIbench appliance, which will include all of the components necessary to render the VSAN Observer graphs.

Troubleshooting

If things are not going right for some reason, there are 4 places to check.

  • Was the vdbench zip file uploaded to the appliance successfully? It should be found in /opt/output/vdbench-source. If something isn’t correct, you can always delete it, refresh the browser and upload a new version.
  • Was the vdbench parameter file created correctly? It should be located in /opt/automation/vdbench-param-files. The name varies based on which configuration options are chosen. If it doesn’t look correct, you can always delete it and generate a new one.
  • Was the complete configuration file, including vCenter and host information, created correctly? It should be located in /opt/automation/conf and is called perf-conf.yaml. If it doesn’t look correct, you can delete it and create a new one.
  • Finally, the logs of the performance test runs are located in /opt/automation/logs. If the tests are misbehaving and you cannot see why from the messages in the browser, this is a good place to look.
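The four checks above can be scripted. The following Python sketch, run on the appliance, simply reports whether anything exists at each of the locations listed; the paths come from the list above, while the script and its labels are illustrative only. The root parameter exists so the function can be exercised against a copy of the directory tree.

```python
import glob
import os
from typing import Dict

# Locations from the troubleshooting list (relative to the filesystem root).
CHECKS = [
    ("vdbench zip uploaded", "opt/output/vdbench-source/*"),
    ("vdbench parameter file(s)", "opt/automation/vdbench-param-files/*"),
    ("perf-conf.yaml present", "opt/automation/conf/perf-conf.yaml"),
    ("test logs present", "opt/automation/logs/*"),
]


def check_appliance(root: str = "/") -> Dict[str, bool]:
    """Report whether anything exists at each expected location under root."""
    return {label: bool(glob.glob(os.path.join(root, pattern)))
            for label, pattern in CHECKS}


if __name__ == "__main__":
    for label, ok in check_appliance().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {label}")
```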

Where do I get the bits?

Happy benchmarking!

50 Replies to “Getting started with HCIbench, the benchmark for hyper-converged infrastructure”

  1. Hi Cormac,
    I’m just wondering if this tool can be used to compare different HCI solutions? Having a look at the User Guide you linked to, it seems very VSAN specific (uses VSAN Observer, asks for VSAN cluster info, etc). It would be great to have a cross-vendor HCI testing tool that customers can spin up during POCs.

    1. My understanding is that it can be used for other HCI platforms and storage types. The request for a cluster is vSphere specific – it’s just that in the case of VSAN, it is also the VSAN cluster. But this could be a generic cluster which does not have VSAN enabled. It also produces the vdbench output, so that can be used as a way to compare different solutions.

  2. I have a hybrid array and I am trying to run one of its volumes as a VSAN datastore. Do I need to check whether their controller supports pass-through?
    I suspect that is the reason why I am seeing my array LUNs as ineligible disks. I made sure my LUN is formatted as VMDK.

      1. @cormac, Thanks. Are you aware of any similar tool which acts as a wrapper to VDI bench, but can still be used to run I/O to non-VSAN disks such as iSCSI VMDK datastores?

          1. Hi cormac, Thanks. Do you have any pointers to a medium-scale VDI VSCSI trace that I can replay and test on the IO Analyzer? Thanks in advance.

  3. This looks like an excellent tool, but I don’t see the VMs deploy. My validation completes successfully, but my *-vm-deploy.log returns:

    Expected Datastore but got Datacenter at “”

    I have a distributed switch, so I’m not deploying to the host directly. I see the perf-test-vms folder create and delete successfully however. Any tips on what might be wrong?

    1. Hi Ken,

      We have just fixed that; please re-download the OVA and try again.

      Thanks,
      Chen

  4. Hi,

    I need to perform testing of an EMC VNX storage system through a VMware environment. I have three different workload profiles.

    Is it possible within the vdbench parameter file to define these three concurrent workload profiles, or do I need to deploy three HCIbench appliances, each with a different vdbench parameter file (reflecting each workload profile)?

    Is it possible to perform testing using the same VMware hosts (let’s say 4 VMware hosts, with 10 VMs for each HCIbench)?

    Is it possible to define multiple datastores where testing will be performed?

    1. Danail,
      Please see my answers inline.

      Is it possible within the vdbench parameter file to define these three concurrent workload profiles, or do I need to deploy three HCIbench appliances, each with a different vdbench parameter file (reflecting each workload profile)?

      C.W. => HCIBench can handle multiple vdbench param files, and will test them one by one if you select “USE ALL” in “select a vdbench parameter file”.

      Is it possible to perform testing using the same VMware hosts (let’s say 4 VMware hosts, with 10 VMs for each HCIbench)?

      C.W. => If I understand your question correctly, you are asking whether HCIBench can be deployed on the cluster that will be tested against. The answer is yes.

      Is it possible to define multiple datastores where testing will be performed?

      C.W. => Yes, our latest version supports multi-datastore deployment, but the VMs must be split evenly across the datastores (e.g. if you specify 3 datastores, the number of VMs must be 3*N).
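The even-split rule is easy to check before a run. Here is a minimal Python sketch with a hypothetical split_vms helper (not part of HCIbench) that divides the VM count across the datastores and rejects counts that are not a multiple of the number of datastores.

```python
from typing import Dict, List


def split_vms(datastores: List[str], num_vms: int) -> Dict[str, int]:
    """Hypothetical helper: spread num_vms evenly, rejecting uneven splits."""
    if not datastores:
        raise ValueError("at least one datastore is required")
    if num_vms % len(datastores):
        raise ValueError(f"{num_vms} VMs cannot be split evenly across "
                         f"{len(datastores)} datastores")
    per_datastore = num_vms // len(datastores)
    return {ds: per_datastore for ds in datastores}


if __name__ == "__main__":
    # Hypothetical datastore names
    print(split_vms(["ds-a", "ds-b", "ds-c"], 9))  # {'ds-a': 3, 'ds-b': 3, 'ds-c': 3}
```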

  5. Hi,

    I just upgraded my previous version of the HCIBench appliance to the latest one and now I can’t validate any configuration anymore 🙁 It tells me that a required parameter is null and that I have to check my configuration file. All mandatory fields are filled correctly but validation is not possible. When I try to run the test, it lasts a few seconds before telling me that it has finished, but without any results.
    @Chen Wei: Any workaround for this? I need this tool working because I am currently validating a vSAN 6.1 proof of concept in order to show my customer the real value of the product.

    1. Check the log files highlighted at the end of the post. It should tell you what the problem is hopefully. Did you use any double quotes or other special characters anywhere in any of the fields? If so, remove them.

      1. Hi Cormac,

        Thx for ur reply. I did not use any double quotes or other special characters in any of the fields.

        I found this in the all-in-one-testing.log :

        /opt/automation/lib/deploy-vms.rb:29:in `’: undefined method `count’ for nil:NilClass (NoMethodError)
        /opt/automation/lib/vdbench-io-test.rb:22:in `’: undefined method `each’ for nil:NilClass (NoMethodError)

        A ruby related error ?

          1. perf-conf.yaml checked. No errors in it for me. What is the vdbench configuration file that ur talking about? Is it in /opt/automation/conf?

  6. Thx Cormac. I’ll send an email to vsanperformance 😉 I’ll keep you updated on the issue.

  7. Hi Cormac,

    I’ve deployed the HCIBench tool and everything seems to be working: validation completes successfully, 10 VMs are deployed, etc. However, when it deploys the VMs they do not boot into an OS; they just sit there waiting on PXE boot, and in the web GUI the progress bar does not get past “deployment started”. Is the OS not on the 8GB drive that’s deployed as part of the OVF?

    1. Could you try to redo the testing? That might have happened because the deployment was interrupted.

      1. Thanks for the prompt response guys. I’ve redeployed the tool and I’m getting further than before. The test now runs but completes after 10 or so seconds even though I have set the test to run for 6000 seconds. When I look at the results file VMs, IOPS, TPUT & LAT are 0.

  8. Hi,
    I’ve a lab test with 3 identical ESXi 5.5 U2a hosts A,B,C.

    The Auto-Perf-Tool works fine on hosts A and B but not on host C.

    I analyzed the problem in depth by checking inside the VM with a bootable GParted ISO. I found that when rvc-perf-vm deploys the vdbench VM on server C, the first VMDK disk exists but it is empty (no partition, no OS).

    As a consequence, the VM is not bootable.

    I looked at the logs but did not find any useful information:

    [root@rvc-perf-vm logs]# cat all-in-one-testing.log
    [root@rvc-perf-vm logs]# cat host-dcesx30.sanita.vi-vm-deploy.log
    2015-10-28 15:53:47 -0700: Creating 1 VMs…
    2015-10-28 15:53:47 -0700: Creating 1 VMs in batch 0…
    networks: VM Network-1284 = vmlaboratoriovm
    vdbench-1446072826-serverC-storage-0-1
    DEBUG: Timeout: 300
    Iteration 1: Trying to get host’s IP address …
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    0 1042M 0 81638 0 0 1295 0 9d 18h 0:01:03 9d 18h 1040
    curl: (23) Failed writing body (282 != 16384)
    Iteration 1: Trying to access nfcLease.info.entity …
    HttpNfcLeaseComplete succeeded
    Adding 10 disks
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-1
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-2
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-3
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-4
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-5
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-6
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-8
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-9
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-10
    ReconfigVM vdbench-1446072826-serverC-storage-0-1: success
    Added device disk-1000-11
    Powering on VMs …
    PowerOnVM vdbench-1446072826-serverC-storage-0-1: success
    Waiting for VMs to boot …
    [root@rvc-perf-vm logs]# cat test-status.log
    Deployment Started.

    Can you help me?

  9. Hello,

    I have deployed the HCIbench OVA to an EVO RAIL cluster and I’m experiencing some issues when it starts the deployment of the guest VMs.

    The problem is with the naming convention used for the vdbench guest VMs. When it starts the deployment of the VMs, it appends the name of the datastore to the guest VM names. However, in an EVO:RAIL deployment, the VSAN datastore has a long name (e.g: MARVIN-Virtual-SAN-Cluster-f990015d-8dc2-4869-ab88-1b04f3d3773f). Because of the long name, it fails with the error message “is invalid or exceeds the maximum number of characters permitted.”

    Here is one example of a VM name and the error when deploying to an EVO:RAIL VSAN datastore: “‘vdbench-vc-1446661119-MARVIN-Virtual-SAN-Datastore-f990015d-8dc2-4869-ab88-1b…’ is invalid or exceeds the maximum number of characters permitted.”

    Thanks,
    Jose

    1. Jose,

      Is it possible to rename the VSAN datastore to something shorter?
      If not, please get into the HCIBench console and modify the file:
      /opt/automation/lib/deploy-vms.rb

      Please find vdbench-vc-#{time_var}-#{datastore} and change it to vdbench-#{datastore}.

      The #{datastore} pattern identifies which datastore the VMs are on, to support multi-datastore deployments.

      Feel free to contact me directly by email if you have any further issues: VSANperformance@vmware.com

      Thanks,
      Chen
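A quick way to see why the EVO:RAIL name fails is to measure it. The Python sketch below rebuilds names using the two patterns from this thread (vdbench-vc-#{time_var}-#{datastore} and vdbench-#{datastore}), with an assumed -<n> VM index suffix, and compares them against an 80-character limit on vSphere VM names; treat that limit as our understanding and confirm it for your vSphere version. With the full EVO:RAIL datastore name, the original pattern yields an 89-character name, while the shortened pattern yields 75 characters.

```python
from typing import Optional

MAX_VM_NAME_LEN = 80  # vSphere VM name limit as we understand it; confirm for your version


def vm_name(datastore: str, index: int, time_var: Optional[int] = None) -> str:
    """Rebuild a benchmark VM name; the -<index> suffix is an assumption."""
    if time_var is not None:
        return f"vdbench-vc-{time_var}-{datastore}-{index}"  # original pattern
    return f"vdbench-{datastore}-{index}"                    # shortened pattern


def fits(name: str) -> bool:
    return len(name) <= MAX_VM_NAME_LEN


if __name__ == "__main__":
    ds = "MARVIN-Virtual-SAN-Datastore-f990015d-8dc2-4869-ab88-1b04f3d3773f"
    for name in (vm_name(ds, 1, 1446661119), vm_name(ds, 1)):
        print(len(name), fits(name))  # 89 False, then 75 True
```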

      1. Thanks Chen and Cormac!
        This is a great tool!
        I thought about changing the datastore name, but didn’t want to change the EVO:RAIL default setup.
        Changing the file “deploy-vms.rb” did the trick.

        I’ll let you know if I find any other issue.

        Jose

      2. Chen

        I tried to send you an email directly to VSANperformance@vmware.com but the message is being rejected.

        After the change, VMs were deployed correctly and workload executed, but the results show 0 in all the counters. Is there any reason why the results are 0 ?

        /results/results20151104101655/vdb-10vmdk-10ws-4k-70rdpct-0randompct-1446758335-res.txt

        Datastore: MARVIN-Virtual-SAN-Datastore-f990015d-8dc2-4869-ab88-1b04f3d3773f
        VMs = 0
        IOPS = 0.00 IO/s
        TPUT = 0.00 MB/s
        LAT = 0.00 ms
        =============================

        Thanks,
        Jose
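A summary of VMs = 0 with every counter at zero is what you would expect if the summary step matched no per-VM result files at all, which is where the thread ends up further down. The Python sketch below illustrates this under stated assumptions: it takes (IOPS, MB/s, latency in ms) tuples for whatever per-VM files were found, sums IOPS and throughput, and reports an IOPS-weighted average latency. The output layout copies the summary shown above, but the aggregation rules are our assumption, not necessarily what HCIbench’s vdb-process scripts do.

```python
from typing import List, Tuple


def summarize(datastore: str, per_vm: List[Tuple[float, float, float]]) -> str:
    """per_vm holds (IOPS, MB/s, latency ms) for each per-VM result file found."""
    vms = len(per_vm)
    iops = sum(v[0] for v in per_vm)
    tput = sum(v[1] for v in per_vm)
    # IOPS-weighted mean latency (an assumption); zero when nothing matched.
    lat = sum(v[0] * v[2] for v in per_vm) / iops if iops else 0.0
    return "\n".join([
        f"Datastore: {datastore}",
        f"VMs = {vms}",
        f"IOPS = {iops:.2f} IO/s",
        f"TPUT = {tput:.2f} MB/s",
        f"LAT = {lat:.2f} ms",
        "=" * 29,
    ])


if __name__ == "__main__":
    print(summarize("vsanDatastore", []))  # no per-VM files matched: all zeros
    print(summarize("vsanDatastore", [(1000.0, 3.9, 2.0), (3000.0, 11.7, 1.0)]))
```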

        1. Jose,

          Could you verify whether there are vdbench results files in the directory “/results/results20151104101655/vdb-10vmdk-10ws-4k-70rdpct-0randompct-1446758335”? If they are there, the test finished successfully; could you let me know the names of the results files?

          Thanks,
          Chen

          1. Hi Chen,

            Yes, the results files for each VM are there and they contain the vdbench results. However, the “*-res.txt” file for every workload has “0” for the metrics.

            Here is the results directory:

            /results/results20151107181133

            (this is the file that has 0 as results)
            results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193-res.txt

            results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193/

            here one of the VM’s output:

            results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193/vdbench-1446948711-1.txt

            Thanks,
            Jose

          2. Just to add some info here: I ran a test on a non-VSAN datastore, and the “*-res.txt” file has the correct total values for each workload.
            Apparently the problem only occurs when running workloads on a VSAN datastore.

          3. Hi Jose,

            The vdbench result file for each individual VM should be named “vdbench-vc-TIMESTAMP-DATASTORE_NAME-[1…n].txt”. Since your files don’t contain the datastore name, the calculation script was not able to find them.

            My suggestion:
            1. In the file /opt/automation/lib/deploy-vms.rb, make sure #{datastore} appears in the name you modified last time.
            2. Rename the files in /opt/output/results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193/ to vdbench-DATASTORE_NAME-[1…n].txt, and run
            “/opt/automation/vdb-process-long.sh /opt/output/results/results20151107181133/vdb-4vmdk-20ws-8k-55rdpct-80randompct-1446949193 DATASTORE_NAME”; if the result doesn’t look right, use /opt/automation/vdb-process-short.sh instead.

            Let me know if you need further assistance.

            thanks,
            Chen
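The renaming step Chen suggests can be scripted. The Python sketch below assumes the per-VM files are named like vdbench-1446948711-1.txt (the form Jose reported) and renames them to vdbench-DATASTORE_NAME-<n>.txt. It does a dry run by default and returns the planned (old, new) pairs so you can inspect them first; pass dry_run=False to apply. Afterwards, re-run /opt/automation/vdb-process-long.sh (or vdb-process-short.sh) on the directory as described in the reply.

```python
import os
import re
from typing import List, Tuple

# Per-VM files as reported in the thread, e.g. vdbench-1446948711-1.txt
PER_VM_FILE = re.compile(r"^vdbench-(\d+)-(\d+)\.txt$")


def rename_results(result_dir: str, datastore: str,
                   dry_run: bool = True) -> List[Tuple[str, str]]:
    """Rename vdbench-<timestamp>-<n>.txt to vdbench-<datastore>-<n>.txt.

    Returns the (old, new) pairs; only touches the disk when dry_run is False.
    """
    plan = []
    for name in sorted(os.listdir(result_dir)):
        match = PER_VM_FILE.match(name)
        if not match:
            continue  # leave anything that doesn't look like a per-VM file alone
        new_name = f"vdbench-{datastore}-{match.group(2)}.txt"
        plan.append((name, new_name))
        if not dry_run:
            os.rename(os.path.join(result_dir, name),
                      os.path.join(result_dir, new_name))
    return plan
```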

  10. I have a couple of questions; it would be great if you could help me with them.

    Can we change the disk format of the vdbench test VMs? As of now it is thick provisioned lazy zeroed; how can we make it thin provisioned or thick provisioned eager zeroed?

    How can we change the vdbench test VMs’ CPU and memory? Do we have to edit /opt/output/vm-template/perf-ubuntu-vdbench.ovf?

    1. Can we change the disk format of the vdbench test VMs? As of now it is thick provisioned lazy zeroed; how can we make it thin provisioned or thick provisioned eager zeroed?

      Yes, you can use thin provisioning by commenting out line 871 of /root/rvc/lib/rvc/modules/vsantest/perf.rb.

      How can we change the vdbench test VMs’ CPU and memory? Do we have to edit /opt/output/vm-template/perf-ubuntu-vdbench.ovf?

      Yes, you have to modify that file to change cpu and ram.

      Thanks,
      Chen

  11. Is it possible to change the storage policy being used for the worker VMs?

    We’re using VSAN 5.5 currently and the worker VMs don’t pick up any storage policy when they are deployed so they have the default settings (n+1, stripe width of 1 etc). We’d like to compare different settings such as stripe width so this would be a useful feature if possible.

    Cheers
    Andy

    1. I’m not sure if there is an easy way to do this. One way is to modify the default policy before each test, and then set it back afterwards. Unfortunately, iirc this can only be done via esxcli in VSAN 5.5. KB 2073795 has information on how to do this.

      1. Thanks Cormac. I thought it would be along those lines but hadn’t seen that KB before so this will help me out if the test results mean we want to change the default policy.

        Doesn’t look to be a huge job to modify the default policy before each test though.

        Thanks again.
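For anyone following the approach of changing the default policy between runs, the Python sketch below builds an esxcli vsan policy setdefault command line from a set of capabilities. It only prints the command, which would then be run on the ESXi hosts. The policy-expression syntax, for example (("hostFailuresToTolerate" i1) ("stripeWidth" i2)), and the policy class name vdisk reflect our understanding of VSAN 5.5 esxcli, so confirm both against KB 2073795 before use. Recording the current values first with esxcli vsan policy getdefault makes it easy to set them back afterwards.

```python
import shlex
from typing import Dict


def policy_expr(rules: Dict[str, int]) -> str:
    """Build a VSAN policy expression such as (("stripeWidth" i2)).

    Only integer-valued capabilities are handled; the "i" prefix marks an integer.
    """
    return "(" + " ".join(f'("{name}" i{value})' for name, value in rules.items()) + ")"


def setdefault_cmd(policy_class: str, rules: Dict[str, int]) -> str:
    """Return an esxcli command line; this sketch only builds the string."""
    return " ".join(["esxcli", "vsan", "policy", "setdefault",
                     "-c", policy_class, "-p", shlex.quote(policy_expr(rules))])


if __name__ == "__main__":
    print(setdefault_cmd("vdisk", {"hostFailuresToTolerate": 1, "stripeWidth": 2}))
```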

  12. Hi Everyone,
    I have successfully deployed the HCIbench appliance. The test validation succeeds. However, when I start the test, I get one of two results.

    1. The test shows Completed immediately. Then I see the worker VMs deploy and sit at an “OS not found” screen.

    2. The test starts. “Deployment started” appears and progress halts at around 40%. The worker VMs are deployed from the template and power on, but no further progress is made on the test (I cancel after an hour or so).

    The documentation and this website reference VSANperformance@vmware.com as a potential contact for help with the tool, but any emails to that address are bounced.

  13. Hello,

    I’m trying to use HCIbench and I’ve followed the documentation; everything seems to be configured correctly, but when I click “test,” it creates all of the VMs, powers them on for a few seconds, powers them off, and then deletes them. The results look something like this:

    Datastore: vsanDatastore
    VMs = 0
    IOPS = 0.00 IO/s
    TPUT = 0.00 MB/s
    LAT = 0.00 ms
    =============================
    Any thoughts on what might be going on? Thank you.

    1. Check through the log files highlighted in the article, Vincent. They should give you a clue as to why this is failing. You also need to have DHCP configured for the vdbench VMs to deploy – is this available?
