A closer look at Runecast

Last week, I had the pleasure of catching up with a new startup called Runecast. These guys are doing something that is very close to my heart. As systems become more and more complex, and with fewer people taking on more responsibility, highlighting potential issues and providing descriptive guidance on resolving them is now critical. This is resonating in the world of HCI (hyper-converged infrastructure), where the vSphere administrator may also be the storage administrator, and perhaps the network administrator too. This is where Runecast come in. Using a myriad of resources such as VMware’s Knowledgebase system, the Security Hardening Guide, various Best Practices, and other assorted information, Runecast can monitor your vSphere infrastructure and bring to your attention the need for some remediation. This could be because something in the logs matched an issue reported in a KB, or because new hosts have been added to a cluster without being security hardened, or because VMware has released a new patch or update that is relevant to your environment.

Setup

The Runecast product comes as a virtual appliance (OVA). Simply deploy it in your infrastructure and connect it to your vCenter Servers. The appliance needs 2 vCPUs and 6GB RAM. The latest appliance, version 1.5 (which was released today, incidentally), has 2 x VMDKs (40GB). This is primarily to facilitate a new feature in v1.5 which allows the appliance to gather logs and provide reporting on multiple vCenter Servers. Once logged in, the appliance can be set up to monitor ESXi hosts as well as VMs. For ESXi hosts, the syslog is redirected to the appliance, and the appropriate firewall rules are configured. Virtual Machine log output can also be redirected to the appliance. This is done by adding an entry to the VM’s .vmx, and then the VM needs to be power cycled or migrated for the update to take effect. The appliance only needs internet access to download new updates. However, updates can be provided in other ways for sites that do not have access to the outside world. This diagram provides a basic overview of the architecture.
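To give a feel for what the host-side redirection amounts to, here is a minimal Python sketch that only builds the standard esxcli commands for pointing an ESXi host's syslog at a remote collector and enabling the outbound syslog firewall ruleset. The collector address, the UDP transport and port 514 are my own assumptions for illustration. The appliance configures this for you, and I have not verified exactly which settings it applies.

```python
def syslog_redirect_commands(collector_ip, port=514, protocol="udp"):
    """Return esxcli commands that point an ESXi host's syslog at a collector.

    collector_ip, port and protocol are illustrative assumptions; substitute
    the values that apply to your own appliance.
    """
    if protocol not in ("udp", "tcp", "ssl"):
        raise ValueError(f"unsupported syslog protocol: {protocol}")
    loghost = f"{protocol}://{collector_ip}:{port}"
    return [
        # Send host logs to the remote collector.
        f"esxcli system syslog config set --loghost={loghost}",
        # Make the syslog service pick up the new target.
        "esxcli system syslog reload",
        # Allow outbound syslog traffic through the ESXi firewall.
        "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true",
        "esxcli network firewall refresh",
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address standing in for the appliance.
    for command in syslog_redirect_commands("192.0.2.10"):
        print(command)
```

The function only returns strings, so nothing is changed on a host until you choose to run the commands yourself.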

We were told that Runecast are currently releasing new updates every 2 weeks on average, but if there is a critical update from VMware, they will push this to their users more quickly than that.

Demo

I must say that the demo was very intuitive. After about 30 minutes, I felt like I would be able to drive this very easily myself. Stanimir Markov demonstrated the product to us and during the demo, we saw examples of issues related to security hardening being highlighted, missing best practices, as well as alerts being generated because the log analysis caught something that was highlighted in a VMware Knowledgebase article. I then deployed it myself in my own lab, and had it running in a matter of minutes. Probably the easiest way to get a feel for it is via some of the screenshots that I took. Here is a nice one which highlights whether a bunch of different best practices have passed or failed, and also how many objects in the inventory are impacted by this check.

This is another nice one – KBs discovered. This is highlighting whether or not the criteria in a certain knowledgebase article are applicable in this environment, and again how many objects are affected. Not only that, but each of those objects can then be queried to get even further detail, such as the ESXi host below. You’ll also notice that each of the alerts/warnings comes with a severity level, which the Runecast team determine based on the impact the “known issue” can have on your environment. As you can see below, this one could cause a PSOD (Purple Screen of Death), so it is categorized as “Critical” from an availability perspective.
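To make the idea of matching logs against known issues a little more concrete, here is a small Python sketch of how such a check could work in principle. This is not how Runecast implements it. The rule identifiers, patterns, host names and log lines are all invented placeholders, not real VMware KB numbers or real ESXi log output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    kb_id: str     # hypothetical identifier, not a real VMware KB number
    severity: str  # e.g. "Critical" or "Medium"
    pattern: str   # regular expression searched for in each log line


def find_affected(issues, logs_by_host):
    """Map each matching KB id to its severity and the sorted hosts that hit it."""
    findings = {}
    for issue in issues:
        regex = re.compile(issue.pattern, re.IGNORECASE)
        hosts = sorted(
            host
            for host, lines in logs_by_host.items()
            if any(regex.search(line) for line in lines)
        )
        if hosts:
            findings[issue.kb_id] = {"severity": issue.severity, "hosts": hosts}
    return findings


if __name__ == "__main__":
    # Invented rules and log lines, for illustration only.
    issues = [
        KnownIssue("KB-EXAMPLE-1", "Critical", r"example driver .* heartbeat lost"),
        KnownIssue("KB-EXAMPLE-2", "Medium", r"example service restart loop"),
    ]
    logs_by_host = {
        "esxi-01": ["example driver nic0 heartbeat lost", "routine message"],
        "esxi-02": ["routine message"],
        "esxi-03": ["Example driver nic1 heartbeat lost"],
    }
    for kb_id, finding in find_affected(issues, logs_by_host).items():
        print(kb_id, finding["severity"], len(finding["hosts"]), "object(s) affected")
```

Keeping the severity on the rule rather than on the individual match mirrors the way the severity level is described above: it is a property of the known issue, and the list of hosts gives the count of affected objects.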

The other nice feature is that you can read the KB articles via the appliance interface. There is no need to connect to the VMware KB site to review the content. Very useful again for those sites that do not have internet access.

I’ll just add one more screenshot that I thought was interesting. This was from the security hardening view. I know that this is a very important category for a lot of customers, but it requires a lot of due diligence to make sure it is implemented correctly. This is especially true in HCI, where you might be regularly scaling out the HCI system by adding new hosts to the cluster. Manually making sure all the security hardening is in place can be tedious. With Runecast, you can verify that the security hardening changes have indeed been implemented.
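As a rough illustration of why this kind of check lends itself to automation, here is a minimal Python sketch that compares each host's settings against a hardening baseline and reports the deviations. It is my own sketch, not Runecast's implementation. The setting names, values and host names are hypothetical and are not taken from the actual Security Hardening Guide.

```python
def hardening_gaps(baseline, host_settings):
    """Return {host: {setting: (expected, actual)}} for every host that deviates.

    A setting that is missing on a host is reported with an actual value of None.
    """
    gaps = {}
    for host, settings in host_settings.items():
        deviations = {
            name: (expected, settings.get(name))
            for name, expected in baseline.items()
            if settings.get(name) != expected
        }
        if deviations:
            gaps[host] = deviations
    return gaps


if __name__ == "__main__":
    # Hypothetical baseline and hosts, for illustration only.
    baseline = {"Example.ShellTimeout": 900, "Example.RemoteLogging": "enabled"}
    host_settings = {
        "esxi-01": {"Example.ShellTimeout": 900, "Example.RemoteLogging": "enabled"},
        # A newly added host that has not been hardened yet.
        "esxi-04": {"Example.ShellTimeout": 0},
    }
    for host, deviations in hardening_gaps(baseline, host_settings).items():
        for name, (expected, actual) in deviations.items():
            print(f"{host}: {name} expected {expected!r}, found {actual!r}")
```

Running the same comparison every time a host joins the cluster is what turns a tedious manual audit into a routine check.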

While I haven’t been able to touch on all aspects of the Runecast interface in this post, I was very impressed by its simplicity and ease of use. Compared to a lot of other interfaces, it seemed very intuitive, with no steep learning curve. Other items that impressed me were the ability to get an inventory view, and see how many alerts are associated with each host, VM, datastore, network, and so on. I also liked the filtering mechanism, where some alerts could be ignored temporarily or permanently, perhaps during a planned maintenance period.

One limitation is around remote alerting. Right now this is only available via email, but the Runecast team are working on additional notification mechanisms, such as SNMP traps and webhooks for applications such as Slack. This is feedback that they have heard from many of their customers.

About Runecast

Runecast are currently up to 10 full-time employees, with another 4 part-time employees. The majority of the development work is being carried out in the Czech Republic, and they have a presence in many other countries. Runecast offer a free 30-day trial of their product, and I also believe that VMware vExperts have access to an NFR license. Licensing is based on an annual subscription rate, which I understood to be $250 per CPU per year. Runecast Analyzer can be downloaded here.

I must say that I was impressed by this product. Like I said at the beginning of this post, as HCI becomes more prevalent, the onus will be on fewer people to manage more of the infrastructure. Those people are invariably the vSphere admins, and tooling will be critical in reducing troubleshooting time. The next step will not just be proactive highlighting of potential issues, but prescriptive guidance and remediation in the event of a failure.

Runecast are participating in a number of VMware User Group meetings globally this year, and they will also be at VMworld. Go check them out if you see them.