Kubernetes, Hadoop, Persistent Volumes and vSAN

At VMworld 2018, one of the sessions I presented covered running Kubernetes on vSphere, and specifically using vSAN for persistent storage. In that presentation (which you can find here), I used Hadoop as a specific example, primarily because Hadoop has a number of moving parts. For example, there are the concepts of a Namenode and a Datanode. Put simply, a Namenode provides the lookup for blocks, whereas Datanodes store the actual blocks of data. Namenodes can be configured in an HA pair with a standby Namenode, but this requires a lot more configuration and resources, and introduces additional…

Getting started with Cloudera Hadoop on vSphere

This past week, my buddy Paudie and I have been neck-deep in Cloudera/Hadoop, with a view to getting it successfully deployed on vSphere. This was purely a learning exercise, to try to understand what operational considerations need to be taken into account when running Hadoop on top of vSphere. These operational considerations include maintenance mode, rack awareness, high availability, replication and protection of the data. Both Cloudera/Hadoop and vSphere offer ways to do all of this, so the longer-term objective is to figure out whether or not these features are compatible, and whether…