SAS Expander support on Virtual SAN
This question has come up on a number of occasions. It usually arises when there is a question about scalability and the number of disk drives that can be supported on a single host participating in Virtual SAN. The configuration maximum for a Virtual SAN node is 5 disk groups, with each disk group containing 1 flash device and up to 7 capacity devices (magnetic disks in hybrid configurations, flash devices in all-flash configurations). The inevitable next question is how this configuration is implemented on a physical server: how can I get to 35 capacity devices (40 devices in total) in a single server? There are a few ways to do it.
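To make the 35/40 figure concrete, here is a minimal sketch of the arithmetic using the maximums quoted above. The constant and function names are made up for illustration and are not part of any VMware tool.

```python
# Per-host maximums as quoted in this post (hypothetical constant names).
MAX_DISK_GROUPS = 5
CACHE_DEVICES_PER_GROUP = 1
MAX_CAPACITY_DEVICES_PER_GROUP = 7


def devices_per_host(disk_groups=MAX_DISK_GROUPS,
                     capacity_per_group=MAX_CAPACITY_DEVICES_PER_GROUP):
    """Return the cache, capacity and total device counts for one host."""
    if not 1 <= disk_groups <= MAX_DISK_GROUPS:
        raise ValueError("disk_groups must be between 1 and 5")
    if not 1 <= capacity_per_group <= MAX_CAPACITY_DEVICES_PER_GROUP:
        raise ValueError("capacity_per_group must be between 1 and 7")
    cache = disk_groups * CACHE_DEVICES_PER_GROUP
    capacity = disk_groups * capacity_per_group
    return {"cache": cache, "capacity": capacity, "total": cache + capacity}


if __name__ == "__main__":
    # Fully populated host: 5 cache devices + 35 capacity devices = 40.
    print(devices_per_host())
```

So a fully populated host needs somewhere to plug in 40 devices, which is the number the options below have to reach.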
Multiple storage controllers
This is probably the simplest approach. Most storage controllers will support either 8 or 16 disk drives. If you have a controller that supports 8 disk drives, then you’re looking at 5 controllers to reach the maximum number of physical disks per VSAN host. If you have a controller that supports 16 disk drives, then you would need 3 controllers. This configuration isn’t always possible, simply due to the number of available PCIe slots for controllers.
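The controller counts above are just a ceiling division of the device count by the number of drives each controller supports. A small sketch, assuming all controllers are identical and every drive is attached directly (the function name is hypothetical):

```python
def controllers_needed(devices, drives_per_controller):
    """Number of identical controllers required to attach `devices` drives directly."""
    if devices < 0 or drives_per_controller <= 0:
        raise ValueError("devices must be >= 0 and drives_per_controller > 0")
    # Ceiling division without floating point.
    return -(-devices // drives_per_controller)


if __name__ == "__main__":
    for width in (8, 16):
        print(width, "drives per controller:", controllers_needed(40, width), "controllers")
```

Comparing the result against the number of free PCIe slots in the server shows quickly whether this approach is even possible.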
SAS expanders
SAS expanders are yet another option. Let’s describe a SAS expander first. Typically it is a PCIe device that plugs into the PCIe bus of the server and is connected to an external JBOD (Just a Bunch of Disks) or external storage controllers, which contain a number of storage devices. Communication travels from the storage controller through the SAS expander to the disk device.
We’ve had to do some work at VMware to qualify the SAS expanders as we didn’t know if they would be suitable for Virtual SAN Environments. We put this note in our VSAN 6.0 Design and Sizing Guide:
“SAS expanders are sometimes considered to extend the number of storage devices that can be configured with a single storage I/O controller. VMware has not extensively tested SAS expanders with VSAN, and thus does not encourage their use. In addition to potential compatibility issues, the use of SAS expanders may impact performance and increase the impact of a failed disk group.”
Well, that was a number of months ago, and we have now completed the first certification of these devices. The DELL R730XD is now fully certified with SAS expanders. The R730XD has “built-in” SAS expanders (they are not an optional add-on). This means that you can populate 24 drives behind a controller without any additional hardware.
However, it is important to remember that a fully populated SAS expander configuration, with all drives behind a single controller, may not perform as well as a configuration that uses multiple controllers. Configurations of this nature can be considered for workloads that are capacity intensive and may not require the highest level of performance. If performance is an important factor, then it may be worth considering adding controllers. Again, this is dependent on the requirements of the workload.
Designing & sizing a VSAN deployment
When designing a VSAN deployment, you must consider the capacity requirement. Does it require more capacity per host than a single controller can handle? If so, what options are available: multiple controllers or a SAS expander? And more importantly, is optimal capacity more important than optimal performance? If a SAS expander is the choice you are going with, make sure the host in question is qualified.
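The questions above can be written down as a rough checklist. This is only a sketch of the reasoning in this post, not a sizing tool; the function and parameter names are invented, and whether a host is qualified for SAS expanders has to come from the list of VSAN ready nodes.

```python
def layout_options(devices_needed, drives_per_controller,
                   free_pcie_slots, host_expander_qualified):
    """List the attachment options discussed in this post that fit the host."""
    if devices_needed <= drives_per_controller:
        # One controller can handle the capacity; nothing more to decide.
        return ["single controller, direct attach"]

    options = []
    controllers = -(-devices_needed // drives_per_controller)  # ceiling division
    if controllers <= free_pcie_slots:
        # Usually the better choice when performance matters most.
        options.append(f"{controllers} controllers, direct attach")
    if host_expander_qualified:
        # Capacity-oriented choice; only valid on a qualified host.
        options.append("single controller with SAS expander")
    return options


if __name__ == "__main__":
    # Example: 24 drives, 8-drive controllers, 1 free slot, qualified host.
    print(layout_options(24, 8, 1, True))
```

An empty list means neither approach fits the host as described, so a different server or a smaller per-host capacity target is needed.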
[Update] I was asked where to find the list of SAS expanders that are currently supported. At present, SAS expanders are only supported with select VSAN ready nodes. You can get the list of VSAN ready nodes, and thus see which SAS expanders are supported, by clicking on this link.
Cormac, I wonder if VMware just fundamentally doesn’t understand how storage hardware works, or if it’s more than that.
Expanders are not a problem.
The truth is that there are no SAS devices on the market, solid state or otherwise, capable of stuffing a 12Gbit full-duplex SAS port. This means that in order to fully utilize the bandwidth available in (for example) the 8-port Avago/LSI SAS3008, which is about 70% of the worldwide market for Server-RAID cards, you MUST use an expander to maximize the performance of the architecture.
Where the expander can become a bottleneck is in the almost-never-seen combination of (a) 100% large-block IO requests and (b) enough expander-attached total devices to reliably deliver more bandwidth than the upstream SAS controller can handle.
How many ultra-fast SSDs is “enough to stuff” the pipes? Using the fastest 12Gbit SAS SSDs on the planet, you need 16-24 of them per SAS3008 to fully utilize the 9.6GBytes/sec >>full duplex<>single<< 8-port SAS3008 chip, or six physical disks per port.
Obviously then, SAS expanders are not a problem, they are necessary.
VSAN, and to an even greater extent EVO:Rail seem to reflect VMware's profound misunderstanding of computer and storage architectural concepts. Telling people to limit themselves to 8 SAS devices per PCIe slot RAID controller fills up PCIe slots with controllers that can't ever utilize more than a fraction of the bandwidth they are capable of.
VMware would do well to follow the example of the market leader here (HP's StoreVirtual VSA) and get out of the business of hardware-defined-software-defined storage.
VSAN should not need a separate HCL or special rules about what hardware is allowed or not allowed. IMO, this is why EVO:Rail is tanking and why VSAN adoption is barely moving the needle in the market. VMware's propensity for creating arcane or arbitrary hardware rules that are technically unjustifiable, yet somehow (just coincidentally) maximize per-socket VMware license revenues is hurting them. It smells like 'evil'.
-Chasmcrosser
As per my disclaimer on this blog, I am speaking for myself, not VMware. This is my observation from working on VSAN for the past 2 years.
In a perfect world, I would agree with you. This was our initial goal with VSAN: use any components to build yourself a distributed storage solution. I am sure we would love nothing more than to just give our customers the VSAN software, and let them deploy it on whatever combination of host, controller and flash device they want. In reality, this is simply not possible. We have found that there are too many inter-dependencies (and nuances in behaviour) between controllers, drivers, driver firmware, magnetic disks, SSDs, PCI-e flash devices and flash device firmware for this to happen. Stuff that is just supposed to work, but doesn’t. This is exactly why we started to qualify SAS expanders (and flash devices and driver and firmware versions). It’s not that we’re trying to be difficult; it’s because we have encountered situations where these components “do something funky” and we want to protect our current customers (and future customers) from hitting these issues if they decide to roll out a VSAN solution. Maintaining an HCL is the only way we can offer our customers hardware choice while still ensuring that the components have been rigorously tested.
Hope that makes sense.