Tech Preview of EMC’s XtremIO Flash Storage Solution

There is no doubt that flash is hot right now. Over the past number of months, we have seen IBM acquire Texas Memory Systems (TMS), HDS unveil their own flash strategy and HP launch their all flash 3PAR P1000 array. Of course regular readers of my blog will have seen my posts about newer all flash array vendors such as Pure Storage, Violin Memory & Nimbus Data. The purpose of this post is to highlight the flash storage solution from XtremIO, a company which was recently acquired by EMC.

I should point out that there is no XtremIO product available for purchase just yet. My understanding is that EMC hope to go GA with it sometime next year (but please check with EMC directly). Don’t go looking for XtremIO on the VMware HCL – you won’t find it.

At VMworld 2012 in Barcelona, I had the pleasure of meeting & chatting with Josh Goldstein of XtremIO. Josh was kind enough to give me a preview of XtremIO’s technology. He told me that XtremIO provide an all-flash storage array based on scale-out building blocks called “X-bricks” which cluster together to meet customer demands. The interconnect between X-bricks is InfiniBand. The LUNs are presented to ESXi hosts over iSCSI or Fibre Channel.

The “X-brick” comes in a configuration of two stand-alone controllers with a JBOD of 2.5″ form factor MLC flash SSD drives. It has automatic load balancing across X-bricks, which Josh said provided a consistent low latency as well as linear performance scalability as you add additional X-bricks. XtremIO’s core engine, implemented 100% in software, does inline deduplication of the data (4KB granular) and works across all of the volumes on all of the X-bricks in a cluster. XtremIO claim that they can achieve between a 5:1 and a 30:1 data reduction in virtualized environments. The added benefit of this inline, global deduplication is that it reduces the number of flash writes, reducing wear  on the SSDs.
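To make the idea of inline, global deduplication at 4KB granularity a little more concrete, here is a minimal Python sketch. It is a toy model and not XtremIO's implementation: the DedupStore class, the use of SHA-256 as the block fingerprint and the flash_writes counter are all assumptions made purely for illustration. It shows why identical blocks written to different volumes result in a single physical write.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB granularity, as described above


class DedupStore:
    """Toy content-addressed block store (illustration only, not XtremIO code)."""

    def __init__(self):
        self.blocks = {}       # fingerprint -> block data (stands in for flash)
        self.refcount = {}     # fingerprint -> number of logical references
        self.volumes = {}      # (volume, lba) -> fingerprint
        self.flash_writes = 0  # physical writes that actually reach "flash"

    def write(self, volume, lba, data):
        assert len(data) == BLOCK_SIZE
        fp = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        old = self.volumes.get((volume, lba))
        if old == fp:
            return  # same content rewritten in place; nothing changes
        if fp not in self.blocks:
            # Only previously unseen content costs a flash write.
            self.blocks[fp] = data
            self.flash_writes += 1
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        self.volumes[(volume, lba)] = fp
        if old is not None:
            self._release(old)

    def _release(self, fp):
        # Drop one reference; free the block when nothing points at it.
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:
            del self.refcount[fp]
            del self.blocks[fp]

    def read(self, volume, lba):
        return self.blocks[self.volumes[(volume, lba)]]


store = DedupStore()
block = b"A" * BLOCK_SIZE
for vol in ("vol1", "vol2", "vol3"):  # hypothetical volume names
    store.write(vol, 0, block)
print(store.flash_writes)  # 1
```

Three logical writes of the same block across three volumes result in one physical write, which is the effect on flash wear that the paragraph above describes.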

All volumes on the X-brick are protected with a flash optimized configuration similar to RAID-6, but XtremIO has patented algorithms to minimize reads and writes as well as providing flash longevity and improving performance. It should be noted that this is something which is automatically done, and does not require configuration by the administrator. All volumes are also thinly provisioned.

Another key point is that the X-brick has no single point of failure. The cluster’s system manager function handles failures, and can restart various software components or fail them over.

Snapshots/Clones

Snapshots and clones are done at the LUN level, and work at a 4KB grain size. XtremIO claim that their snapshots and clones have no impact on performance, and perform as well as regular LUNs. However, considering EMC’s participation in the Virtual Volumes (VVOLs) program, one suspects that granularity will move to the virtual machine or VMDK level at some point in the future.

Replication/SRM

There is no built-in replication mechanism on the X-brick at this time, but VMs running on the XtremIO X-brick can of course be replicated using VMware’s own vSphere Replication product. Although Josh could not go into specifics or roadmap details, he did state that a native replication feature is a high priority item for them.

Management plugin to vCenter

Management is currently done via a web-based UI and a Command Line Interface. A single admin UI screen allows you to monitor capacity, performance, alerts and hardware status. Integration with the vSphere UI is something XtremIO are currently looking at. As you can see, the UI is very simple and at a single glance you can get an overall view of the health and performance of the X-brick cluster (unfortunately, the storage was idle at the time that this screen shot was captured, but hopefully you get the idea).

XtremIO X-brick UI

I asked Josh which features of the XtremIO X-brick he believes make it stand out from the other flash array vendors currently on the market. These were the items he highlighted as being differentiators.

1. XtremIO’s dedupe is truly real-time & inline at all times. It is not semi-inline (sometimes switched off for future post-processing when the array gets busy) nor a post-processing design.  This has the benefit of reducing the number of writes seen by the SSDs, which both increases flash endurance and delivers better performance, since I/O cycles on the SSDs remain available for writing unique data.

2. XtremIO’s VAAI XCOPY is greatly enhanced by having real-time inline deduplication and by the fact that the XtremIO array always maintains its metadata tables in memory, rather than having to perform lookups on disk.  Imagine what happens when an administrator clones a VM.  VAAI tells the XtremIO array to copy (using the XCOPY command) the range of blocks corresponding to that VM from one location in the array to a new location.  Since the VM already exists on the array (and thus no new unique blocks are being written), all the XtremIO system has to do is update its metadata tables that a new reference to those blocks exists.  With all the metadata in RAM, the operation can be completed practically instantaneously.  This gives administrators tremendous power and flexibility to roll out VMs on-demand without incurring a high I/O penalty on the storage array.  XtremIO have a video demonstration of this capability on their website here.

3. From a vSphere perspective you can get the full benefits of eager zeroed thick provisioning all the time, since the volumes are always thin provisioned at the back-end.  And since the XtremIO array supports the VAAI zero blocks/write-same primitive and has special internal handling of zero blocks, there is no drawback in provisioning or formatting time for eager zeroed thick volumes.

4. With its deduplication technology, XtremIO also make Full Clones attractive and cost-effective to run in all-flash, especially for VDI where storage can sometimes be cost prohibitive. This benefit is not exclusive to Full Clones – the XtremIO array works equally well with Linked Clones or any combination of Full and Linked Clones.  An interesting report published jointly between VMware and XtremIO about VDI testing results can be found here. There is also an interesting VDI demonstration video here.

5. When you take all of these things together (advantages of inline dedupe, VAAI, eager zeroed thick all the time, no RAID configuration on the array), Josh stated that one could become a “lazy administrator”. All of the labour-intensive operations that required an administrator’s full attention are now taken care of by the array. And since everything is guaranteed to run at the same performance level (LUNs, snapshots, clones), there is no performance management necessary at the array level. The configuration steps are very simple – you create the volumes, create initiator groups and map volumes to the initiators. Very straightforward.
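The metadata-only clone described in item 2 above can be sketched in a few lines of Python. This is a simplified model built on assumptions, not XtremIO's actual data structures: the MetadataTable class, the string fingerprints and the volume names are hypothetical. The point it illustrates is that an XCOPY of already-deduplicated blocks only needs to add references in an in-memory table, with no data blocks being read or written.

```python
from collections import Counter


class MetadataTable:
    """Toy in-memory map of logical blocks to deduplicated physical blocks."""

    def __init__(self):
        self.lba_to_fp = {}        # (volume, lba) -> fingerprint of stored block
        self.refcount = Counter()  # fingerprint -> number of references

    def map_block(self, volume, lba, fingerprint):
        # Point a logical block at a stored block, adjusting reference counts.
        old = self.lba_to_fp.get((volume, lba))
        if old is not None:
            self.refcount[old] -= 1
        self.lba_to_fp[(volume, lba)] = fingerprint
        self.refcount[fingerprint] += 1

    def xcopy(self, src_vol, src_lba, dst_vol, dst_lba, count):
        """Clone `count` blocks by adding references; no data is copied."""
        for i in range(count):
            fp = self.lba_to_fp[(src_vol, src_lba + i)]
            self.map_block(dst_vol, dst_lba + i, fp)


table = MetadataTable()
# A hypothetical 3-block "VM" whose first and third blocks are identical.
for lba, fp in enumerate(["fp-a", "fp-b", "fp-a"]):
    table.map_block("vm1", lba, fp)

table.xcopy("vm1", 0, "vm1-clone", 0, count=3)
print(table.refcount["fp-a"], table.refcount["fp-b"])  # 4 2
```

The clone costs three dictionary updates regardless of how much data the blocks hold, which is why keeping the metadata in RAM makes the operation close to instantaneous.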

The X-Brick also comes with a complete set of CLI commands for management and monitoring as XtremIO realise that a lot of administrators like to work at the command line level for scripting and automation. This is nice to see.

My colleague Andre does a really nice in-depth review of the X-brick in a VMware View POC here.

This flash storage solution certainly has a lot of neat features. What with the number of storage vendors who are now embracing flash, and the number of new flash-centric storage vendors on the scene, 2013 should be an interesting year in storage.

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

Heads Up! vSphere 5.1 & EMC Storage

In this post, I want to call out two important matters related to the vSphere 5.1 release & EMC storage. The first is related to Round Robin Path Policy changes, and the second relates to a VMFS5 volume creation issue.

Round Robin Path Policy

Without delving too deeply into the Pluggable Storage Architecture (PSA) found in ESXi hosts, VMware uses a Native Multipathing Plugin (NMP) for handling I/O between the host and a storage array. This has two important components – a Storage Array Type Plugin (SATP) for failover and a Path Selection Policy (PSP) for load balancing. Each SATP has a default PSP associated with it. Running esxcli storage nmp satp list will show you the relationship between each SATP and its default PSP.

EMC have taken a very important step with the release of vSphere 5.1. My understanding is that a large portion of EMC storage is now going to use VMware’s Round Robin Path Selection Policy (PSP) by default.  Below is the output taken from a LUN from a VNX-5500 array presented to one of my ESXi 5.1 hosts. As you can clearly see, this is now using Round Robin without having to make any configuration changes.

naa.xxx
 Device Display Name: DGC Fibre Channel Disk (naa.xxx)
 Storage Array Type: VMW_SATP_ALUA_CX
 Storage Array Type Device Config: {navireg=on, ipfilter=on}
 {implicit_support=on; explicit_support=on; explicit_allow=on;
 alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
 Path Selection Policy: VMW_PSP_RR
 Path Selection Policy Device Config: {policy=rr,iops=1000,
  bytes=10485760,useANO=0;lastPathIndex=2: NumIOsPending=0,
  numBytesPending=0}
 Path Selection Policy Device Custom Config:
 Working Paths: vmhba2:C0:T3:L0, vmhba4:C0:T3:L0
 Is Local SAS Device: false
 Is Boot USB Device: false

In vSphere 5.1, the default PSP for the Storage Array Type Plugins (SATPs) VMW_SATP_ALUA_CX and VMW_SATP_SYMM has changed from VMW_PSP_FIXED to VMW_PSP_RR.

Using the command esxcli storage nmp satp list, we can see this change:

Name              Default PSP   Description
----------------  ------------  ---------------------------------
VMW_SATP_ALUA_CX  VMW_PSP_RR    Supports EMC CX that use the ALUA..
VMW_SATP_SYMM     VMW_PSP_RR    Placeholder (plugin not loaded)
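If you want to check these defaults across a number of hosts, the output is easy to parse. Below is a minimal Python sketch that turns output in the layout shown above into a SATP-to-default-PSP mapping. The parse_satp_list function is a hypothetical helper of my own, and it assumes the columns are separated by two or more spaces, as they are in the sample.

```python
import re

# Sample output copied from the listing above.
SAMPLE_OUTPUT = """\
Name              Default PSP   Description
----------------  ------------  ---------------------------------
VMW_SATP_ALUA_CX  VMW_PSP_RR    Supports EMC CX that use the ALUA..
VMW_SATP_SYMM     VMW_PSP_RR    Placeholder (plugin not loaded)
"""


def parse_satp_list(text):
    """Return {SATP name: default PSP} from satp list style output."""
    mapping = {}
    for line in text.splitlines()[2:]:  # skip the header and dashed separator
        # Columns are assumed to be separated by runs of 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=2)
        if len(parts) < 2:
            continue  # blank or malformed line
        mapping[parts[0]] = parts[1]
    return mapping


defaults = parse_satp_list(SAMPLE_OUTPUT)
for satp, psp in defaults.items():
    print(f"{satp} -> {psp}")
```

Splitting on runs of two or more spaces, rather than on single spaces, keeps the "Default PSP" header and the free-text descriptions from being broken apart.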

I think that this is indeed a great move. I believe you’ll get optimal storage performance with the RR PSP. However if you use Microsoft Cluster Services (MSCS) you are probably aware that you cannot use the Round Robin path selection policy on the back-end storage. Without getting into too much detail, handling SCSI Reservations across multiple paths is the reason behind not supporting this. Therefore, if you use EMC storage, and you use virtualized MSCS environments, and you plan to upgrade to vSphere 5.1, keep this in mind. You will have to change the PSP for those devices from VMW_PSP_RR back to the original setting, VMW_PSP_FIXED. There is plenty more information around MSCS supportability in vSphere environments in this KB article.
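As a rough illustration of switching the policy back for the devices that back MSCS disks, here is a small Python sketch that prints one esxcli command per device. The psp_reset_commands helper is hypothetical, and the naa.xxx identifier is the same placeholder used in the output earlier in this post, so substitute your own device IDs. Please review the generated commands and check the KB article before running anything on a production host.

```python
def psp_reset_commands(devices, psp="VMW_PSP_FIXED"):
    """Build one `esxcli storage nmp device set` command per device.

    `devices` is a list of device identifiers (placeholders here).
    Nothing is executed; the commands are only returned as strings.
    """
    return [
        f"esxcli storage nmp device set --device {device} --psp {psp}"
        for device in devices
    ]


# Placeholder device ID, as used in the device listing above.
mscs_devices = ["naa.xxx"]

for command in psp_reset_commands(mscs_devices):
    print(command)
```

Generating the commands rather than running them directly gives you a chance to confirm that only the MSCS devices are being changed, leaving everything else on Round Robin.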

VMFS5 Volume Create on EMC VMAX/VMAXe & ATS

This second issue comes straight from the vSphere 5.1 Release Notes. If your ESXi host is connected to a VMAX/VMAXe array, you might not be able to create a VMFS5 datastore on a LUN presented from the array. If this is the case, the following error will appear: An error occurred during host configuration. The error is a result of the ATS (VAAI) portion of the Symmetrix Enginuity Microcode (VMAX 5875.x) preventing a new datastore on a previously unwritten LUN.

The workaround is to disable Hardware Accelerated Locking on the ESXi host, create the VMFS5 datastore and then re-enable Hardware Accelerated Locking on the host. Chad Sakac from EMC has further information about the issue and the solution on his blog post here.
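The three steps of the workaround can be laid out as follows. This Python sketch only builds the command strings; it assumes that Hardware Accelerated Locking is controlled by the /VMFS3/HardwareAcceleratedLocking advanced setting on the host, and the helper names are my own. The datastore creation step is left as a placeholder comment, since it is typically done from the vSphere Client.

```python
# Advanced setting assumed to control Hardware Accelerated Locking (ATS).
ATS_OPTION = "/VMFS3/HardwareAcceleratedLocking"


def ats_command(enabled):
    """Return the esxcli command that turns ATS on (1) or off (0)."""
    value = 1 if enabled else 0
    return (
        "esxcli system settings advanced set "
        f"--int-value {value} --option {ATS_OPTION}"
    )


def workaround_steps():
    """Ordered steps: disable ATS, create the datastore, re-enable ATS."""
    return [
        ats_command(False),
        "# create the VMFS5 datastore here (for example, from the vSphere Client)",
        ats_command(True),
    ]


for step in workaround_steps():
    print(step)
```

The important part is the ordering: the setting should be re-enabled once the datastore exists, so that the host goes back to using hardware accelerated locking for normal operation.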


EMC Isilon – OneFS Mavericks Release Overview

EMC Isilon are providing even further vSphere integration features in their upcoming ‘Mavericks’ release of their OneFS operating system. This is great to see. The integration is in the area of vSphere APIs, both for Array Integration (VAAI) & Storage Awareness (VASA).

Let’s have a look at the VAAI enhancements first.

1. VAAI NAS integration

  • Full File Clone/NFS File Copy – The Full File Clone primitive calls the storage array’s replication facility. In Isilon’s case, a writeable snapshot of the file is created, saving space since it does not need to clone the whole VM’s disk. This is very similar to the VAAI block primitive XCOPY. One difference I do need to call out however between block and NAS primitives is that the NAS Full File Clone primitive will only work with VMs that are not running. In other words, Storage vMotion operations do not use the Full File Clone primitive at this time, unlike Storage vMotion on block devices which support VAAI. I want to highlight that this is not a limitation in Isilon’s implementation; rather it is a limitation on the vSphere side. It is definitely something I want to see in a future implementation of VAAI.
  • NFS Extended Stats – With NFS, vSphere only gets generic information about space consumption on Thin Provisioned datastores. The full details around the amount of space that is being consumed by an actual file on an NFS datastore at the back-end are not visible. This can lead to some space-management administration overhead as vSphere administrators may need to contact the storage admin for detailed information. In vSphere 5, all extended file and filesystem information is available via this primitive. For example, how much actual space is being consumed by a VMDK on the back-end can now be retrieved.
  • NFS Reserve Space – In earlier versions of vSphere, there was no way for NFS datastores to create the equivalent of an “eager-zeroed thick” VMDK. In vSphere 5, with VAAI NAS support, you now have the ability to reserve the entire space for a VMDK on an NFS datastore with this Reserve Space primitive.

These primitives, of course, require the EMC Isilon VAAI NAS plugin, but this is easily installed via VUM, the VMware Update Manager. Having watched some of the tests, I can say that the improvement is significant. An offline clone operation of a 120GB VM took about 7 minutes 15 seconds without VAAI. With VAAI, it took 1 minute and 29 seconds. This was almost 5 times faster. Nice!
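For anyone who wants to check the arithmetic behind the "almost 5 times faster" figure, here it is worked through in Python using the two timings quoted above. The variable names are my own.

```python
# Timings quoted above for an offline clone of a 120GB VM.
without_vaai = 7 * 60 + 15  # 7 min 15 s = 435 seconds
with_vaai = 1 * 60 + 29     # 1 min 29 s = 89 seconds

speedup = without_vaai / with_vaai
time_saved = 1 - with_vaai / without_vaai

print(f"{speedup:.2f}x faster, {time_saved:.0%} less time")
# 4.89x faster, 80% less time
```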

2. VASA

vSphere Storage APIs for Storage Awareness, commonly referred to as VASA, is a set of APIs that permits storage arrays to integrate with vCenter for management functionality.

Isilon are now surfacing up a bunch of device capabilities with VASA. These are now visible in the vSphere client when examining datastores.

Capability         Description
-----------------  ------------------------------------------------------------------------
ARCHIVE            Datastore resides on Isilon NL-Series hardware
CAPACITY           Datastore resides on Isilon X-Series hardware
HYBRID             Datastore resides on a mixed Isilon hardware configuration
INVALID            Datastore resides on a mixed Isilon hardware configuration
PERFORMANCE        Datastore resides on Isilon S-Series hardware or SSD accelerated storage
ULTRA_PERFORMANCE  Datastore resides on Isilon S-Series hardware with SSD acceleration
UNKNOWN            The Storage Capability for this object is Unknown

This is great to see. Isilon customers who deploy the VASA plugin along with upgrading to the Mavericks release can now reap the full benefits of VMware’s Profile Driven Storage feature. What this means is that deployments of VMs become far less error prone, since you can select the correct datastore for your VM each & every time. The other benefit is that you can constantly check the compliance state of your VM’s storage throughout its life-cycle (e.g. detect if someone inadvertently migrated it to a lower tier of backing storage). You can learn more about Storage Profiles in this blog post I did on the vSphere Storage Blog.
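Conceptually, the two benefits above (picking a matching datastore at deployment time, and checking compliance later) boil down to comparing the capability a VM's storage profile asks for against the capability the datastore surfaces through VASA. The Python sketch below is only a simplified model of that idea and does not use any vSphere API; the VM class, the helper functions and the datastore names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    required_capability: str  # what the VM's storage profile asks for
    datastore: str            # where the VM currently lives


def compliant_datastores(datastores, required):
    """Datastores whose surfaced capability matches the requirement."""
    return sorted(name for name, cap in datastores.items() if cap == required)


def is_compliant(vm, datastores):
    """True if the VM's current datastore still matches its profile."""
    return datastores.get(vm.datastore) == vm.required_capability


# Hypothetical datastores, tagged with capability names from the table above.
datastores = {
    "isilon-nl-01": "ARCHIVE",
    "isilon-x-01": "CAPACITY",
    "isilon-s-01": "PERFORMANCE",
}

vm = VM(name="sql01", required_capability="PERFORMANCE", datastore="isilon-s-01")
print(compliant_datastores(datastores, vm.required_capability))  # ['isilon-s-01']
print(is_compliant(vm, datastores))                               # True

vm.datastore = "isilon-nl-01"  # someone migrates the VM to a lower tier
print(is_compliant(vm, datastores))                               # False
```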

We don’t have enough vendors doing offloading with VAAI NAS, so it is a welcome sign to see Isilon introduce this. And I certainly like the VASA capability descriptions that they are surfacing up. I think this makes it nice and clear to Isilon customers what sort of device(s) are backing their respective datastores.

EMC are a diamond sponsor at this year’s VMworld 2012 in San Francisco. I’m sure Jay, James and the rest of the Isilon team would be delighted to show you these new features. You’ll find those guys at booth 1203.
