Condusiv V-locity 4 – New caching feature
I recently got hold of a copy of the new V-locity 4 product from Condusiv which was released last month. Condusiv is the new name for Diskeeper, whom you may have heard of before. I first came across them as a provider of software which specialized in optimizing I/O, primarily by preventing file fragmentation on NTFS in a Windows Guest OS. I blogged about them in the past on the vSphere Storage Blog after some discussions around defragmentation in the Guest OS. The new feature takes a portion of memory and uses it as a block cache. I did some preliminary tests with good ol’ IOmeter, and the initial results look quite good.
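To picture what a block cache of this kind does, here is a minimal sketch of my own (an illustrative LRU cache, not Condusiv's actual implementation): repeat reads are served from RAM, and only misses go out to disk.

```python
from collections import OrderedDict

class BlockCache:
    """Toy read cache keyed by block number, with LRU eviction."""
    def __init__(self, capacity_blocks, read_from_disk):
        self.capacity = capacity_blocks
        self.read_from_disk = read_from_disk  # fallback for cache misses
        self.blocks = OrderedDict()
        self.hits = self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)   # mark most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_from_disk(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data

# A sequential re-read workload (like the 0% random IOmeter test):
# every block misses on pass one and hits cache on pass two.
cache = BlockCache(capacity_blocks=1000, read_from_disk=lambda n: b"\x00" * 4096)
for _ in range(2):
    for block in range(500):
        cache.read(block)
print(cache.hits, cache.misses)  # 500 500
```

This is also why the "run IOmeter twice" trick below works: the first pass populates the cache, and the second pass is served largely from RAM.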
Disclaimer – the results shown here are for illustrative purposes only. I don’t have a production environment, this is simply using some equipment in my own lab. Nor am I in any way a performance guru. Condusiv have recommendations on how to correctly evaluate their V-locity 4 product which I will share with you shortly.
Test Environment
Two VMs running Windows 7, 1 vCPU, 2GB Memory. Each VM has a 32GB VMDK built on a local VMFS volume. The VMs are on dedicated ESXi 5.0 hosts with no other VMs running.
One of the VMs has V-locity 4 installed; the other does not.
Test 1 – IOmeter settings: 2 workers, 50,000 sectors, 1 outstanding I/O, 4KB, 100% Read, 0% Random.
IOmeter results from running above load on VM without V-locity:
This VM achieved about 15,500 read IOPS with 1 OIO. Now let's run the exact same test on the VM with V-locity 4 installed. The trick with V-locity & IOmeter is to let IOmeter run for a few minutes, stop it and allow V-locity's algorithms to learn the data patterns, then restart IOmeter. On the second run, performance improves dramatically.
IOmeter results from running above load on VM with V-locity:
With the same IOmeter settings, we achieved twice as many read IOPS with the V-locity 4 product installed. Note that the % CPU is now up at 100%; the VM is CPU bound rather than I/O bound. If we added more CPU resources to this VM, we could probably drive far more I/O.
Let's do another test, this time making half of the reads random. (BTW, I restarted IOmeter before doing the next test – it can be a bit funky with its test results sometimes.)
Test 2 – IOmeter settings: 2 workers, 50,000 sectors, 1 outstanding I/O, 4KB, 100% Read, 50% Random.
IOmeter results from running above load on VM without V-locity:
Not too different from the previous test. Let's look at the behaviour of the VM with V-locity and see whether random versus sequential makes much of a difference.
IOmeter results from running above load on VM with V-locity:
And again, very similar improvements observed.
Now, I am not going to go through all variations of IOmeter (actually, I'm not even sure this is the right tool for testing what is essentially a cache). But from these very basic tests, it would seem that the new V-locity 4 product is a VM accelerator of sorts. The benefits of a cache are pretty self-explanatory: if reads can be satisfied from cache, this will obviously speed up performance. Also, if a good percentage of the I/O traffic comes from the cache, then there is more I/O bandwidth available to the underlying storage for I/Os that are not in cache.
Speaking with Spencer Allingham, the EMEA Technical Director for Condusiv, the proper way to evaluate the new features of this product is to use the built-in Benefits Analyser, which runs over a 3-day period. On the first day, it just monitors the Guest OS and provides no performance gain. On the second day, V-locity tunes itself using the data it learned on the first day, and on the third day it provides the performance gains. What is really neat is that at the end of the third day, it produces a report showing how much performance has been gained, and details how this performance gain has been achieved.
Spencer told me that the product can cache both reactively and proactively: if it sees blocks being commonly accessed, it will ensure they are loaded into the cache, and it also uses system monitoring to learn over time which blocks are used at certain times of the day, so that it can pre-load them ahead of the time they are likely to be needed.
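A time-of-day pre-loading scheme like the one described could look something like this. This is purely my own hypothetical sketch of the idea, not Condusiv's algorithm: record which blocks each hour of the day tends to touch, then suggest the most frequent ones for pre-loading before that hour arrives.

```python
from collections import Counter, defaultdict

class AccessHistory:
    """Record which blocks are read in each hour of the day, and
    suggest blocks to pre-load into cache before that hour arrives."""
    def __init__(self, top_n=3):
        self.by_hour = defaultdict(Counter)
        self.top_n = top_n

    def record(self, hour, block_no):
        self.by_hour[hour][block_no] += 1

    def preload_candidates(self, upcoming_hour):
        # The most frequently accessed blocks for that hour in past days.
        return [b for b, _ in self.by_hour[upcoming_hour].most_common(self.top_n)]

hist = AccessHistory()
for day in range(5):                 # five days of a 9am report run
    for block in (10, 11, 12):
        hist.record(9, block)
hist.record(9, 99)                   # a one-off access, not worth pre-loading
print(hist.preload_candidates(9))    # [10, 11, 12]
```

The point of the sketch is simply that repeated daily patterns dominate one-off accesses, so warming the cache just before the busy hour pays off.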
One final item which Spencer mentioned is that aside from the caching, the IntelliWrite feature is still there. This feature is designed to help prevent Windows from splitting files up (fragmentation) as it writes them to the NTFS volume. This in turn allows larger, more sequential I/Os, making I/O more efficient. However, in V-locity 4, Condusiv have made this feature smarter: rather than wasting system resources by trying to aggregate all writes, V-locity 4 can now calculate the fragment size below which performance suffers, and will only attempt to aggregate writes that would cause a performance loss if split up. If the file being written is split into large enough chunks not to cause a performance loss, it is left alone.
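The idea of only consolidating writes whose fragments fall below a performance-hurting size can be sketched as follows. This is a hypothetical illustration of the concept (the actual IntelliWrite logic is proprietary), with a made-up 1MB threshold:

```python
def plan_writes(extents, min_efficient_bytes=1 << 20):
    """Coalesce a small fragment into its adjacent predecessor;
    fragments already at or above the efficient size are left alone.
    Each extent is (offset, length) in bytes, sorted by offset."""
    planned = []
    for off, length in extents:
        if (planned
                and planned[-1][0] + planned[-1][1] == off   # adjacent on disk
                and length < min_efficient_bytes):           # small enough to hurt
            prev_off, prev_len = planned.pop()
            planned.append((prev_off, prev_len + length))    # merge into one write
        else:
            planned.append((off, length))
    return planned

# Two small adjacent 4KB fragments merge; the 2MB fragment is left as-is.
extents = [(0, 4096), (4096, 4096), (8192, 2 << 20)]
print(plan_writes(extents))  # [(0, 8192), (8192, 2097152)]
```

The pay-off is exactly what the article describes: small writes get combined into fewer, larger I/Os, while large writes are not touched, so no resources are wasted aggregating writes that were already efficient.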
Sounds pretty good to me. You can get a free trial of V-locity 4 here.
Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @CormacJHogan
Hi! Thanks for the entry, it looks very interesting. I have a couple of questions. You say: “The new feature takes a portion of memory and uses it as a block cache”. Can you decide how much memory it uses? I guess performance would get better if the portion were greater.
Second question: does it work only in Windows?
Thanks and congrats for the blog, it is fantastic 🙂
Thanks for the questions Pablo. I didn’t see any way of tuning the amount of memory consumed for cache during the install. Also, I only saw a Windows version of the product on the Condusiv download site. Let me ask an expert from Condusiv to respond to the post with a definitive answer however.
Hi Cormac and Pablo,
Many thanks for asking me to come in on this. My apologies for not being able to get back to you before now. I am currently out of the office in California, but will be returning to the UK next week.
I’ll answer the second question first if I may. Yes, the product only works with Windows. You can install it in any Windows OS from XP up, including Windows 8 and Windows Server 2012, running in a virtual machine hosted by a VMware ESX/ESXi or Microsoft Hyper-V environment.
You can’t specify how much memory the cache will use, as that is determined by the software. However, we have very clever technology in place to ensure that the right amount of memory is used, and more importantly to hand memory back when other users and processes require more.
Our InvisiTasking technology has been put into the V-locity product, and this monitors the amount of CPU being used, the amount of memory being used, and the I/O bandwidth being used. As mentioned, if other users or processes require more computing resources, V-locity will throttle back so as not to impact those other users and processes that require more resource. Using this technology, V-locity 4 will only utilise otherwise idle resources, which means a zero footprint on resources as far as the other users and processes are concerned.
The cache will start out quite small, at about one eighth of the available free RAM inside the virtual machine. This will grow as we cache more and more data, up to half of the available free RAM on virtual machines running an x64 version of Windows. If we need to shrink the cache because Windows needs more memory for other processes, then that memory is handed back straight away.
With that in mind, the software is intelligent enough to take memory for the cache when available, and hand it back when needed. There should be no need to manually set cache sizes.
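[Cormac: the sizing policy Spencer describes (start at one eighth of free RAM, grow up to half, shrink when Windows needs the memory back) can be modelled roughly like this. A simplified sketch of my own, not the shipping algorithm; the 64MB growth step is an assumption purely for illustration.]

```python
def resize_cache(cache_bytes, free_ram_bytes, grow_step=64 << 20):
    """One step of a dynamic cache-sizing loop:
    - start around 1/8 of free RAM,
    - grow as more data is cached, capped at 1/2 of free RAM,
    - shrink straight away when free RAM falls (to zero if needed)."""
    ceiling = free_ram_bytes // 2
    if cache_bytes == 0:
        return min(free_ram_bytes // 8, ceiling)   # initial allocation
    if cache_bytes > ceiling:
        return ceiling                             # hand memory back immediately
    return min(cache_bytes + grow_step, ceiling)   # grow toward the cap

size = 0
size = resize_cache(size, free_ram_bytes=8 << 30)  # 8 GiB free: start at 1 GiB
print(size >> 20)  # 1024 (MiB)
size = resize_cache(size, free_ram_bytes=8 << 30)
print(size >> 20)  # 1088 (MiB), grown by one step
size = resize_cache(size, free_ram_bytes=1 << 30)  # memory pressure: 1 GiB free
print(size >> 20)  # 512 (MiB), shrunk to half of the now-smaller free RAM
```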
I hope that answers the question fully, but please do feel free to send me any further questions that you may have.
what type of disk /array did you use
Local storage on a HP DL380G7. The RAID controller was a Smart Array P410i. Not a SAN or NAS array, just local disk.
How will this software affect a SQL-server? SQL is known for using all available memory. Then there will be no more available for the V-locity = no gain?
Adding more memory to the vm will make more room for V-locity cache = better performance?
What type of production server will gain most performance?
That is definitely a question to direct to the Condusiv folks.
Dear Thomas & Cormac,
Please accept my apologies for not being able to post before now. I will try to answer each of your questions in turn:
How will this software affect a SQL-server?
V-locity software improves SQL Server performance significantly through its intelligent caching and writing technologies; however, I'll focus on the caching technologies in this response. The intelligent caching software decides what information should be cached based on access patterns, frequency of usage and a host of other parameters. V-locity will deliver superior performance results compared to SQL Server's caching results because it takes into account more than just the data in the SQL databases.
SQL is known for using all available memory. Then there will be no more available for the V-locity = no gain?
Ideally, SQL Server should be configured such that V-locity will be able to use between 128MB and 32GB of physical RAM per VM for intelligent caching. The user can also add extra memory to accommodate V-locity's memory requirements if desired, but it is not required. Upon installation, V-locity will automatically configure the system without user intervention.
Adding more memory to the vm will make more room for V-locity cache = better performance?
That is correct. Alternatively, you can reserve enough memory for V-locity as described in the previous answer.
What type of production server will gain most performance?
Servers running highly I/O-intensive applications such as databases, MS Exchange, video, or any business application with a database at the backend will see immediate benefits from V-locity.
I hope that this helps. Please do let me know if I can be of any further help.
Best regards,
Spencer Allingham
Technical Director
Condusiv Technologies EMEA.
How does this work with memory ballooning? If we are over-allocating on our cluster with the assumption that not all memory will be used at the same time, how will V-locity affect performance under contention? How is this viewed from the vCenter console, and is it hypervisor “aware”?
For instance, if the balloon driver is attempting to reclaim memory for other VMs, will it see this in-memory cache as being used and not reclaimable? If I have this running in all my VMs and V-locity is not hypervisor aware, will nothing be reclaimable, since one VM doesn't know the other VMs are using this in-memory cache and it would appear to be used? Would I see this in-memory cache as being used from vCenter or vCOPS? If so, how does this change capacity planning?
All valid questions which should be directed to Condusiv if you are considering evaluating/using this product.
Thanks. I did pose these questions to them. The answer was that it is not hypervisor aware; that was as much detail as I could get. This would tell me that this could be a very dangerous product to run in clusters where contention may happen. If all my VMs are running these in-memory caches, you might as well set reservations on everything too, since the memory can't be reclaimed.
Dear Troy,
I would like to thank you for your interest in V-locity 4. How this works with VMware’s memory ballooning is an interesting question, and hopefully I can go some way to allaying your fears about this.
Firstly, you are correct that V-locity 4 is not hypervisor or storage aware. It doesn’t need to be, as all of the I/O traffic optimisation is being done inside the guest operating system. As far as the hypervisor, physical host and back end storage are concerned, they will receive larger, more sequential I/O packets as a result of the optimisation that is being done, and will receive a lighter load as a good percentage of the I/O traffic is being satisfied from the IntelliMemory cache, INSIDE the virtual machine.
As far as VMware ballooning is concerned, V-locity will only use up to half of what the operating system sees as available free (physical) RAM. Not all, only up to half. This leaves some memory available for memory ballooning immediately. In addition, the IntelliMemory cache size is dynamic: if more RAM is required by other processes or applications, the IntelliMemory cache will automatically shrink, to zero if required. So even if VMware ballooning is taking place, V-locity will free up cache space, and can never be the cause of a memory starvation situation. Of course, if the cache shrinks, this will likely reduce the performance gain you would otherwise see.
Of course, in order to get the full performance benefit of the IntelliMemory caching, it makes sense to have some available free RAM inside the guest operating system for the cache to establish itself.
With regard to clustering, clustering at the hypervisor level is fully supported. So, if you have a VMware HA cluster for example, that is fine. However, if you are clustering at the guest operating system level, that is not currently supported and the IntelliMemory caching feature would be turned off automatically. All of the other features of V-locity would remain active, so this type of environment (even one using the old 32-bit XP kernel) would still see some benefit from the way that file writes are aggregated, resulting in larger, more sequential I/Os travelling down the storage stack to the SAN or other back-end storage. This is still a more efficient way of writing data out, so that it can be striped across the spindles in the SAN more efficiently.
In the near term, we will be delivering a new release that supports Active/Passive clustering at a guest operating system level. Active/Active clustering will follow after that.
I hope that answers all of your questions Troy. Please do feel free to come back to me if I can be of any further assistance.
Best regards,
Spencer Allingham | EMEA TECHNICAL DIRECTOR
This is good, but how realistic are the results? I believe the issue with defrag revolves around SAN and NAS storage configurations. I really don't see how a Guest-OS-based utility can do anything to optimize storage that is managed by a remote disk subsystem to which it is totally oblivious.
Hi Dennis,
Apologies for my late reply.
The results are very realistic. I don’t want to get too ‘salesy’ here, but the problems you refer to at the SAN/NAS layer can easily be traced back to the way that Windows writes files in the first place. By having the Windows Write Driver write files without splitting them up, you get larger, more sequential I/Os travelling down to the SAN which are more efficient to stripe across their spindles, and are more efficient to read back in again when required. If the SAN is dealing with data in larger chunks, physical read/write head seeks are reduced, a major factor in storage latency for spinning disks.
As far as proving it is concerned, V-locity has a built in Benefit Analyzer that will run over three days in a live environment or in a test lab, which will give you a ‘before and after’ report showing you things like:
- Workload Throughput: how much more I/O traffic did the machine process when V-locity was active?
- I/O Response Times: if V-locity can satisfy a good percentage of the read I/O traffic from RAM cache, that should bring the average I/O response time down.
- Normalised IOPS
- Number of I/Os that had to go out to disk
- Average size of each I/O
- etc.
As I said, this isn’t the place for me to get too ‘salesy’, but to be honest, I don’t want you to take my word for it, the best thing is for you to try it for yourself.
I hope that helps. Please do get in touch if you would like to discuss this further.
Best regards,
Spencer Allingham
EMEA Technical Director
Condusiv Technologies.