New mclock I/O Scheduler in vSphere 5.5 – Some details

Cormac

10 years ago

My colleague Duncan wrote a post relatively recently around the new mclock I/O scheduler which VMware introduced in vSphere 5.5. He also mentioned some caveats with the new scheduler, especially around the I/O size (32K) used with the IOPS setting, which may lead to some unexpected behaviour. As Duncan mentioned, the reason for introducing the new scheduler is primarily to provide a better I/O scheduling mechanism that allows for limits, shares and reservations. Unfortunately, we didn’t do a very good job of announcing this change in I/O scheduling, or documenting the behaviour, and it has led to a number of additional questions from our customers. I hope to address some of these here.

Q1. Why was the I/O scheduler changed in vSphere 5.5?

A1. As mentioned, the new mclock I/O scheduler provides additional controls, such as reservations.

Q2. Why does the I/O scheduler IOPS value use 32K as an I/O size?

A2. As Duncan mentions in his post, a 4K I/O request is not the same as a 64K I/O request. There is a different cost associated with the service time of each I/O. When you think about disk I/O, there are two major factors in the service time:

1. Setting the disk head to the request offset
2. Transferring the data from that offset onwards

For small I/O requests, (1) is the dominating factor. For large requests, it is the transfer (2) that dominates. However, in an I/O scheduler, you need a way to account for these differences when enforcing shares, limits, and reservations. Hence, you pick a request size that would balance the costs of (1) and (2) as the scheduler’s normalization factor. This is why we picked the I/O request size to be 32K. This seems an acceptable I/O size for disks.
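To make the normalization concrete, here is a minimal sketch of how an I/O might be charged against an IOPS limit. It assumes that every started 32K unit of a request counts as one I/O (so a 64K request counts as two); that rounding rule, and the names normalized_cost and effective_iops, are illustrative assumptions and not the actual mclock implementation.

```python
NORMALIZATION_SIZE = 32 * 1024  # 32K, the normalization size described in A2


def normalized_cost(io_size_bytes: int, unit: int = NORMALIZATION_SIZE) -> int:
    """Return how many normalized I/Os one request is charged as.

    Assumption: each started 32K unit counts as one I/O, so the size is
    rounded up to the next multiple of the unit.
    """
    if io_size_bytes <= 0:
        raise ValueError("I/O size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-io_size_bytes // unit)


def effective_iops(limit_iops: int, io_size_bytes: int) -> float:
    """Requests per second a workload would observe under a given IOPS limit."""
    return limit_iops / normalized_cost(io_size_bytes)


if __name__ == "__main__":
    for size_kb in (4, 32, 64, 256):
        size = size_kb * 1024
        print(f"{size_kb}K request: cost={normalized_cost(size)}, "
              f"observed IOPS at limit 100: {effective_iops(100, size)}")
```

Under these assumptions, a VM limited to 100 IOPS that issues 64K requests would observe roughly 50 requests per second, which is the kind of unexpected behaviour the I/O size caveat refers to.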

Q3. Are IOPS limits also considered for cloning and snapshot tasks?

A3. Yes.

Q4. How are IOPS limits handled with VAAI (vSphere APIs for Array Integration) turned on? Are the offloaded tasks (e.g. cloning) considered?

A4. Yes. I/Os handled by VAAI are accounted for in a similar way to normal non-VAAI I/Os. Therefore, the IOPS calculation for VAAI operations is also similar to that for non-VAAI operations.

Q5. Do the throughput/bandwidth controls (MB/s limit) behave like the IOPS limit for operations such as Storage vMotion and cloning? Is this documented somewhere?

A5. The mclock scheduler in vSphere 5.5 doesn’t support bandwidth controls. See the vSphere 5.5U1 Release Notes for further information. In addition, there is a caveat with the mclock scheduler whereby the bandwidth and throughput limits are violated when both are configured for a SCSI virtual disk in the configuration file of a virtual machine. This caveat is documented in KB article 2059192. If you wish to use this functionality, you will need to revert to an earlier version of the scheduler as per the KB.

Q6. Why is Storage vMotion limited for powered-on VMs that have an IOPS limit set (e.g. 160 IOPS), whereas a powered-off VM that is cold migrated takes all the IOPS it can get (e.g. 300 IOPS)?

A6. This is intentional behavior, done to respect the IOPS limit set for a given VM. Storage vMotion is designed to inherit the I/O limits of whatever disk or VM is being moved. In a nutshell, we do not allow the datamover to violate whatever limits the customer has imposed on a given VM’s I/O.