A change to sub-blocks on VMFS-6
Something I only recently noticed is that we have made a change to the sub-block structure on VMFS-6 compared to VMFS-5. Sub-blocks are small allocations on a VMFS volume, used to back small files. They were introduced as a space-saving measure, to avoid consuming a full file block to back a very small file. Put simply, when a file is created on VMFS it is initially backed by a sub-block, and when the file grows beyond the size of a sub-block it is switched to being backed by a file block (this has now changed with VMFS-6, as per a previous article on small file blocks and large file blocks). Anyway, the crux of this change is that the sub-block size has returned to 64K on VMFS-6, compared to the 8K sub-block used on VMFS-5 (in fact, we had a 64K sub-block back in the VMFS-3 days). Let’s look at this more closely by comparing a 500GB VMFS-6 volume with a 500GB VMFS-5 volume, both created on vSphere 6.5.
Let’s start with the VMFS-5 volume. We’ll look at the metadata to see how many sub-blocks there are, and then examine the hidden system resource files to figure out how much space is set aside for them:
[root@esxi-dell-e:~] vmkfstools -P -v10 /vmfs/volumes/5981ca0f-8d7fa5f9-1616-246e962f4910/
VMFS-5.81 (Raw Major Version: 14) file system spanning 1 partitions.
File system label (if any): pure-big-vmfs5
Mode: public ATS-only
Capacity 536602476544 (511744 file blocks * 1048576), 535578017792 (510767 blocks) avail, max supported file size 69201586814976
Volume Creation Time: Wed Aug 2 12:48:15 2017
Files (max/free): 130000/129976
Ptr Blocks (max/free): 64512/64496
Sub Blocks (max/free): 32000/31993
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/977/0
Ptr Blocks  (overcommit/used/overcommit %): 0/16/0
Sub Blocks  (overcommit/used/overcommit %): 0/7/0
Volume Metadata size: 806256640
UUID: 5981ca0f-8d7fa5f9-1616-246e962f4910
Logical device: 5981ca0f-450fd97a-4fcf-246e962f4910
Partitions spanned (on "lvm"): naa.624a9370d4d78052ea564a7e00011138:1
Is Native Snapshot Capable: YES
OBJLIB-LIB: ObjLib cleanup done.
WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0

[root@esxi-dell-e:~] ls -al /vmfs/volumes/5981ca0f-8d7fa5f9-1616-246e962f4910/
total 791560
drwxr-xr-t    1 root     root          1400 Aug  3 14:03 .
drwxr-xr-x    1 root     root           512 Aug 16 08:45 ..
-r--------    1 root     root       2686976 Aug  2 12:48 .fbb.sf
-r--------    1 root     root     267026432 Aug  2 12:48 .fdc.sf
-r--------    1 root     root       1179648 Aug  2 12:48 .pb2.sf
-r--------    1 root     root     268435456 Aug  2 12:48 .pbc.sf
-r--------    1 root     root     262733824 Aug  2 12:48 .sbc.sf
drwx------    1 root     root           280 Aug  2 12:48 .sdd.sf
drwx------    1 root     root           560 Aug  8 12:46 .vSphere-HA
-r--------    1 root     root       4194304 Aug  2 12:48 .vh.sf
On this VMFS-5 volume, there are a total of 32000 sub-blocks. And if we examine the hidden system resource files on the volume, we see roughly 250MB of space set aside for sub-blocks in .sbc.sf. 32000 x 8K gives us that figure (the slight difference from the file size is presumably resource file metadata).
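As a quick sanity check, the arithmetic can be reproduced in any POSIX shell. The numbers below are simply copied from the vmkfstools and ls output above; nothing here queries a live host, and the interpretation of the leftover bytes as resource-file metadata is my assumption:

```shell
# Values copied from the VMFS-5 output above
SUB_BLOCKS=32000               # Sub Blocks (max) from vmkfstools -P
SUB_BLOCK_SIZE=$((8 * 1024))   # 8K sub-blocks on VMFS-5
SBC_FILE=262733824             # size of .sbc.sf from ls -al

# Space the sub-blocks themselves should occupy
EXPECTED=$((SUB_BLOCKS * SUB_BLOCK_SIZE))
echo "expected: $EXPECTED bytes (~$((EXPECTED / 1024 / 1024)) MB)"

# Leftover bytes in .sbc.sf (assumed to be resource file metadata)
echo "remainder: $((SBC_FILE - EXPECTED)) bytes"
```

This prints an expected size of 262144000 bytes (~250 MB), with 589824 bytes left over in .sbc.sf.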
Let’s now examine a similarly sized VMFS-6 volume:
[root@esxi-dell-e:~] vmkfstools -P -v10 /vmfs/volumes/5981ca3d-a9cb2e29-540a-246e962f4910/
VMFS-6.81 (Raw Major Version: 24) file system spanning 1 partitions.
File system label (if any): pure-big-vmfs6
Mode: public ATS-only
Capacity 536602476544 (511744 file blocks * 1048576), 305317019648 (291173 blocks) avail, max supported file size 70368744177664
Volume Creation Time: Wed Aug 2 12:49:01 2017
Files (max/free): 16384/16253
Ptr Blocks (max/free): 0/0
Sub Blocks (max/free): 16384/16311
Secondary Ptr Blocks (max/free): 256/255
File Blocks (overcommit/used/overcommit %): 0/220571/0
Ptr Blocks  (overcommit/used/overcommit %): 0/0/0
Sub Blocks  (overcommit/used/overcommit %): 0/73/0
Large File Blocks (total/used/file block clusters): 1000/152/360
Volume Metadata size: 1510866944
UUID: 5981ca3d-a9cb2e29-540a-246e962f4910
Logical device: 5981ca3c-95c4fb10-05c1-246e962f4910
Partitions spanned (on "lvm"): naa.624a9370d4d78052ea564a7e00011139:1
Is Native Snapshot Capable: NO
OBJLIB-LIB: ObjLib cleanup done.
WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0

[root@esxi-dell-e:~] ls -al /vmfs/volumes/5981ca3d-a9cb2e29-540a-246e962f4910/
total 1483008
drwxr-xr-t    1 root     root         77824 Aug 15 14:06 .
drwxr-xr-x    1 root     root           512 Aug 16 08:51 ..
-r--------    1 root     root       8781824 Aug  2 12:49 .fbb.sf
-r--------    1 root     root     134807552 Aug  2 12:49 .fdc.sf
-r--------    1 root     root     268632064 Aug  2 12:49 .jbc.sf
-r--------    1 root     root      16908288 Aug  2 12:49 .pb2.sf
-r--------    1 root     root         65536 Aug  2 12:49 .pbc.sf
-r--------    1 root     root    1074331648 Aug  2 12:49 .sbc.sf
drwx------    1 root     root         69632 Aug  2 12:49 .sdd.sf
drwx------    1 root     root         73728 Aug  8 12:46 .vSphere-HA
-r--------    1 root     root       7340032 Aug  2 12:49 .vh.sf
...
At first glance, it might appear that there are fewer sub-blocks on VMFS-6. However, VMFS-6 has a new dynamic system resource allocation mechanism, which means that more sub-blocks (and other resources) can be created as needed. With that in mind, we can see that there are currently 16384 sub-blocks, and a sub-block system resource file (.sbc.sf) of approximately 1GB. This implies that the sub-block size on VMFS-6 is 64K. Note that it is not just sub-blocks that benefit from this new dynamic allocation mechanism; other resources are allocated dynamically on VMFS-6 too.
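The same sanity check works for the VMFS-6 volume. Again, the values are just copied from the output above, and treating the leftover bytes as resource-file metadata is my assumption:

```shell
# Values copied from the VMFS-6 output above
SUB_BLOCKS=16384                # Sub Blocks (max) from vmkfstools -P
SUB_BLOCK_SIZE=$((64 * 1024))   # 64K sub-blocks on VMFS-6
SBC_FILE=1074331648             # size of .sbc.sf from ls -al

# 16384 x 64K lands within a whisker of the 1GB .sbc.sf size
EXPECTED=$((SUB_BLOCKS * SUB_BLOCK_SIZE))
echo "expected: $EXPECTED bytes (~$((EXPECTED / 1024 / 1024 / 1024)) GB)"
echo "remainder: $((SBC_FILE - EXPECTED)) bytes"
```

This prints an expected size of 1073741824 bytes (1 GB), with the same 589824-byte remainder as the VMFS-5 volume, which is what suggests the remainder is fixed per-file overhead rather than extra sub-blocks.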
Why did we make this change? Well, I don’t know all of the reasons, but one of them is that sub-blocks are now used in place of pointer blocks on VMFS-6 to address very large files. Note that the number of pointer blocks reported by vmkfstools for VMFS-6 is now 0. That is because they have been replaced with the new sub-blocks. You can also see that the pointer block system resource file (.pbc.sf) is now really small (a single 64K sub-block?) in the system resource file list, when compared to its size on VMFS-5.
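The shrinkage of the pointer block resource file is easy to quantify from the two ls listings above. Again, a small shell sketch using sizes copied from the output (nothing here queries a live host):

```shell
# .pbc.sf sizes copied from the two ls -al listings above
PBC_VMFS5=268435456   # pointer block resource file on VMFS-5
PBC_VMFS6=65536       # pointer block resource file on VMFS-6

echo "VMFS-5 .pbc.sf: $((PBC_VMFS5 / 1024 / 1024)) MB"
echo "VMFS-6 .pbc.sf: $((PBC_VMFS6 / 1024)) KB"
```

That is 256 MB on VMFS-5 versus just 64 KB on VMFS-6, and 64K is exactly one VMFS-6 sub-block.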
Find out more about VMFS-6 and other vSphere 6.5 storage enhancements in this core storage white paper.
Hi Cormac,
I enjoyed reading this post! It has some very interesting details. For instance, I didn’t know that pointer blocks are no longer used. (Now I wonder why the files .pbc.sf and .pb2.sf still exist on VMFS-6 …)
By the way, this might not be useful information, but the output of “vmkfstools -D .sbc.sf” also shows the sub block size 🙂
[root@esxi65:/vmfs/volumes] vmkfstools -D vmfs6a/.sbc.sf
Lock [type 10c00001 offset 7503872 v 3, hb offset 3571712
gen 1, mode 0, owner 00000000-00000000-0000-000000000000 mtime 7274
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr , gen 1, links 1, type sys, flags 0x8, uid 0, gid 0, mode 400
len 1074331648, nb 1025 tbz 0, cow 0, newSinceEpoch 1025, zla 5, bs 1048576
16384 resources, each of size 65536

“You can also see that the pointer block system resource file is now really small (a single 64K sub-block?) …”
It would indeed fit into a sub-block, but for some reason the file occupies a file block (zla 1 = file block, block size bs = 1 MB), at least on my system. Now, this is certainly useless information, isn’t it? 😀
[root@esxi65:/vmfs/volumes/5888b574-02fd7324-dd77-000c2975f544] vmkfstools -D .pbc.sf
Lock [type 10c00001 offset 7495680 v 1, hb offset 0
gen 0, mode 0, owner 00000000-00000000-0000-000000000000 mtime 7152
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr , gen 1, links 1, type sys, flags 0, uid 0, gid 0, mode 400
len 65536, nb 1 tbz 0, cow 0, newSinceEpoch 1, zla 1, bs 1048576 <=============
0 resources, each of size 65536
Organized as 0 CGs, 8 C/CG and 0 R/C
CGsize 65536. 0th CG at 65536
The only explanation I have is that the file was bigger than 64k at some point in time. A new file with a size of exactly 64k is stored in a sub-block (zla 2 = sub block, block size bs = 64kB), as opposed to VMFS-3, where only files smaller than 64k are stored in a sub-block:
[root@esxi65:/vmfs/volumes/5888b574-02fd7324-dd77-000c2975f544] vmkfstools -D 64k.txt
Lock [type 10c00001 offset 133758976 v 7, hb offset 3571712
gen 169, mode 0, owner 00000000-00000000-0000-000000000000 mtime 246
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr , gen 1, links 1, type reg, flags 0x1, uid 0, gid 0, mode 644
len 65536, nb 1 tbz 0, cow 0, newSinceEpoch 1, zla 2, bs 65536 <=============
When a small file is overwritten with data larger than 64k and then again with small data, the file stays on a file block:
# dd if=/dev/zero bs=1k count=10 of=test-grow.txt
# dd if=/dev/zero bs=1k count=70 of=test-grow.txt
# dd if=/dev/zero bs=1k count=10 of=test-grow.txt
[root@esxi65:/vmfs/volumes/5888b574-02fd7324-dd77-000c2975f544] vmkfstools -D test-grow.txt
Lock [type 10c00001 offset 62193664 v 7, hb offset 3571712
gen 169, mode 0, owner 00000000-00000000-0000-000000000000 mtime 296
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr , gen 1, links 1, type reg, flags 0x9, uid 0, gid 0, mode 644
len 10240, nb 1 tbz 0, cow 0, newSinceEpoch 1, zla 1, bs 1048576 <=============
Very cool – thanks for sharing Pascal.