Performance Boost: All-Flash VSAN on vSphere 6.0u3

June 5, 2017

In a previous post I tested a four-node all-flash VSAN running VSAN 6.2 (vSphere 6.0u2) with two diskgroups per host using HCIBench. The performance looked a little subpar to the author of the tool, so he helped me with some tweaks that got better numbers out of the cluster (post here). Now that vSphere 6.0u3 and 6.5d (VSAN 6.6) are out, both of which tout performance improvements for VSAN ( KB ), and since I am now on vCenter 6.0u3b, I wanted to take the same system I used previously and re-run the same tests.

The setup is exactly the same: the latest HCIBench against a four-node all-flash cluster with two diskgroups per host. That works out to eight test VMs, enough to touch all of the cache disks/diskgroups.

I also used these settings per Chen Wei:

esxcfg-advcfg -s 131072 /LSOM/blPLOGCacheLines

esxcfg-advcfg -s 32768 /LSOM/blPLOGLsnCacheLines

esxcfg-advcfg -s 32768 /LSOM/blLLOGCacheLines
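
These need to be set on every host in the cluster, so a quick loop keeps them consistent. A minimal sketch of applying and verifying the values in one pass, assuming root SSH access to each host (the esx01-esx04 hostnames are placeholders for your own):

# Hypothetical host list; substitute the hosts in your cluster.
for host in esx01 esx02 esx03 esx04; do
  ssh root@$host '
    esxcfg-advcfg -s 131072 /LSOM/blPLOGCacheLines
    esxcfg-advcfg -s 32768 /LSOM/blPLOGLsnCacheLines
    esxcfg-advcfg -s 32768 /LSOM/blLLOGCacheLines
    # Read each value back (-g) to confirm it took effect.
    esxcfg-advcfg -g /LSOM/blPLOGCacheLines
    esxcfg-advcfg -g /LSOM/blPLOGLsnCacheLines
    esxcfg-advcfg -g /LSOM/blLLOGCacheLines'
done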

The following settings extend the SSH timeout so that the ESXi shell isn't turned off before HCIBench connects in to reset the cache:

esxcli system settings advanced set -o /UserVars/ESXiShellTimeOut -i 21600

esxcli system settings advanced set -o /UserVars/ESXiShellInteractiveTimeOut -i 21600
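
As with the cache-line settings above, these are per-host, so the same loop pattern applies. To confirm a timeout actually took, esxcli can list the option back; a small sketch (the hostname is again a placeholder):

# Confirm the new timeout value on a host.
ssh root@esx01 'esxcli system settings advanced list -o /UserVars/ESXiShellTimeOut'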


The Results:

| Configuration              | RAID | IOPS (IO/s) | Throughput (MB/s) | Latency (ms) | Read Latency (ms) | Write Latency (ms) |
|----------------------------|------|-------------|-------------------|--------------|-------------------|--------------------|
| Compression/Dedupe, 6.0u2  | R5   | 67909.1     | 265.27            | 7.5479       | 4.4493            | 14.7758            |
| Compression/Dedupe, 6.0u2  | R1   | 125662.9    | 490.86            | 4.1603       | 4.1294            | 4.2324             |
| Compression/Dedupe, 6.0u3  | R5   | 100429.8    | 392.3             | 5.1194       | 4.0864            | 7.5311             |
| Compression/Dedupe, 6.0u3  | R1   | 135949.2    | 531.04            | 3.7692       | 4.291             | 2.55               |
| No Comp/No Dedupe, 6.0u2   | R5   | 59836.49    | 233.75            | 8.5948       | 1.9708            | 24.0651            |
| No Comp/No Dedupe, 6.0u2   | R1   | 160063.7    | 625.24            | 3.3625       | 2.4579            | 5.4739             |
| No Comp/No Dedupe, 6.0u3   | R5   | 166624.3    | 650.88            | 3.0702       | 2.1941            | 2.5501             |
| No Comp/No Dedupe, 6.0u3   | R1   | 197006.3    | 769.56            | 2.6081       | 2.6491            | 2.5129             |

I am SUPER impressed by the performance improvements as shown by HCIBench. The top result is up to 197K IOPS for R1 without compression (up from 160064 in 6.0u2). The lowest is up to 100429 IOPS for R5 with compression (up from 67909 for the same configuration in 6.0u2). Interestingly, the weird degradation for R5 without compression seen in 6.0u2 is NOT present in 6.0u3. As expected, the configurations rank lowest to highest in IOPS as: compression R5, compression R1, no-compression R5, no-compression R1. I think you can now use these datapoints to design a solution more accurately.
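For a quick sanity check on those claims, here is the version-over-version IOPS math from the table, done in plain shell with awk standing in as a calculator:

# Percentage IOPS improvement from 6.0u2 to 6.0u3, per configuration.
awk 'BEGIN {
  printf "Compression R5:    +%.1f%%\n", (100429.8/67909.1  - 1) * 100   # ~47.9%
  printf "Compression R1:    +%.1f%%\n", (135949.2/125662.9 - 1) * 100   # ~8.2%
  printf "No-compression R5: +%.1f%%\n", (166624.3/59836.49 - 1) * 100   # ~178.5%
  printf "No-compression R1: +%.1f%%\n", (197006.3/160063.7 - 1) * 100   # ~23.1%
}'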

For a general-purpose cluster, I would leave compression/deduplication on and stay with R5. The space savings from both R5 and compression make flash competitive in terms of cost.

If I had a few guests that needed a little more speed, I would consider creating a separate storage policy with R1, still on the same VSAN datastore with compression and deduplication turned on.

If I had guests that I knew wouldn't compress well (video, for example), then R5 without compression would be a good choice, since the erasure coding still gives a guaranteed space savings.

The top-of-the-line Ferrari would be an R1 cluster without compression (or a mix of R5 and R1 in that cluster, depending on the guests).

Now, these are just my pie-in-the-sky general thoughts based on HCIBench easy run mode (70/30 read/write), which isn't necessarily representative of real workloads, but I think the general ratios are about right. I am very excited, though, that VMware is continuing to tune VSAN and squeeze more performance out of it.
