Thursday, April 8, 2021

ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 4 - Conclusion

 

I started benchmarking this setup to get a feel for the performance expectations and pitfalls of ZFS, Raid-Z, L2ARC and SLOG. The effect of different volblocksize settings was especially interesting.

It started as a few innocent benchmarks and grew into what you have seen in the previous parts, which is more than I would have liked.

* The basic conclusion is that no single setup is best for every workload.

* A volblocksize of 8k should probably be avoided for most normal usage, even if the application mostly issues 8k IO.

* volblocksize seems to have very little effect on a system without L2ARC. With L2ARC, small IO sizes benefit from a volblocksize that is as small as possible.

* volblocksize has little effect on writes. For this setup 32k was best, whether or not a SLOG was used.

* L2ARC has some effect. A SATA SSD improves performance, but it seems to be a false economy. It makes more sense to invest in more memory.

* SLOG seems to have a significant impact. I had trouble finding a non-SATA SSD with PLP (power-loss protection) support, but even a SATA SSD doubled random write performance on this system.
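For readers who want to try the L2ARC and SLOG variants from the list above, the following is a minimal dry-run sketch of how such devices are typically attached with zpool add. The pool name and both device paths are hypothetical placeholders, not the ones used in these benchmarks, and the script only prints the commands.

```shell
#!/bin/sh
# Dry run: print the commands instead of executing them.
# POOL and both device paths are hypothetical placeholders.
POOL="tank"
SLOG_DEV="/dev/disk/by-id/SLOG_DEVICE"
L2ARC_DEV="/dev/disk/by-id/L2ARC_DEVICE"

print_aux_device_cmds() {
    # "log" attaches a separate intent log (SLOG) device.
    echo "zpool add $POOL log $SLOG_DEV"
    # "cache" attaches an L2ARC device.
    echo "zpool add $POOL cache $L2ARC_DEV"
}

print_aux_device_cmds
```

Printing first makes it easy to double-check the device paths before touching a real pool.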

Wednesday, April 7, 2021

ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 3 - Compressible Data Benchmarks

Continuation of: ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 2 - In-compressible Data Benchmarks

As discussed in Part 1, the tests were repeated on compressible data generated with fio's repeating buffer patterns. ZSTD compressed this data to about 1/4 of its original size. The compression improved both read and write performance.

Read & Random Read Results:

Note: This first set of results is with a 16GB ARC.
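How the ARC limit was applied is not shown in this post. As a sketch only, and assuming OpenZFS on Linux, the limit is commonly set through the zfs_arc_max module parameter, which takes a value in bytes. The helper names below are hypothetical, "16GB" is treated as 16 GiB, and the script prints the command rather than writing to /sys.

```shell
#!/bin/sh
# Convert a size in GiB to bytes, the unit zfs_arc_max expects.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Dry run: print the command rather than writing to /sys.
# Assumes OpenZFS on Linux; the path differs on other platforms.
print_arc_limit_cmd() {
    echo "echo $(gib_to_bytes "$1") > /sys/module/zfs/parameters/zfs_arc_max"
}

print_arc_limit_cmd 16   # ARC size used for the first set of results
print_arc_limit_cmd 1    # ARC size used for the L2ARC comparison
```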
 
Read:
 
The command used for the tests was as follows, with --rw set to read or randread:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G --iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300 --directory=VOLUME_MOUNTPOINT
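The command above omits the --rw and --bs values, which changed from run to run. The following dry-run sketch shows one way the full read matrix could be generated. The function name and the default mount point are hypothetical, and the script only prints the fio command lines.

```shell
#!/bin/sh
# Hypothetical default; replace with the mount point of the volume under test.
VOLUME_MOUNTPOINT="${VOLUME_MOUNTPOINT:-/mnt/testvol}"

# Print one fio command per (rw, bs) combination used in the read tables.
print_read_tests() {
    for rw in read randread; do
        for bs in 4k 16k 64k; do
            echo "fio --name=test --rw=$rw --bs=$bs" \
                 "--buffer_pattern=0xdeadbeef --buffer_compress_percentage=75" \
                 "--numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G" \
                 "--iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300" \
                 "--directory=$VOLUME_MOUNTPOINT"
        done
    done
}

print_read_tests
```

Each printed line can be reviewed and then run by hand, once per volblocksize volume.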

bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 35.2k 138MiB/s 1044 4179KiB/s
16k 41.3k 161MiB/s 1012 4051KiB/s
32k 43.7k 171MiB/s 1455 5820KiB/s
64k 45.4k 177MiB/s 990 3963KiB/s
128k 48.6k 190MiB/s 1187 4750KiB/s
256k 51.8k 202MiB/s 1297 5189KiB/s
512k 46.3k 181MiB/s 1401 5605KiB/s
1024k 43.3k 169MiB/s 918 3672KiB/s
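The IOPS and bandwidth columns in these tables are tied together by the block size: bandwidth is roughly IOPS multiplied by bs. A small sketch of that sanity check, using a hypothetical helper name:

```shell
#!/bin/sh
# Expected bandwidth in MiB/s for a given IOPS count and block size in KiB.
iops_to_mibs() {
    LC_ALL=C awk -v iops="$1" -v kib="$2" \
        'BEGIN { printf "%.1f\n", iops * kib / 1024 }'
}

iops_to_mibs 35200 4   # 8k row, sequential read: the table reports 138MiB/s
iops_to_mibs 1044 4    # 8k row, random read: the table reports 4179KiB/s
```

Small differences against the table come from fio rounding the IOPS figure (35.2k).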










bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 8120 127MiB/s 744 11.6MiB/s
16k 11.9k 185MiB/s 1430 22.3MiB/s
32k 15.3k 239MiB/s 2301 35.0MiB/s
64k 15.1k 235MiB/s 1614 25.2MiB/s
128k 11.6k 182MiB/s 1488 23.3MiB/s
256k 8744 137MiB/s 1257 19.7MiB/s
512k 10.2k 159MiB/s 1467 22.9MiB/s
1024k 17.1k 267MiB/s 1222 19.1MiB/s










bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 1859 116MiB/s 202 12.7MiB/s
16k 2800 175MiB/s 418 26.2MiB/s
32k 4413 276MiB/s 968 60.5MiB/s
64k 2769 173MiB/s 914 57.2MiB/s
128k 2705 169MiB/s 1351 84.5MiB/s
256k 2891 181MiB/s 1254 78.4MiB/s
512k 2514 157MiB/s 1221 76.4MiB/s
1024k 4281 268MiB/s 1216 76.0MiB/s

Write & Random Write Results (with and without fsync):

The same command as in the read tests was used, with --rw set to write or randwrite. The --fsync option was also used.
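As a sketch of what the write variants look like, the following dry run prints the four rw/fsync combinations for one block size. The function name and the default mount point are hypothetical. In fio, --fsync=1 issues an fsync after every write, while --fsync=0 disables it.

```shell
#!/bin/sh
# Hypothetical default; replace with the mount point of the volume under test.
VOLUME_MOUNTPOINT="${VOLUME_MOUNTPOINT:-/mnt/testvol}"

# Print the write and randwrite commands, with and without fsync,
# for the block size given as the first argument.
print_write_tests() {
    bs="$1"
    for rw in write randwrite; do
        for fsync in 0 1; do
            echo "fio --name=test --rw=$rw --fsync=$fsync --bs=$bs" \
                 "--buffer_pattern=0xdeadbeef --buffer_compress_percentage=75" \
                 "--numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G" \
                 "--iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300" \
                 "--directory=$VOLUME_MOUNTPOINT"
        done
    done
}

print_write_tests 4k
```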
 
Write:
Write performance is quite similar to read performance, and significantly better than with in-compressible data.
 
bs=4k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 17.8k 69.4MiB/s 870 3481KiB/s
16k 20.8k 81.4MiB/s 784 3136KiB/s
32k 27.3k 107MiB/s 859 3439KiB/s
64k 23.8k 93.0MiB/s 671 2687KiB/s
128k 25.5k 99.5MiB/s 1142 4572KiB/s
256k 26.6k 104MiB/s 1199 4797KiB/s
512k 36.6k 143MiB/s 1205 4823KiB/s
1024k 38.9k 152MiB/s 1188 4756KiB/s










bs=16k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 5221 81.6MiB/s 497 7966KiB/s
16k 10.1k 159MiB/s 481 7701KiB/s
32k 6872 107MiB/s 565 9054KiB/s
64k 6433 101MiB/s 662 10.4MiB/s
128k 6900 108MiB/s 751 11.7MiB/s
256k 12.1k 189MiB/s 696 10.9MiB/s
512k 12.3k 192MiB/s 537 8601KiB/s
1024k 6127 95.7MiB/s 605 9687KiB/s










bs=64k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 1448 90.5MiB/s 347 21.7MiB/s
16k 2481 155MiB/s 466 29.1MiB/s
32k 3596 225MiB/s 557 34.8MiB/s
64k 6579 411MiB/s 526 32.9MiB/s
128k 2727 170MiB/s 659 41.2MiB/s
256k 2070 129MiB/s 637 39.8MiB/s
512k 2521 158MiB/s 600 37.5MiB/s
1024k 2117 132MiB/s 782 48.9MiB/s

Random Write:

bs=4k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1807 7232KiB/s 912 3648KiB/s
16k 1666 6664KiB/s 808 3236KiB/s
32k 1491 5966KiB/s 1028 4113KiB/s
64k 795 3183KiB/s 1059 4238KiB/s
128k 693 2776KiB/s 912 3649KiB/s
256k 700 2803KiB/s 827 3311KiB/s
512k 811 3247KiB/s 559 2237KiB/s
1024k 614 2459KiB/s 525 2104KiB/s










bs=16k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 5131 80.2MiB/s 533 8535KiB/s
16k 9395 147MiB/s 430 6894KiB/s
32k 1417 22.1MiB/s 536 8586KiB/s
64k 787 12.3MiB/s 605 9686KiB/s
128k 730 11.4MiB/s 709 11.1MiB/s
256k 689 10.8MiB/s 776 12.1MiB/s
512k 828 12.9MiB/s 513 8214KiB/s
1024k 558 8931KiB/s 464 7437KiB/s










bs=64k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1673 105MiB/s 378 23.7MiB/s
16k 2374 148MiB/s 411 25.7MiB/s
32k 3326 208MiB/s 437 27.3MiB/s
64k 7525 470MiB/s 490 30.7MiB/s
128k 571 35.7MiB/s 634 39.7MiB/s
256k 609 38.1MiB/s 615 38.5MiB/s
512k 726 45.4MiB/s 399 24.0MiB/s
1024k 567 35.5MiB/s 343 21.4MiB/s

Read & Random Read Results With & Without Cache:

The cache held a little over 80GB, so presumably all the data was cached. The command used for cached reads was:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --loops=8 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
 
and for uncached reads there was no reason to loop multiple times:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
 
With cache:
 
Cache: MK0100GCTYU



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.9k 136MiB/s 36.6k 143MiB/s
16k 37.6k 147MiB/s 37.8k 148MiB/s
32k 41.2k 161MiB/s 10.2k 39.8MiB/s
64k 18.0k 70.3MiB/s 16.3k 63.9MiB/s
128k 37.1k 145MiB/s 11.4k 44.5MiB/s
256k 29.1k 114MiB/s 8000 31.3MiB/s
512k 37.5k 147MiB/s 4790 18.7MiB/s
1024k 45.7k 179MiB/s 2597 10.1MiB/s





Cache: MK0100GCTYU



bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 19.9k 312MiB/s 25.9k 405MiB/s
16k 33.5k 523MiB/s 33.5k 523MiB/s
32k 26.3k 410MiB/s 33.1k 517MiB/s
64k 6226 97.3MiB/s 24.7k 385MiB/s
128k 14.3k 224MiB/s 15.5k 242MiB/s
256k 8718 136MiB/s 8757 137MiB/s
512k 10.6k 166MiB/s 5037 78.7MiB/s
1024k 16.7k 261MiB/s 2679 41.9MiB/s





Cache: MK0100GCTYU



bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 4111 257MiB/s 7099 444MiB/s
16k 8012 501MiB/s 10.1k 628MiB/s
32k 12.9k 805MiB/s 14.5k 909MiB/s
64k 15.1k 947MiB/s 18.9k 1182MiB/s
128k 9527 595MiB/s 13.5k 841MiB/s
256k 3116 195MiB/s 8363 523MiB/s
512k 4257 266MiB/s 5099 319MiB/s
1024k 5018 314MiB/s 2780 174MiB/s
 
Without Cache:
 
No Cache



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 21.9k 85.4MiB/s 671 2685KiB/s
16k 31.3k 122MiB/s 663 2655KiB/s
32k 37.8k 148MiB/s 776 3105KiB/s
64k 17.0k 70.2MiB/s 619 2477KiB/s
128k 36.4k 142MiB/s 536 2144KiB/s
256k 29.5k 115MiB/s 511 2047KiB/s
512k 36.8k 144MiB/s 555 2221KiB/s
1024k 44.3k 173MiB/s 540 2161KiB/s










bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 5209 81.4MiB/s 338 5414KiB/s
16k 6419 100MiB/s 601 9618KiB/s
32k 13.1k 205MiB/s 796 12.4MiB/s
64k 5020 78.4MiB/s 609 9748KiB/s
128k 11.5k 180MiB/s 528 8458KiB/s
256k 7690 120MiB/s 509 8154KiB/s
512k 9765 153MiB/s 555 8884KiB/s
1024k 16.3k 254MiB/s 537 8599KiB/s










bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 2030 127MiB/s 303 18.9MiB/s
16k 2392 150MiB/s 528 33.0MiB/s
32k 3782 236MiB/s 647 40.5MiB/s
64k 1199 74.0MiB/s 477 29.8MiB/s
128k 3034 190MiB/s 549 34.4MiB/s
256k 2148 134MiB/s 554 34.7MiB/s
512k 2711 169MiB/s 610 38.1MiB/s
1024k 4303 269MiB/s 596 37.3MiB/s
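To put the cached and uncached random read numbers side by side, the speedup can be computed as a simple ratio of the IOPS values from the two sets of tables. A minimal sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Ratio of cached to uncached IOPS, rounded to one decimal place.
speedup() {
    LC_ALL=C awk -v cached="$1" -v uncached="$2" \
        'BEGIN { printf "%.1f\n", cached / uncached }'
}

speedup 36600 671   # bs=4k random read, volblocksize 8k
speedup 2597 540    # bs=4k random read, volblocksize 1024k
```

The two sample rows show how much the benefit of the cache shrinks as volblocksize grows for small random reads.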
 
 

Write & Random Write Results (with and without fsync) + SLOG:

The same command as in the previous write tests was used, with and without fsync, after adding a SLOG device.

Write: Sequential writes became slightly slower, but random write performance doubled.

SLOG: INTEL SSDSC2KG480G8



bs=4k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 12.8k 49.8MiB/s 2037 8150KiB/s
16k 16.6k 64.0MiB/s 2058 8235KiB/s
32k 16.9k 65.0MiB/s 2026 8106KiB/s
64k 12.1k 47.2MiB/s 2083 8335KiB/s
128k 15.7k 61.3MiB/s 2106 8425KiB/s
256k 15.3k 59.6MiB/s 2070 8282KiB/s
512k 13.8k 53.9MiB/s 2090 8360KiB/s
1024k 13.0k 54.6MiB/s 2048 8193KiB/s










bs=16k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 6428 100MiB/s 1714 26.8MiB/s
16k 15.1k 236MiB/s 1602 25.0MiB/s
32k 6937 108MiB/s 1679 26.2MiB/s
64k 5425 84.8MiB/s 1725 26.0MiB/s
128k 6905 108MiB/s 1715 26.8MiB/s
256k 5712 89.3MiB/s 1692 26.4MiB/s
512k 6892 108MiB/s 1747 27.3MiB/s
1024k 7012 110MiB/s 1693 26.5MiB/s










bs=64k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 1576 98.5MiB/s 1096 68.5MiB/s
16k 3479 217MiB/s 1060 66.3MiB/s
32k 4856 304MiB/s 1068 66.8MiB/s
64k 5336 334MiB/s 1089 68.1MiB/s
128k 1872 117MiB/s 1057 66.1MiB/s
256k 1752 110MiB/s 1089 68.1MiB/s
512k 2279 142MiB/s 1140 71.3MiB/s
1024k 2571 161MiB/s 1140 71.3MiB/s

Random Write
 
bs=4k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1031 4126KiB/s 1543 6175KiB/s
16k 1148 4594KiB/s 1725 6900KiB/s
32k 1059 4239KiB/s 1468 5874KiB/s
64k 1015 4061KiB/s 1398 5595KiB/s
128k 896 3586KiB/s 1236 4947KiB/s
256k 676 2708KiB/s 1085 4342KiB/s
512k 679 2719KiB/s 868 3475KiB/s
1024k 501 2008KiB/s 651 2604KiB/s








bs=16k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 5057 79.0MiB/s 1699 26.6MiB/s
16k 12.5k 195MiB/s 1631 25.5MiB/s
32k 1798 28.1MiB/s 1807 28.2MiB/s
64k 1440 22.5MiB/s 1575 24.6MiB/s
128k 995 15.6MiB/s 1413 22.1MiB/s
256k 580 9293KiB/s 1066 16.7MiB/s
512k 632 9.88MiB/s 946 14.8MiB/s
1024k 560 8976KiB/s 655 10.2MiB/s








bs=64k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1481 92.6MiB/s 1000 62.5MiB/s
16k 2820 176MiB/s 1075 67.2MiB/s
32k 4817 301MiB/s 1068 66.8MiB/s
64k 5240 328MiB/s 1112 69.5MiB/s
128k 941 58.9MiB/s 1109 69.3MiB/s
256k 638 39.9MiB/s 925 57.8MiB/s
512k 764 47.8MiB/s 886 55.4MiB/s
1024k 591 36.9MiB/s 607 37.0MiB/s

Read Results With 1GB ARC + With & Without 100GB L2ARC


No Cache

bs=4k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 21.9k 85.4MiB/s 671 2685KiB/s
16k 31.3k 122MiB/s 663 2655KiB/s
32k 37.8k 148MiB/s 776 3105KiB/s
64k 17.0k 70.2MiB/s 619 2477KiB/s
128k 36.4k 142MiB/s 536 2144KiB/s
256k 29.5k 115MiB/s 511 2047KiB/s
512k 36.8k 144MiB/s 555 2221KiB/s
1024k 44.3k 173MiB/s 540 2161KiB/s





No Cache

bs=16k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 5209 81.4MiB/s 338 5414KiB/s
16k 6419 100MiB/s 601 9618KiB/s
32k 13.1k 205MiB/s 796 12.4MiB/s
64k 5020 78.4MiB/s 609 9748KiB/s
128k 11.5k 180MiB/s 528 8458KiB/s
256k 7690 120MiB/s 509 8154KiB/s
512k 9765 153MiB/s 555 8884KiB/s
1024k 16.3k 254MiB/s 537 8599KiB/s










No Cache

bs=64k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 2030 127MiB/s 303 18.9MiB/s
16k 2392 150MiB/s 528 33.0MiB/s
32k 3782 236MiB/s 647 40.5MiB/s
64k 1199 74.0MiB/s 477 29.8MiB/s
128k 3034 190MiB/s 549 34.4MiB/s
256k 2148 134MiB/s 554 34.7MiB/s
512k 2711 169MiB/s 610 38.1MiB/s
1024k 4303 269MiB/s 596 37.3MiB/s

A SATA SSD was used as the L2ARC cache device. The cache helps random reads tremendously.

Cache: MK0100GCTYU



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.9k 136MiB/s 36.6k 143MiB/s
16k 37.6k 147MiB/s 37.8k 148MiB/s
32k 41.2k 161MiB/s 10.2k 39.8MiB/s
64k 18.0k 70.3MiB/s 16.3k 63.9MiB/s
128k 37.1k 145MiB/s 11.4k 44.5MiB/s
256k 29.1k 114MiB/s 8000 31.3MiB/s
512k 37.5k 147MiB/s 4790 18.7MiB/s
1024k 45.7k 179MiB/s 2597 10.1MiB/s





Cache: MK0100GCTYU



bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 19.9k 312MiB/s 25.9k 405MiB/s
16k 33.5k 523MiB/s 33.5k 523MiB/s
32k 26.3k 410MiB/s 33.1k 517MiB/s
64k 6226 97.3MiB/s 24.7k 385MiB/s
128k 14.3k 224MiB/s 15.5k 242MiB/s
256k 8718 136MiB/s 8757 137MiB/s
512k 10.6k 166MiB/s 5037 78.7MiB/s
1024k 16.7k 261MiB/s 2679 41.9MiB/s





Cache: MK0100GCTYU



bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 4111 257MiB/s 7099 444MiB/s
16k 8012 501MiB/s 10.1k 628MiB/s
32k 12.9k 805MiB/s 14.5k 909MiB/s
64k 15.1k 947MiB/s 18.9k 1182MiB/s
128k 9527 595MiB/s 13.5k 841MiB/s
256k 3116 195MiB/s 8363 523MiB/s
512k 4257 266MiB/s 5099 319MiB/s
1024k 5018 314MiB/s 2780 174MiB/s


ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 2 - In-compressible Data Benchmarks

Continuation of: ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 1 - Benchmark Background Information

 

Read & Random Read Results:

Note: This first set of results is with a 16GB ARC.
 
The command used for the tests was:
fio --name=test --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G --iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300 --directory=VOLUME_MOUNTPOINT

bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 18.9k 73.7MiB/s 906 3625KiB/s
16k 34.6k 135MiB/s 735 2942KiB/s
32k 43.0k 168MiB/s 408 1632KiB/s
64k 45.0k 180MiB/s 380 1524KiB/s
128k 46.9k 183MiB/s 360 1442KiB/s
256k 47.9k 187MiB/s 320 1283KiB/s
512k 43.7k 171MiB/s 296 1187KiB/s
1024k 42.7k 167MiB/s 274 1097KiB/s










bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 2679 41.9MiB/s 445 7127KiB/s
16k 3101 48.5MiB/s 815 12.7MiB/s
32k 3767 58.9MiB/s 796 12.5MiB/s
64k 8429 132MiB/s 683 10.7MiB/s
128k 8729 136MiB/s 590 9455KiB/s
256k 11.0k 173MiB/s 427 6846KiB/s
512k 8746 137MiB/s 364 5831KiB/s
1024k 9861 154MiB/s 293 4698KiB/s










bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 901 56.3MiB/s 236 14.8MiB/s
16k 965 60.4MiB/s 234 14.6MiB/s
32k 973 60.8MiB/s 224 14.0MiB/s
64k 1229 76.8MiB/s 320 20.0MiB/s
128k 1864 117MiB/s 505 31.6MiB/s
256k 3072 192MiB/s 436 27.3MiB/s
512k 3013 188MiB/s 368 23.0MiB/s
1024k 3172 198MiB/s 281 17.6MiB/s

As can be seen, random read performance tends to peak when volblocksize is equal or close to the bs value: at 8k for bs=4k (the smallest volblocksize tested) and at 16k for bs=16k. The exception is bs=64k, where performance peaked at a volblocksize of 128k.

Sequential read clearly performs better with larger volblocksize settings.

So, depending on your workload's read size and read requirements, you may want to select a volblocksize as close as possible to the typical read size. If your only concern is raw sequential read speed, it is best to use a larger volblocksize.
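Since volblocksize can only be chosen when a volume is created, comparing settings means creating one volume per value. The following dry-run sketch prints the zfs create commands for the sizes tested here. The pool name, volume size and volume naming are hypothetical, and compression=zstd is assumed from the title of this series rather than taken from the actual setup.

```shell
#!/bin/sh
# Hypothetical pool name and volume size; adjust to the system at hand.
POOL="tank"
VOL_SIZE="100G"

# Print one "zfs create" per volblocksize value used in the tables.
# Values above 128k typically require the large_blocks pool feature.
print_zvol_cmds() {
    for vbs in 8k 16k 32k 64k 128k 256k 512k 1024k; do
        echo "zfs create -V $VOL_SIZE -o volblocksize=$vbs -o compression=zstd $POOL/test-$vbs"
    done
}

print_zvol_cmds
```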

Write & Random Write Results (with and without fsync):

The same command as in the read tests was used, with --rw set to write or randwrite. The --fsync option was also used.

Write:

bs=4k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 10.5k 41.2MiB/s 65 260KiB/s
16k 14.3k 55.0MiB/s 34 136KiB/s
32k 14.1k 55.3MiB/s 39 156KiB/s
64k 12.0k 47.0MiB/s 44 177KiB/s
128k 10.6k 41.6MiB/s 45 180KiB/s
256k 13.2k 51.4MiB/s 45 182KiB/s
512k 3901 15.2MiB/s 64 258KiB/s
1024k 5633 22.0MiB/s 121 486KiB/s










bs=16k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 3494 54.6MiB/s 51 828KiB/s
16k 5216 81.5MiB/s 35 561KiB/s
32k 2020 31.6MiB/s 55 889KiB/s
64k 1563 24.4MiB/s 43 689KiB/s
128k 1690 26.4MiB/s 42 682KiB/s
256k 1568 24.5MiB/s 43 689KiB/s
512k 1058 16.5MiB/s 40 642KiB/s
1024k 1441 22.5MiB/s 42 675KiB/s










bs=64k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 1020 63.8MiB/s 59 3823KiB/s
16k 1441 90.1MiB/s 36 2344KiB/s
32k 1834 115MiB/s 37 2394KiB/s
64k 4019 251MiB/s 43 2784KiB/s
128k 1006 62.9MiB/s 49 3188KiB/s
256k 996 62.3MiB/s 49 3160KiB/s
512k 902 56.4MiB/s 46 2997KiB/s
1024k 854 53.4MiB/s 49 3199KiB/s

Random Write:

bs=4k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1593 6372KiB/s 47 188KiB/s
16k 1329 5318KiB/s 40 162KiB/s
32k 753 3014KiB/s 50 200KiB/s
64k 650 2600KiB/s 43 175KiB/s
128k 582 2332KiB/s 58 235KiB/s
256k 261 1048KiB/s 59 237KiB/s
512k 233 934KiB/s 70 281KiB/s
1024k 181 726KiB/s 108 435KiB/s










bs=16k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 4540 70.9MiB/s 91 1458KiB/s
16k 6510 102MiB/s 36 581KiB/s
32k 1013 15.8MiB/s 37 600KiB/s
64k 865 13.5MiB/s 59 946KiB/s
128k 416 6667KiB/s 47 757KiB/s
256k 353 5661KiB/s 63 1020KiB/s
512k 291 4671KiB/s 80 1296KiB/s
1024k 204 3273KiB/s 80 1288KiB/s










bs=64k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1330 83.2MiB/s 55 3531KiB/s
16k 1975 123MiB/s 41 2672KiB/s
32k 2065 129MiB/s 39 2524KiB/s
64k 4228 264MiB/s 40 2594KiB/s
128k 513 32.1MiB/s 48 3073KiB/s
256k 358 22.4MiB/s 54 3504KiB/s
512k 286 17.9MiB/s 67 4343KiB/s
1024k 209 13.1MiB/s 65 4213KiB/s

Read & Random Read Results With & Without Cache:

The cache held a little over 80GB, so presumably all the data was cached. The command used for cached reads was:
fio --name=test --loops=8 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
 
and for uncached reads there was no reason to loop multiple times:
fio --name=test --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
 

With cache:

Cache: MK0100GCTYU



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.1k 133MiB/s 34.2k 134MiB/s
16k 40.0k 160MiB/s 5509 21.5MiB/s
32k 45.5k 178MiB/s 5215 20.4MiB/s
64k 49.9k 195MiB/s 5423 21.2MiB/s
128k 55.9k 218MiB/s 3624 14.2MiB/s
256k 53.7k 210MiB/s 2020 8081KiB/s
512k 52.5k 205MiB/s 1043 4174KiB/s
1024k 47.7k 186MiB/s 529 2118KiB/s










bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 16.1k 251MiB/s 19.3k 301MiB/s
16k 21.9k 342MiB/s 22.4k 350MiB/s
32k 23.2k 362MiB/s 13.4k 209MiB/s
64k 26.3k 410MiB/s 7077 111MiB/s
128k 30.8k 481MiB/s 3950 61.7MiB/s
256k 26.4k 413MiB/s 2083 32.6MiB/s
512k 26.5k 414MiB/s 1070 16.7MiB/s
1024k 26.0k 421MiB/s 539 8633KiB/s










bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 3606 225MiB/s 4704 294MiB/s
16k 5046 315MiB/s 5482 343MiB/s
32k 5666 354MiB/s 5833 365MiB/s
64k 6064 379MiB/s 6092 381MiB/s
128k 7101 444MiB/s 3705 232MiB/s
256k 8264 517MiB/s 2079 130MiB/s
512k 8752 547MiB/s 1082 67.6MiB/s
1024k 7878 492MiB/s 550 34.4MiB/s

Without Cache:

No Cache



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.5k 135MiB/s 791 3167KiB/s
16k 42.3k 165MiB/s 796 3187KiB/s
32k 46.1k 180MiB/s 476 1906KiB/s
64k 50.6k 198MiB/s 445 1781KiB/s
128k 61.7k 241MiB/s 412 1650KiB/s
256k 54.7k 214MiB/s 501 2006KiB/s
512k 54.7k 214MiB/s 375 1501KiB/s
1024k 51.2k 200MiB/s 293 1173KiB/s










bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 12.7k 199MiB/s 679 10.6MiB/s
16k 15.0k 250MiB/s 717 11.2MiB/s
32k 19.2k 300MiB/s 440 7042KiB/s
64k 24.9k 389MiB/s 416 6657KiB/s
128k 30.0k 484MiB/s 392 6281KiB/s
256k 26.6k 416MiB/s 485 7763KiB/s
512k 27.3k 427MiB/s 365 5842KiB/s
1024k 26.4k 412MiB/s 293 4703KiB/s










bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 3080 193MiB/s 499 31.2MiB/s
16k 4395 275MiB/s 572 35.8MiB/s
32k 4847 303MiB/s 397 24.8MiB/s
64k 6568 411MiB/s 394 24.6MiB/s
128k 8692 543MiB/s 386 24.2MiB/s
256k 8110 507MiB/s 495 30.0MiB/s
512k 8773 548MiB/s 367 22.9MiB/s
1024k 7680 480MiB/s 294 18.4MiB/s


Write & Random Write Results (with and without fsync) + SLOG:

The same command as in the previous write tests was used, with and without fsync, after adding a SLOG device.

Write:

SLOG: INTEL SSDSC2KG480G8



bs=4k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 9343 36.5MiB/s 686 2747KiB/s
16k 14.5k 56.7MiB/s 668 2674KiB/s
32k 9775 38.2MiB/s 873 3493KiB/s
64k 8245 32.2MiB/s 1027 4108KiB/s
128k 11.8k 46.1MiB/s 1087 4349KiB/s
256k 12.3k 48.1MiB/s 1175 4703KiB/s
512k 13.8k 53.8MiB/s 1085 4344KiB/s
1024k 16.6k 64.0MiB/s 1230 4923KiB/s










bs=16k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 6007 93.9MiB/s 452 7236KiB/s
16k 9875 154MiB/s 681 10.6MiB/s
32k 4804 75.1MiB/s 564 9030KiB/s
64k 3691 57.7MiB/s 730 11.4MiB/s
128k 5233 81.8MiB/s 755 11.8MiB/s
256k 5479 85.6MiB/s 653 10.2MiB/s
512k 4888 76.4MiB/s 669 10.5MiB/s
1024k 5894 92.1MiB/s 542 8677KiB/s










bs=64k
volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 1765 110MiB/s 456 28.6MiB/s
16k 2530 158MiB/s 499 31.2MiB/s
32k 2815 176MiB/s 594 37.2MiB/s
64k 5298 331MiB/s 214 13.4MiB/s
128k 1555 97.2MiB/s 538 33.6MiB/s
256k 1819 114MiB/s 634 39.7MiB/s
512k 1751 109MiB/s 666 41.6MiB/s
1024k 1915 120MiB/s 601 37.6MiB/s

 

Random Write:

SLOG: INTEL SSDSC2KG480G8



bs=4k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 524 2098KiB/s 719 2879KiB/s
16k 785 3140KiB/s 721 2885KiB/s
32k 502 2011KiB/s 713 2853KiB/s
64k 662 2651KiB/s 826 3305KiB/s
128k 546 2185KiB/s 593 2372KiB/s
256k 417 1671KiB/s 469 1877KiB/s
512k 330 1321KiB/s 343 1373KiB/s
1024k 271 1085KiB/s 265 1063KiB/s










bs=16k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 5198 81.2MiB/s 454 7272KiB/s
16k 9380 147MiB/s 440 7042KiB/s
32k 958 14.0MiB/s 501 8020KiB/s
64k 995 15.6MiB/s 643 10.1MiB/s
128k 733 11.5MiB/s 537 8600KiB/s
256k 553 8858KiB/s 505 8081KiB/s
512k 368 5900KiB/s 343 5504KiB/s
1024k 260 4168KiB/s 248 3972KiB/s










bs=64k
volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1531 95.7MiB/s 378 23.6MiB/s
16k 2545 159MiB/s 439 27.5MiB/s
32k 2577 161MiB/s 453 28.4MiB/s
64k 4998 312MiB/s 205 12.8MiB/s
128k 743 46.5MiB/s 444 27.8MiB/s
256k 527 32.0MiB/s 377 23.6MiB/s
512k 387 24.2MiB/s 331 20.7MiB/s
1024k 270 16.9MiB/s 238 14.9MiB/s


Read Results With 1GB ARC + With & Without 100GB L2ARC

 
No Cache



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.5k 135MiB/s 791 3167KiB/s
16k 42.3k 165MiB/s 796 3187KiB/s
32k 46.1k 180MiB/s 476 1906KiB/s
64k 50.6k 198MiB/s 445 1781KiB/s
128k 61.7k 241MiB/s 412 1650KiB/s
256k 54.7k 214MiB/s 501 2006KiB/s
512k 54.7k 214MiB/s 375 1501KiB/s
1024k 51.2k 200MiB/s 293 1173KiB/s










bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 12.7k 199MiB/s 679 10.6MiB/s
16k 15.0k 250MiB/s 717 11.2MiB/s
32k 19.2k 300MiB/s 440 7042KiB/s
64k 24.9k 389MiB/s 416 6657KiB/s
128k 30.0k 484MiB/s 392 6281KiB/s
256k 26.6k 416MiB/s 485 7763KiB/s
512k 27.3k 427MiB/s 365 5842KiB/s
1024k 26.4k 412MiB/s 293 4703KiB/s










bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 3080 193MiB/s 499 31.2MiB/s
16k 4395 275MiB/s 572 35.8MiB/s
32k 4847 303MiB/s 397 24.8MiB/s
64k 6568 411MiB/s 394 24.6MiB/s
128k 8692 543MiB/s 386 24.2MiB/s
256k 8110 507MiB/s 495 30.0MiB/s
512k 8773 548MiB/s 367 22.9MiB/s
1024k 7680 480MiB/s 294 18.4MiB/s
 
 
 
Cache: MK0100GCTYU



bs=4k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.1k 133MiB/s 34.2k 134MiB/s
16k 40.0k 160MiB/s 5509 21.5MiB/s
32k 45.5k 178MiB/s 5215 20.4MiB/s
64k 49.9k 195MiB/s 5423 21.2MiB/s
128k 55.9k 218MiB/s 3624 14.2MiB/s
256k 53.7k 210MiB/s 2020 8081KiB/s
512k 52.5k 205MiB/s 1043 4174KiB/s
1024k 47.7k 186MiB/s 529 2118KiB/s





Cache: MK0100GCTYU



bs=16k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 16.1k 251MiB/s 19.3k 301MiB/s
16k 21.9k 342MiB/s 22.4k 350MiB/s
32k 23.2k 362MiB/s 13.4k 209MiB/s
64k 26.3k 410MiB/s 7077 111MiB/s
128k 30.8k 481MiB/s 3950 61.7MiB/s
256k 26.4k 413MiB/s 2083 32.6MiB/s
512k 26.5k 414MiB/s 1070 16.7MiB/s
1024k 26.0k 421MiB/s 539 8633KiB/s





Cache: MK0100GCTYU



bs=64k



volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 3606 225MiB/s 4704 294MiB/s
16k 5046 315MiB/s 5482 343MiB/s
32k 5666 354MiB/s 5833 365MiB/s
64k 6064 379MiB/s 6092 381MiB/s
128k 7101 444MiB/s 3705 232MiB/s
256k 8264 517MiB/s 2079 130MiB/s
512k 8752 547MiB/s 1082 67.6MiB/s
1024k 7878 492MiB/s 550 34.4MiB/s