Wednesday, April 7, 2021

ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 3 - Compressible Data Benchmarks

Continuation of: ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 2 - In-compressible Data Benchmarks

As discussed in Part 1, the tests were repeated on compressible data generated with fio's repeating buffer patterns. ZSTD compressed this data to about one quarter of its original size, and the compression improved both read and write performance.

Read & Random Read Results:

Note: This first set of results was collected with a 16GB ARC.
 
Read:
 
The command used for the tests was as follows (the --rw option was set to read or randread):
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G --iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300 --directory=VOLUME_MOUNTPOINT
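The command above leaves out the options that vary between runs. As a minimal sketch, assuming the block size was passed with fio's --bs option (the post only names --rw explicitly), the read test matrix could be generated as follows. The function names are hypothetical, VOLUME_MOUNTPOINT is the placeholder from the post, and the script only prints the commands rather than running them:

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one fio command per (rw, bs) pair instead of running it.
# Assumption: block size is selected with --bs; replace VOLUME_MOUNTPOINT before use.

fio_cmd() {
  local rw="$1" bs="$2"
  echo "fio --name=test --rw=${rw} --bs=${bs}" \
       "--buffer_pattern=0xdeadbeef --buffer_compress_percentage=75" \
       "--numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G" \
       "--iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300" \
       "--directory=VOLUME_MOUNTPOINT"
}

print_read_matrix() {
  local rw bs
  # Two access patterns times three block sizes, matching the tables in this post.
  for rw in read randread; do
    for bs in 4k 16k 64k; do
      fio_cmd "$rw" "$bs"
    done
  done
}

print_read_matrix
```

Printing the commands first makes it easy to review the matrix before committing to six runs of 300 seconds each per volblocksize.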

bs=4k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 35.2k 138MiB/s 1044 4179KiB/s
16k 41.3k 161MiB/s 1012 4051KiB/s
32k 43.7k 171MiB/s 1455 5820KiB/s
64k 45.4k 177MiB/s 990 3963KiB/s
128k 48.6k 190MiB/s 1187 4750KiB/s
256k 51.8k 202MiB/s 1297 5189KiB/s
512k 46.3k 181MiB/s 1401 5605KiB/s
1024k 43.3k 169MiB/s 918 3672KiB/s

bs=16k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 8120 127MiB/s 744 11.6MiB/s
16k 11.9k 185MiB/s 1430 22.3MiB/s
32k 15.3k 239MiB/s 2301 35.0MiB/s
64k 15.1k 235MiB/s 1614 25.2MiB/s
128k 11.6k 182MiB/s 1488 23.3MiB/s
256k 8744 137MiB/s 1257 19.7MiB/s
512k 10.2k 159MiB/s 1467 22.9MiB/s
1024k 17.1k 267MiB/s 1222 19.1MiB/s

bs=64k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 1859 116MiB/s 202 12.7MiB/s
16k 2800 175MiB/s 418 26.2MiB/s
32k 4413 276MiB/s 968 60.5MiB/s
64k 2769 173MiB/s 914 57.2MiB/s
128k 2705 169MiB/s 1351 84.5MiB/s
256k 2891 181MiB/s 1254 78.4MiB/s
512k 2514 157MiB/s 1221 76.4MiB/s
1024k 4281 268MiB/s 1216 76.0MiB/s
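The tables above mix KiB/s and MiB/s in the bandwidth columns, which makes the sequential and random figures harder to compare at a glance. A small helper along these lines (the name to_mib is hypothetical and not part of fio) converts either form to MiB/s:

```shell
#!/usr/bin/env bash
# Convert a fio bandwidth figure such as "4179KiB/s" or "138MiB/s" to MiB/s.
to_mib() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                           # awk keeps the leading number and drops the unit
    if (v ~ /KiB\/s$/)      n = n / 1024
    else if (v ~ /GiB\/s$/) n = n * 1024
    printf "%.2f\n", n
  }'
}

# bs=4k, volblocksize=8k row from the first table
to_mib 138MiB/s
to_mib 4179KiB/s
```

For the first row of the bs=4k table this gives 138.00 for sequential reads and 4.08 for random reads.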

Write & Random Write Results (with and without fsync):

The same command as in the read tests was used, with the --rw option set to write or randwrite. The --fsync option was also set, to 0 or 1.
 
Write:
Write performance is quite similar to read performance, and significantly better than write performance on incompressible data.
 
bs=4k

volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 17.8k 69.4MiB/s 870 3481KiB/s
16k 20.8k 81.4MiB/s 784 3136KiB/s
32k 27.3k 107MiB/s 859 3439KiB/s
64k 23.8k 93.0MiB/s 671 2687KiB/s
128k 25.5k 99.5MiB/s 1142 4572KiB/s
256k 26.6k 104MiB/s 1199 4797KiB/s
512k 36.6k 143MiB/s 1205 4823KiB/s
1024k 38.9k 152MiB/s 1188 4756KiB/s

bs=16k

volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 5221 81.6MiB/s 497 7966KiB/s
16k 10.1k 159MiB/s 481 7701KiB/s
32k 6872 107MiB/s 565 9054KiB/s
64k 6433 101MiB/s 662 10.4MiB/s
128k 6900 108MiB/s 751 11.7MiB/s
256k 12.1k 189MiB/s 696 10.9MiB/s
512k 12.3k 192MiB/s 537 8601KiB/s
1024k 6127 95.7MiB/s 605 9687KiB/s

bs=64k

volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 1448 90.5MiB/s 347 21.7MiB/s
16k 2481 155MiB/s 466 29.1MiB/s
32k 3596 225MiB/s 557 34.8MiB/s
64k 6579 411MiB/s 526 32.9MiB/s
128k 2727 170MiB/s 659 41.2MiB/s
256k 2070 129MiB/s 637 39.8MiB/s
512k 2521 158MiB/s 600 37.5MiB/s
1024k 2117 132MiB/s 782 48.9MiB/s

Random Write:

bs=4k

volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1807 7232KiB/s 912 3648KiB/s
16k 1666 6664KiB/s 808 3236KiB/s
32k 1491 5966KiB/s 1028 4113KiB/s
64k 795 3183KiB/s 1059 4238KiB/s
128k 693 2776KiB/s 912 3649KiB/s
256k 700 2803KiB/s 827 3311KiB/s
512k 811 3247KiB/s 559 2237KiB/s
1024k 614 2459KiB/s 525 2104KiB/s

bs=16k

volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 5131 80.2MiB/s 533 8535KiB/s
16k 9395 147MiB/s 430 6894KiB/s
32k 1417 22.1MiB/s 536 8586KiB/s
64k 787 12.3MiB/s 605 9686KiB/s
128k 730 11.4MiB/s 709 11.1MiB/s
256k 689 10.8MiB/s 776 12.1MiB/s
512k 828 12.9MiB/s 513 8214KiB/s
1024k 558 8931KiB/s 464 7437KiB/s

bs=64k

volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1673 105MiB/s 378 23.7MiB/s
16k 2374 148MiB/s 411 25.7MiB/s
32k 3326 208MiB/s 437 27.3MiB/s
64k 7525 470MiB/s 490 30.7MiB/s
128k 571 35.7MiB/s 634 39.7MiB/s
256k 609 38.1MiB/s 615 38.5MiB/s
512k 726 45.4MiB/s 399 24.0MiB/s
1024k 567 35.5MiB/s 343 21.4MiB/s

Read & Random Read Results With & Without Cache:

The cache held a little over 80GB, so presumably all of the data was cached. The command used for cached reads was:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --loops=8 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
 
and for uncached reads there was no reason to loop multiple times:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
 
With cache:
 
Cache: MK0100GCTYU
bs=4k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.9k 136MiB/s 36.6k 143MiB/s
16k 37.6k 147MiB/s 37.8k 148MiB/s
32k 41.2k 161MiB/s 10.2k 39.8MiB/s
64k 18.0k 70.3MiB/s 16.3k 63.9MiB/s
128k 37.1k 145MiB/s 11.4k 44.5MiB/s
256k 29.1k 114MiB/s 8000 31.3MiB/s
512k 37.5k 147MiB/s 4790 18.7MiB/s
1024k 45.7k 179MiB/s 2597 10.1MiB/s

Cache: MK0100GCTYU
bs=16k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 19.9k 312MiB/s 25.9k 405MiB/s
16k 33.5k 523MiB/s 33.5k 523MiB/s
32k 26.3k 410MiB/s 33.1k 517MiB/s
64k 6226 97.3MiB/s 24.7k 385MiB/s
128k 14.3k 224MiB/s 15.5k 242MiB/s
256k 8718 136MiB/s 8757 137MiB/s
512k 10.6k 166MiB/s 5037 78.7MiB/s
1024k 16.7k 261MiB/s 2679 41.9MiB/s

Cache: MK0100GCTYU
bs=64k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 4111 257MiB/s 7099 444MiB/s
16k 8012 501MiB/s 10.1k 628MiB/s
32k 12.9k 805MiB/s 14.5k 909MiB/s
64k 15.1k 947MiB/s 18.9k 1182MiB/s
128k 9527 595MiB/s 13.5k 841MiB/s
256k 3116 195MiB/s 8363 523MiB/s
512k 4257 266MiB/s 5099 319MiB/s
1024k 5018 314MiB/s 2780 174MiB/s
 
Without Cache:
 
No Cache
bs=4k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 21.9k 85.4MiB/s 671 2685KiB/s
16k 31.3k 122MiB/s 663 2655KiB/s
32k 37.8k 148MiB/s 776 3105KiB/s
64k 17.0k 70.2MiB/s 619 2477KiB/s
128k 36.4k 142MiB/s 536 2144KiB/s
256k 29.5k 115MiB/s 511 2047KiB/s
512k 36.8k 144MiB/s 555 2221KiB/s
1024k 44.3k 173MiB/s 540 2161KiB/s

bs=16k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 5209 81.4MiB/s 338 5414KiB/s
16k 6419 100MiB/s 601 9618KiB/s
32k 13.1k 205MiB/s 796 12.4MiB/s
64k 5020 78.4MiB/s 609 9748KiB/s
128k 11.5k 180MiB/s 528 8458KiB/s
256k 7690 120MiB/s 509 8154KiB/s
512k 9765 153MiB/s 555 8884KiB/s
1024k 16.3k 254MiB/s 537 8599KiB/s

bs=64k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 2030 127MiB/s 303 18.9MiB/s
16k 2392 150MiB/s 528 33.0MiB/s
32k 3782 236MiB/s 647 40.5MiB/s
64k 1199 74.0MiB/s 477 29.8MiB/s
128k 3034 190MiB/s 549 34.4MiB/s
256k 2148 134MiB/s 554 34.7MiB/s
512k 2711 169MiB/s 610 38.1MiB/s
1024k 4303 269MiB/s 596 37.3MiB/s
 
 

Write & Random Write Results (with and without fsync) + SLOG:

The same command as in the previous write tests was used, with and without fsync, after adding a SLOG device to the pool.

Write: Sequential writes became slightly slower; however, random write performance doubled.

SLOG: INTEL SSDSC2KG480G8
bs=4k

volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 12.8k 49.8MiB/s 2037 8150KiB/s
16k 16.6k 64.0MiB/s 2058 8235KiB/s
32k 16.9k 65.0MiB/s 2026 8106KiB/s
64k 12.1k 47.2MiB/s 2083 8335KiB/s
128k 15.7k 61.3MiB/s 2106 8425KiB/s
256k 15.3k 59.6MiB/s 2070 8282KiB/s
512k 13.8k 53.9MiB/s 2090 8360KiB/s
1024k 13.0k 54.6MiB/s 2048 8193KiB/s

bs=16k

volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 6428 100MiB/s 1714 26.8MiB/s
16k 15.1k 236MiB/s 1602 25.0MiB/s
32k 6937 108MiB/s 1679 26.2MiB/s
64k 5425 84.8MiB/s 1725 26.0MiB/s
128k 6905 108MiB/s 1715 26.8MiB/s
256k 5712 89.3MiB/s 1692 26.4MiB/s
512k 6892 108MiB/s 1747 27.3MiB/s
1024k 7012 110MiB/s 1693 26.5MiB/s

bs=64k

volblocksize Write IOPS (fsync=0) Write BW (fsync=0) Write IOPS (fsync=1) Write BW (fsync=1)
8k 1576 98.5MiB/s 1096 68.5MiB/s
16k 3479 217MiB/s 1060 66.3MiB/s
32k 4856 304MiB/s 1068 66.8MiB/s
64k 5336 334MiB/s 1089 68.1MiB/s
128k 1872 117MiB/s 1057 66.1MiB/s
256k 1752 110MiB/s 1089 68.1MiB/s
512k 2279 142MiB/s 1140 71.3MiB/s
1024k 2571 161MiB/s 1140 71.3MiB/s
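One way to quantify the SLOG effect is to divide the IOPS measured with the SLOG by the IOPS measured without it. The sketch below does that for the bs=4k, volblocksize=8k sequential write rows from this post. The helper names are hypothetical, and it assumes fio's "k" suffix means thousands:

```shell
#!/usr/bin/env bash
# Expand fio's abbreviated IOPS ("12.8k") to a plain number (12800).
iops_to_num() {
  awk -v v="$1" 'BEGIN {
    n = v + 0
    if (v ~ /k$/) n = n * 1000
    printf "%.0f\n", n                  # round, so 12.8 * 1000 cannot truncate to 12799
  }'
}

# Ratio of IOPS with the SLOG to IOPS without it; above 1 means the SLOG helped.
speedup() {
  local with without
  with=$(iops_to_num "$1")
  without=$(iops_to_num "$2")
  awk -v a="$with" -v b="$without" 'BEGIN { printf "%.2f\n", a / b }'
}

# bs=4k, volblocksize=8k, sequential write
speedup 2037 870       # fsync=1: with SLOG vs. without
speedup 12.8k 17.8k    # fsync=0: with SLOG vs. without
```

For these two rows the ratios are 2.34 with fsync=1 and 0.72 with fsync=0.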

Random Write:

bs=4k

volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1031 4126KiB/s 1543 6175KiB/s
16k 1148 4594KiB/s 1725 6900KiB/s
32k 1059 4239KiB/s 1468 5874KiB/s
64k 1015 4061KiB/s 1398 5595KiB/s
128k 896 3586KiB/s 1236 4947KiB/s
256k 676 2708KiB/s 1085 4342KiB/s
512k 679 2719KiB/s 868 3475KiB/s
1024k 501 2008KiB/s 651 2604KiB/s

bs=16k

volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 5057 79.0MiB/s 1699 26.6MiB/s
16k 12.5k 195MiB/s 1631 25.5MiB/s
32k 1798 28.1MiB/s 1807 28.2MiB/s
64k 1440 22.5MiB/s 1575 24.6MiB/s
128k 995 15.6MiB/s 1413 22.1MiB/s
256k 580 9293KiB/s 1066 16.7MiB/s
512k 632 9.88MiB/s 946 14.8MiB/s
1024k 560 8976KiB/s 655 10.2MiB/s

bs=64k

volblocksize RandWrite IOPS (fsync=0) RandWrite BW (fsync=0) RandWrite IOPS (fsync=1) RandWrite BW (fsync=1)
8k 1481 92.6MiB/s 1000 62.5MiB/s
16k 2820 176MiB/s 1075 67.2MiB/s
32k 4817 301MiB/s 1068 66.8MiB/s
64k 5240 328MiB/s 1112 69.5MiB/s
128k 941 58.9MiB/s 1109 69.3MiB/s
256k 638 39.9MiB/s 925 57.8MiB/s
512k 764 47.8MiB/s 886 55.4MiB/s
1024k 591 36.9MiB/s 607 37.0MiB/s

Read Results With 1GB ARC + With & Without 100GB L2ARC


No Cache
bs=4k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 21.9k 85.4MiB/s 671 2685KiB/s
16k 31.3k 122MiB/s 663 2655KiB/s
32k 37.8k 148MiB/s 776 3105KiB/s
64k 17.0k 70.2MiB/s 619 2477KiB/s
128k 36.4k 142MiB/s 536 2144KiB/s
256k 29.5k 115MiB/s 511 2047KiB/s
512k 36.8k 144MiB/s 555 2221KiB/s
1024k 44.3k 173MiB/s 540 2161KiB/s

No Cache
bs=16k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 5209 81.4MiB/s 338 5414KiB/s
16k 6419 100MiB/s 601 9618KiB/s
32k 13.1k 205MiB/s 796 12.4MiB/s
64k 5020 78.4MiB/s 609 9748KiB/s
128k 11.5k 180MiB/s 528 8458KiB/s
256k 7690 120MiB/s 509 8154KiB/s
512k 9765 153MiB/s 555 8884KiB/s
1024k 16.3k 254MiB/s 537 8599KiB/s

No Cache
bs=64k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 2030 127MiB/s 303 18.9MiB/s
16k 2392 150MiB/s 528 33.0MiB/s
32k 3782 236MiB/s 647 40.5MiB/s
64k 1199 74.0MiB/s 477 29.8MiB/s
128k 3034 190MiB/s 549 34.4MiB/s
256k 2148 134MiB/s 554 34.7MiB/s
512k 2711 169MiB/s 610 38.1MiB/s
1024k 4303 269MiB/s 596 37.3MiB/s

A SATA SSD was used as the L2ARC cache device. The cache helps random reads tremendously.

Cache: MK0100GCTYU
bs=4k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 34.9k 136MiB/s 36.6k 143MiB/s
16k 37.6k 147MiB/s 37.8k 148MiB/s
32k 41.2k 161MiB/s 10.2k 39.8MiB/s
64k 18.0k 70.3MiB/s 16.3k 63.9MiB/s
128k 37.1k 145MiB/s 11.4k 44.5MiB/s
256k 29.1k 114MiB/s 8000 31.3MiB/s
512k 37.5k 147MiB/s 4790 18.7MiB/s
1024k 45.7k 179MiB/s 2597 10.1MiB/s

Cache: MK0100GCTYU
bs=16k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 19.9k 312MiB/s 25.9k 405MiB/s
16k 33.5k 523MiB/s 33.5k 523MiB/s
32k 26.3k 410MiB/s 33.1k 517MiB/s
64k 6226 97.3MiB/s 24.7k 385MiB/s
128k 14.3k 224MiB/s 15.5k 242MiB/s
256k 8718 136MiB/s 8757 137MiB/s
512k 10.6k 166MiB/s 5037 78.7MiB/s
1024k 16.7k 261MiB/s 2679 41.9MiB/s

Cache: MK0100GCTYU
bs=64k

volblocksize Read IOPS Read BW RandRead IOPS RandRead BW
8k 4111 257MiB/s 7099 444MiB/s
16k 8012 501MiB/s 10.1k 628MiB/s
32k 12.9k 805MiB/s 14.5k 909MiB/s
64k 15.1k 947MiB/s 18.9k 1182MiB/s
128k 9527 595MiB/s 13.5k 841MiB/s
256k 3116 195MiB/s 8363 523MiB/s
512k 4257 266MiB/s 5099 319MiB/s
1024k 5018 314MiB/s 2780 174MiB/s
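To put a number on how much the L2ARC helps random reads, the random read IOPS with and without the cache can be compared row by row. The following sketch uses the bs=4k figures from the tables in this post (columns: volblocksize, RandRead IOPS without cache, RandRead IOPS with cache). The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Read "volblocksize uncached_iops cached_iops" rows on stdin and print the
# cached/uncached ratio per row. A trailing "k" is treated as thousands.
randread_speedup() {
  awk '
    function num(s) { return (s ~ /k$/) ? (s + 0) * 1000 : s + 0 }
    { printf "%s %.1fx\n", $1, num($3) / num($2) }
  '
}

randread_speedup <<'EOF'
8k 671 36.6k
16k 663 37.8k
32k 776 10.2k
64k 619 16.3k
128k 536 11.4k
256k 511 8000
512k 555 4790
1024k 540 2597
EOF
```

The first row works out to roughly 54.5x and the last to roughly 4.8x, so in these results the benefit shrinks as volblocksize grows.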

