Continuation of: ZFS Raid-Z3 Performance with Zstandard (ZSTD) - Part 2 - In-compressible Data Benchmarks
As discussed in Part 1, the tests were repeated on compressible data generated with fio's repeating buffer patterns. ZSTD compressed this data to about 1/4 of its original size, and the compression improved both read and write performance.
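As a rough sanity check on the 1/4 figure: the fio commands in this post pass --buffer_compress_percentage=75, which asks fio for buffers that compress by about 75%, so roughly 25% of the original size should remain. The sketch below only does that arithmetic. The function name is made up for illustration, and the ratio actually achieved depends on ZSTD and on the volblocksize.

```shell
# expected_fraction PCT: fraction of the original size left if PCT percent
# of the data compresses away entirely (an idealized assumption).
# On a live system the achieved ratio can be read with:
#   zfs get compressratio <dataset>
expected_fraction() {
    awk -v pct="$1" 'BEGIN { printf "%.2f\n", (100 - pct) / 100 }'
}

expected_fraction 75
```

For 75 this prints 0.25, which matches the "about 1/4 size" observation.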
Read & Random Read Results:
Note: This first set of results is with a 16GB ARC.
Read:
The command used for the tests was as follows, with the --rw option set to read or randread and the block size (bs) varied as shown in the tables:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G --iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300 --directory=VOLUME_MOUNTPOINT
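Each table in this post covers several combinations of --rw and bs for the command above. A minimal sketch of how such a test matrix could be generated is shown here. It is a dry run that only prints the fio command lines. The function name fio_matrix is hypothetical, and passing the block size as --bs is an assumption, since the post shows only the bs values in the tables.

```shell
# Dry run: print one fio command per (rw, bs) combination instead of running it.
# The argument is a placeholder, like VOLUME_MOUNTPOINT in the command above.
fio_matrix() {
    mountpoint="$1"
    for rw in read randread; do
        for bs in 4k 16k 64k; do
            echo "fio --name=test --rw=$rw --bs=$bs" \
                "--buffer_pattern=0xdeadbeef --buffer_compress_percentage=75" \
                "--numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=80G" \
                "--iodepth=64 --direct=1 --buffered=0 --time_based --runtime=300" \
                "--directory=$mountpoint"
        done
    done
}

fio_matrix VOLUME_MOUNTPOINT
```

Piping the output to sh would run the six jobs one after another. The volblocksize dimension is not covered here, because it is a property of the volume rather than of the fio job.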
bs=4k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 35.2k | 138MiB/s | 1044 | 4179KiB/s |
16k | 41.3k | 161MiB/s | 1012 | 4051KiB/s |
32k | 43.7k | 171MiB/s | 1455 | 5820KiB/s |
64k | 45.4k | 177MiB/s | 990 | 3963KiB/s |
128k | 48.6k | 190MiB/s | 1187 | 4750KiB/s |
256k | 51.8k | 202MiB/s | 1297 | 5189KiB/s |
512k | 46.3k | 181MiB/s | 1401 | 5605KiB/s |
1024k | 43.3k | 169MiB/s | 918 | 3672KiB/s |
bs=16k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 8120 | 127MiB/s | 744 | 11.6MiB/s |
16k | 11.9k | 185MiB/s | 1430 | 22.3MiB/s |
32k | 15.3k | 239MiB/s | 2301 | 35.0MiB/s |
64k | 15.1k | 235MiB/s | 1614 | 25.2MiB/s |
128k | 11.6k | 182MiB/s | 1488 | 23.3MiB/s |
256k | 8744 | 137MiB/s | 1257 | 19.7MiB/s |
512k | 10.2k | 159MiB/s | 1467 | 22.9MiB/s |
1024k | 17.1k | 267MiB/s | 1222 | 19.1MiB/s |
bs=64k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 1859 | 116MiB/s | 202 | 12.7MiB/s |
16k | 2800 | 175MiB/s | 418 | 26.2MiB/s |
32k | 4413 | 276MiB/s | 968 | 60.5MiB/s |
64k | 2769 | 173MiB/s | 914 | 57.2MiB/s |
128k | 2705 | 169MiB/s | 1351 | 84.5MiB/s |
256k | 2891 | 181MiB/s | 1254 | 78.4MiB/s |
512k | 2514 | 157MiB/s | 1221 | 76.4MiB/s |
1024k | 4281 | 268MiB/s | 1216 | 76.0MiB/s |
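The IOPS and BW columns in these tables are tied together by the block size: bandwidth is approximately IOPS times bs. The following sketch (the helper name is an illustration, not something from the original tests) converts an IOPS figure into MiB/s so that rows can be cross-checked.

```shell
# iops_to_mib IOPS BS_KIB: bandwidth in MiB/s implied by an IOPS figure
# at a given block size in KiB.
iops_to_mib() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1024 }'
}

iops_to_mib 35200 4    # bs=4k, volblocksize 8k, sequential read
iops_to_mib 15100 16   # bs=16k, volblocksize 64k, sequential read
```

These print 137.5 and 235.9, against 138MiB/s and 235MiB/s in the tables. The small differences come from fio rounding the IOPS figures to three significant digits.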
Write & Random Write Results (with and without fsync):
The same command as in the read tests was used, with the --rw option set to write or randwrite. The --fsync option was also used, set to 0 or 1 as shown in the tables.
Write:
Write performance is broadly similar to read performance and significantly better than for the incompressible data writes.
bs=4k | fsync=0 | fsync=1 | ||
volblocksize | Write IOPS | Write BW | Write IOPS | Write BW |
8k | 17.8k | 69.4MiB/s | 870 | 3481KiB/s |
16k | 20.8k | 81.4MiB/s | 784 | 3136KiB/s |
32k | 27.3k | 107MiB/s | 859 | 3439KiB/s |
64k | 23.8k | 93.0MiB/s | 671 | 2687KiB/s |
128k | 25.5k | 99.5MiB/s | 1142 | 4572KiB/s |
256k | 26.6k | 104MiB/s | 1199 | 4797KiB/s |
512k | 36.6k | 143MiB/s | 1205 | 4823KiB/s |
1024k | 38.9k | 152MiB/s | 1188 | 4756KiB/s |
bs=16k | fsync=0 | fsync=1 | ||
volblocksize | Write IOPS | Write BW | Write IOPS | Write BW |
8k | 5221 | 81.6MiB/s | 497 | 7966KiB/s |
16k | 10.1k | 159MiB/s | 481 | 7701KiB/s |
32k | 6872 | 107MiB/s | 565 | 9054KiB/s |
64k | 6433 | 101MiB/s | 662 | 10.4MiB/s |
128k | 6900 | 108MiB/s | 751 | 11.7MiB/s |
256k | 12.1k | 189MiB/s | 696 | 10.9MiB/s |
512k | 12.3k | 192MiB/s | 537 | 8601KiB/s |
1024k | 6127 | 95.7MiB/s | 605 | 9687KiB/s |
bs=64k | fsync=0 | fsync=1 | ||
volblocksize | Write IOPS | Write BW | Write IOPS | Write BW |
8k | 1448 | 90.5MiB/s | 347 | 21.7MiB/s |
16k | 2481 | 155MiB/s | 466 | 29.1MiB/s |
32k | 3596 | 225MiB/s | 557 | 34.8MiB/s |
64k | 6579 | 411MiB/s | 526 | 32.9MiB/s |
128k | 2727 | 170MiB/s | 659 | 41.2MiB/s |
256k | 2070 | 129MiB/s | 637 | 39.8MiB/s |
512k | 2521 | 158MiB/s | 600 | 37.5MiB/s |
1024k | 2117 | 132MiB/s | 782 | 48.9MiB/s |
Random Write:
bs=4k | fsync=0 | fsync=1 | ||
volblocksize | RandWrite IOPS | RandWrite BW | RandWrite IOPS | RandWrite BW |
8k | 1807 | 7232KiB/s | 912 | 3648KiB/s |
16k | 1666 | 6664KiB/s | 808 | 3236KiB/s |
32k | 1491 | 5966KiB/s | 1028 | 4113KiB/s |
64k | 795 | 3183KiB/s | 1059 | 4238KiB/s |
128k | 693 | 2776KiB/s | 912 | 3649KiB/s |
256k | 700 | 2803KiB/s | 827 | 3311KiB/s |
512k | 811 | 3247KiB/s | 559 | 2237KiB/s |
1024k | 614 | 2459KiB/s | 525 | 2104KiB/s |
bs=16k | fsync=0 | fsync=1 | ||
volblocksize | RandWrite IOPS | RandWrite BW | RandWrite IOPS | RandWrite BW |
8k | 5131 | 80.2MiB/s | 533 | 8535KiB/s |
16k | 9395 | 147MiB/s | 430 | 6894KiB/s |
32k | 1417 | 22.1MiB/s | 536 | 8586KiB/s |
64k | 787 | 12.3MiB/s | 605 | 9686KiB/s |
128k | 730 | 11.4MiB/s | 709 | 11.1MiB/s |
256k | 689 | 10.8MiB/s | 776 | 12.1MiB/s |
512k | 828 | 12.9MiB/s | 513 | 8214KiB/s |
1024k | 558 | 8931KiB/s | 464 | 7437KiB/s |
bs=64k | fsync=0 | fsync=1 | ||
volblocksize | RandWrite IOPS | RandWrite BW | RandWrite IOPS | RandWrite BW |
8k | 1673 | 105MiB/s | 378 | 23.7MiB/s |
16k | 2374 | 148MiB/s | 411 | 25.7MiB/s |
32k | 3326 | 208MiB/s | 437 | 27.3MiB/s |
64k | 7525 | 470MiB/s | 490 | 30.7MiB/s |
128k | 571 | 35.7MiB/s | 634 | 39.7MiB/s |
256k | 609 | 38.1MiB/s | 615 | 38.5MiB/s |
512k | 726 | 45.4MiB/s | 399 | 24.0MiB/s |
1024k | 567 | 35.5MiB/s | 343 | 21.4MiB/s |
Read & Random Read Results With & Without Cache:
The cache held a little over 80GB, so presumably all of the test data was cached. The command used for the cached reads was:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --loops=8 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
For the uncached reads there was no reason to loop multiple times:
fio --name=test --buffer_pattern=0xdeadbeef --buffer_compress_percentage=75 --numjobs=1 --allrandrepeat=1 --ioengine=libaio --filesize=10G --iodepth=64 --direct=1 --buffered=0 --directory=VOLUME_MOUNTPOINT
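The post does not show how the cache device was attached and detached between the cached and uncached runs. One common way is zpool add with the cache keyword and zpool remove. The sketch below is a dry run that only prints those commands. The pool name tank and the device path are hypothetical placeholders, not values from the original setup.

```shell
# Dry run: print the zpool commands rather than executing them.
# POOL and CACHE_DEV are hypothetical placeholders.
POOL="tank"
CACHE_DEV="/dev/disk/by-id/ata-EXAMPLE_SSD"

l2arc_cmd() {
    case "$1" in
        attach) echo "zpool add $POOL cache $CACHE_DEV" ;;
        detach) echo "zpool remove $POOL $CACHE_DEV" ;;
        usage)  echo "zpool iostat -v $POOL" ;;
        *)      echo "usage: l2arc_cmd attach|detach|usage" >&2; return 1 ;;
    esac
}

l2arc_cmd attach   # add the SSD as an L2ARC device
l2arc_cmd usage    # per-device view, including how full the cache device is
l2arc_cmd detach   # remove it again before the uncached runs
```

The zpool iostat -v output lists cache devices separately, which is one way to see how much data the cache currently holds.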
With cache:
Cache: MK0100GCTYU | ||||
bs=4k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 34.9k | 136MiB/s | 36.6k | 143MiB/s |
16k | 37.6k | 147MiB/s | 37.8k | 148MiB/s |
32k | 41.2k | 161MiB/s | 10.2k | 39.8MiB/s |
64k | 18.0k | 70.3MiB/s | 16.3k | 63.9MiB/s |
128k | 37.1k | 145MiB/s | 11.4k | 44.5MiB/s |
256k | 29.1k | 114MiB/s | 8000 | 31.3MiB/s |
512k | 37.5k | 147MiB/s | 4790 | 18.7MiB/s |
1024k | 45.7k | 179MiB/s | 2597 | 10.1MiB/s |
Cache: MK0100GCTYU | ||||
bs=16k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 19.9k | 312MiB/s | 25.9k | 405MiB/s |
16k | 33.5k | 523MiB/s | 33.5k | 523MiB/s |
32k | 26.3k | 410MiB/s | 33.1k | 517MiB/s |
64k | 6226 | 97.3MiB/s | 24.7k | 385MiB/s |
128k | 14.3k | 224MiB/s | 15.5k | 242MiB/s |
256k | 8718 | 136MiB/s | 8757 | 137MiB/s |
512k | 10.6k | 166MiB/s | 5037 | 78.7MiB/s |
1024k | 16.7k | 261MiB/s | 2679 | 41.9MiB/s |
Cache: MK0100GCTYU | ||||
bs=64k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 4111 | 257MiB/s | 7099 | 444MiB/s |
16k | 8012 | 501MiB/s | 10.1k | 628MiB/s |
32k | 12.9k | 805MiB/s | 14.5k | 909MiB/s |
64k | 15.1k | 947MiB/s | 18.9k | 1182MiB/s |
128k | 9527 | 595MiB/s | 13.5k | 841MiB/s |
256k | 3116 | 195MiB/s | 8363 | 523MiB/s |
512k | 4257 | 266MiB/s | 5099 | 319MiB/s |
1024k | 5018 | 314MiB/s | 2780 | 174MiB/s |
Without Cache:
No Cache | ||||
bs=4k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 21.9k | 85.4MiB/s | 671 | 2685KiB/s |
16k | 31.3k | 122MiB/s | 663 | 2655KiB/s |
32k | 37.8k | 148MiB/s | 776 | 3105KiB/s |
64k | 17.0k | 70.2MiB/s | 619 | 2477KiB/s |
128k | 36.4k | 142MiB/s | 536 | 2144KiB/s |
256k | 29.5k | 115MiB/s | 511 | 2047KiB/s |
512k | 36.8k | 144MiB/s | 555 | 2221KiB/s |
1024k | 44.3k | 173MiB/s | 540 | 2161KiB/s |
bs=16k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 5209 | 81.4MiB/s | 338 | 5414KiB/s |
16k | 6419 | 100MiB/s | 601 | 9618KiB/s |
32k | 13.1k | 205MiB/s | 796 | 12.4MiB/s |
64k | 5020 | 78.4MiB/s | 609 | 9748KiB/s |
128k | 11.5k | 180MiB/s | 528 | 8458KiB/s |
256k | 7690 | 120MiB/s | 509 | 8154KiB/s |
512k | 9765 | 153MiB/s | 555 | 8884KiB/s |
1024k | 16.3k | 254MiB/s | 537 | 8599KiB/s |
bs=64k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 2030 | 127MiB/s | 303 | 18.9MiB/s |
16k | 2392 | 150MiB/s | 528 | 33.0MiB/s |
32k | 3782 | 236MiB/s | 647 | 40.5MiB/s |
64k | 1199 | 74.0MiB/s | 477 | 29.8MiB/s |
128k | 3034 | 190MiB/s | 549 | 34.4MiB/s |
256k | 2148 | 134MiB/s | 554 | 34.7MiB/s |
512k | 2711 | 169MiB/s | 610 | 38.1MiB/s |
1024k | 4303 | 269MiB/s | 596 | 37.3MiB/s |
Write & Random Write Results (with and without fsync) + SLOG:
The same command as in the previous write tests was used, with and without fsync, after adding a SLOG device.
Write: Sequential writes without fsync became slightly slower, whereas random write performance with fsync=1 roughly doubled.
SLOG: INTEL SSDSC2KG480G8 | ||||
bs=4k | fsync=0 | fsync=1 | ||
volblocksize | Write IOPS | Write BW | Write IOPS | Write BW |
8k | 12.8k | 49.8MiB/s | 2037 | 8150KiB/s |
16k | 16.6k | 64.0MiB/s | 2058 | 8235KiB/s |
32k | 16.9k | 65.0MiB/s | 2026 | 8106KiB/s |
64k | 12.1k | 47.2MiB/s | 2083 | 8335KiB/s |
128k | 15.7k | 61.3MiB/s | 2106 | 8425KiB/s |
256k | 15.3k | 59.6MiB/s | 2070 | 8282KiB/s |
512k | 13.8k | 53.9MiB/s | 2090 | 8360KiB/s |
1024k | 13.0k | 54.6MiB/s | 2048 | 8193KiB/s |
bs=16k | fsync=0 | fsync=1 | ||
volblocksize | Write IOPS | Write BW | Write IOPS | Write BW |
8k | 6428 | 100MiB/s | 1714 | 26.8MiB/s |
16k | 15.1k | 236MiB/s | 1602 | 25.0MiB/s |
32k | 6937 | 108MiB/s | 1679 | 26.2MiB/s |
64k | 5425 | 84.8MiB/s | 1725 | 26.0MiB/s |
128k | 6905 | 108MiB/s | 1715 | 26.8MiB/s |
256k | 5712 | 89.3MiB/s | 1692 | 26.4MiB/s |
512k | 6892 | 108MiB/s | 1747 | 27.3MiB/s |
1024k | 7012 | 110MiB/s | 1693 | 26.5MiB/s |
bs=64k | fsync=0 | fsync=1 | ||
volblocksize | Write IOPS | Write BW | Write IOPS | Write BW |
8k | 1576 | 98.5MiB/s | 1096 | 68.5MiB/s |
16k | 3479 | 217MiB/s | 1060 | 66.3MiB/s |
32k | 4856 | 304MiB/s | 1068 | 66.8MiB/s |
64k | 5336 | 334MiB/s | 1089 | 68.1MiB/s |
128k | 1872 | 117MiB/s | 1057 | 66.1MiB/s |
256k | 1752 | 110MiB/s | 1089 | 68.1MiB/s |
512k | 2279 | 142MiB/s | 1140 | 71.3MiB/s |
1024k | 2571 | 161MiB/s | 1140 | 71.3MiB/s |
Random Write:
bs=4k | fsync=0 | fsync=1 | ||
volblocksize | RandWrite IOPS | RandWrite BW | RandWrite IOPS | RandWrite BW |
8k | 1031 | 4126KiB/s | 1543 | 6175KiB/s |
16k | 1148 | 4594KiB/s | 1725 | 6900KiB/s |
32k | 1059 | 4239KiB/s | 1468 | 5874KiB/s |
64k | 1015 | 4061KiB/s | 1398 | 5595KiB/s |
128k | 896 | 3586KiB/s | 1236 | 4947KiB/s |
256k | 676 | 2708KiB/s | 1085 | 4342KiB/s |
512k | 679 | 2719KiB/s | 868 | 3475KiB/s |
1024k | 501 | 2008KiB/s | 651 | 2604KiB/s |
bs=16k | fsync=0 | fsync=1 | ||
volblocksize | RandWrite IOPS | RandWrite BW | RandWrite IOPS | RandWrite BW |
8k | 5057 | 79.0MiB/s | 1699 | 26.6MiB/s |
16k | 12.5k | 195MiB/s | 1631 | 25.5MiB/s |
32k | 1798 | 28.1MiB/s | 1807 | 28.2MiB/s |
64k | 1440 | 22.5MiB/s | 1575 | 24.6MiB/s |
128k | 995 | 15.6MiB/s | 1413 | 22.1MiB/s |
256k | 580 | 9293KiB/s | 1066 | 16.7MiB/s |
512k | 632 | 9.88MiB/s | 946 | 14.8MiB/s |
1024k | 560 | 8976KiB/s | 655 | 10.2MiB/s |
bs=64k | fsync=0 | fsync=1 | ||
volblocksize | RandWrite IOPS | RandWrite BW | RandWrite IOPS | RandWrite BW |
8k | 1481 | 92.6MiB/s | 1000 | 62.5MiB/s |
16k | 2820 | 176MiB/s | 1075 | 67.2MiB/s |
32k | 4817 | 301MiB/s | 1068 | 66.8MiB/s |
64k | 5240 | 328MiB/s | 1112 | 69.5MiB/s |
128k | 941 | 58.9MiB/s | 1109 | 69.3MiB/s |
256k | 638 | 39.9MiB/s | 925 | 57.8MiB/s |
512k | 764 | 47.8MiB/s | 886 | 55.4MiB/s |
1024k | 591 | 36.9MiB/s | 607 | 37.0MiB/s |
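To put a number on the effect of the SLOG, the fsync=1 IOPS from these tables can be divided by the corresponding figures from the earlier write tests without a SLOG. The sketch below does this for two rows. The helper name is illustrative, and it assumes the random write rows above follow the same 8k to 1024k volblocksize order as the other tables.

```shell
# speedup BASE NEW: ratio NEW/BASE, rounded to two decimals.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

speedup 870 2037   # bs=4k, volblocksize 8k, sequential write, fsync=1
speedup 912 1543   # bs=4k, volblocksize 8k, random write, fsync=1
```

These print 2.34 and 1.69, so for this configuration the SLOG more than doubles synchronous sequential writes and gives somewhat less than a doubling for synchronous random writes.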
Read Results With 1GB ARC + With & Without 100GB L2ARC
No Cache | ||||
bs=4k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 21.9k | 85.4MiB/s | 671 | 2685KiB/s |
16k | 31.3k | 122MiB/s | 663 | 2655KiB/s |
32k | 37.8k | 148MiB/s | 776 | 3105KiB/s |
64k | 17.0k | 70.2MiB/s | 619 | 2477KiB/s |
128k | 36.4k | 142MiB/s | 536 | 2144KiB/s |
256k | 29.5k | 115MiB/s | 511 | 2047KiB/s |
512k | 36.8k | 144MiB/s | 555 | 2221KiB/s |
1024k | 44.3k | 173MiB/s | 540 | 2161KiB/s |
No Cache | ||||
bs=16k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 5209 | 81.4MiB/s | 338 | 5414KiB/s |
16k | 6419 | 100MiB/s | 601 | 9618KiB/s |
32k | 13.1k | 205MiB/s | 796 | 12.4MiB/s |
64k | 5020 | 78.4MiB/s | 609 | 9748KiB/s |
128k | 11.5k | 180MiB/s | 528 | 8458KiB/s |
256k | 7690 | 120MiB/s | 509 | 8154KiB/s |
512k | 9765 | 153MiB/s | 555 | 8884KiB/s |
1024k | 16.3k | 254MiB/s | 537 | 8599KiB/s |
No Cache | ||||
bs=64k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 2030 | 127MiB/s | 303 | 18.9MiB/s |
16k | 2392 | 150MiB/s | 528 | 33.0MiB/s |
32k | 3782 | 236MiB/s | 647 | 40.5MiB/s |
64k | 1199 | 74.0MiB/s | 477 | 29.8MiB/s |
128k | 3034 | 190MiB/s | 549 | 34.4MiB/s |
256k | 2148 | 134MiB/s | 554 | 34.7MiB/s |
512k | 2711 | 169MiB/s | 610 | 38.1MiB/s |
1024k | 4303 | 269MiB/s | 596 | 37.3MiB/s |
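A SATA SSD was used as the L2ARC cache device. The cache helps random reads tremendously: at bs=4k and volblocksize 8k, random read goes from 671 IOPS without the cache to 36.6k IOPS with it.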
Cache: MK0100GCTYU | ||||
bs=4k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 34.9k | 136MiB/s | 36.6k | 143MiB/s |
16k | 37.6k | 147MiB/s | 37.8k | 148MiB/s |
32k | 41.2k | 161MiB/s | 10.2k | 39.8MiB/s |
64k | 18.0k | 70.3MiB/s | 16.3k | 63.9MiB/s |
128k | 37.1k | 145MiB/s | 11.4k | 44.5MiB/s |
256k | 29.1k | 114MiB/s | 8000 | 31.3MiB/s |
512k | 37.5k | 147MiB/s | 4790 | 18.7MiB/s |
1024k | 45.7k | 179MiB/s | 2597 | 10.1MiB/s |
Cache: MK0100GCTYU | ||||
bs=16k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 19.9k | 312MiB/s | 25.9k | 405MiB/s |
16k | 33.5k | 523MiB/s | 33.5k | 523MiB/s |
32k | 26.3k | 410MiB/s | 33.1k | 517MiB/s |
64k | 6226 | 97.3MiB/s | 24.7k | 385MiB/s |
128k | 14.3k | 224MiB/s | 15.5k | 242MiB/s |
256k | 8718 | 136MiB/s | 8757 | 137MiB/s |
512k | 10.6k | 166MiB/s | 5037 | 78.7MiB/s |
1024k | 16.7k | 261MiB/s | 2679 | 41.9MiB/s |
Cache: MK0100GCTYU | ||||
bs=64k | ||||
volblocksize | Read IOPS | Read BW | RandRead IOPS | RandRead BW |
8k | 4111 | 257MiB/s | 7099 | 444MiB/s |
16k | 8012 | 501MiB/s | 10.1k | 628MiB/s |
32k | 12.9k | 805MiB/s | 14.5k | 909MiB/s |
64k | 15.1k | 947MiB/s | 18.9k | 1182MiB/s |
128k | 9527 | 595MiB/s | 13.5k | 841MiB/s |
256k | 3116 | 195MiB/s | 8363 | 523MiB/s |
512k | 4257 | 266MiB/s | 5099 | 319MiB/s |
1024k | 5018 | 314MiB/s | 2780 | 174MiB/s |