...

Because iperf is a client-server application, you have to install it on both machines involved in the tests. Make sure that you use iperf 2.0.2 built with pthreads, or newer, because older versions have multi-threading issues. You can check the version of the installed tool with the following command:

No Format

 [user@hostname ~]$ iperf -v
 iperf version 2.0.2 (03 May 2005) pthreads

...

In order to enable an MTU of 9000 on the machine's network interfaces, you can use the ifconfig command.

No Format

 [root@hostname ~]$ ifconfig eth1 mtu 9000
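
Alternatively, on systems where the iproute2 tools are preferred over ifconfig, the same setting can be applied with the ip command (eth1 is just an example interface name):

No Format

 [root@hostname ~]# ip link set dev eth1 mtu 9000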

...

If jumbo frames are working properly, you should be able to ping one host from another using a large packet size:

No Format

 [root@hostname ~]$ ping 10.0.0.1 -s 8960
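
To make sure the large packets are really sent unfragmented, you can additionally set the "do not fragment" flag; with the Linux iputils ping this is done with the -M do option:

No Format

 [root@hostname ~]$ ping -M do -s 8960 10.0.0.1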

...

To tune your link you should measure the average Round Trip Time (RTT) between the machines; the RTT is reported directly in the output of the ping command. When you have the RTT measured, you can set the TCP read and write buffer sizes. There are three values you can set: the minimum, initial and maximum buffer size. The theoretical value (in bytes) for the initial buffer size is the bandwidth-delay product BPS / 8 * RTT, where BPS is the link bandwidth in bits/second and RTT is in seconds. Example commands that set these values for the whole operating system are:

No Format

 [root@hostname ~]# sysctl -w net.ipv4.tcp_rmem="4096 500000 1000000"
 [root@hostname ~]# sysctl -w net.ipv4.tcp_wmem="4096 500000 1000000"
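
As a worked example, on a hypothetical 10 Gbit/s link with an RTT of 1 ms the formula gives 10,000,000,000 / 8 * 0.001 = 1,250,000 bytes, so the initial buffer would be set to about 1.25 MB, with the maximum somewhat higher, for instance:

No Format

 [root@hostname ~]# sysctl -w net.ipv4.tcp_rmem="4096 1250000 2500000"
 [root@hostname ~]# sysctl -w net.ipv4.tcp_wmem="4096 1250000 2500000"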

...

You can also experiment with maximum socket buffer sizes:

No Format

 [root@hostname ~]# sysctl -w net.core.rmem_max=1000000
 [root@hostname ~]# sysctl -w net.core.wmem_max=1000000

Other options that should boost performance are:

No Format

 [root@hostname ~]# sysctl -w net.ipv4.tcp_no_metrics_save=1
 [root@hostname ~]# sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
 [root@hostname ~]# sysctl -w net.ipv4.tcp_window_scaling=1
 [root@hostname ~]# sysctl -w net.ipv4.tcp_sack=1
 [root@hostname ~]# sysctl -w net.ipv4.tcp_fack=1
 [root@hostname ~]# sysctl -w net.ipv4.tcp_dsack=1
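
These sysctl settings are lost on reboot; to make them persistent they can also be written to /etc/sysctl.conf (shown here with the example values used above):

No Format

 net.core.rmem_max = 1000000
 net.core.wmem_max = 1000000
 net.ipv4.tcp_window_scaling = 1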

...

To perform the test, we should run iperf in server mode on one host:

No Format

 [root@hostname ~]# iperf -s -M $mss

On the other host we should run a command like this:

No Format

 [root@hostname ~]# iperf -c $serwer -M $mss -P $threads -w ${window} -i $interval -t $test_time
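
For example, with the values used in the script below (server 10.0.1.1, MSS 1460, a 1 MB window, 8 parallel streams, a 1-second reporting interval and a 60-second run) the command would look like this:

No Format

 [root@hostname ~]# iperf -c 10.0.1.1 -M 1460 -P 8 -w 1000000 -i 1 -t 60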

...

To generate a key pair we use the following command:

No Format

 [root@hostname ~]# ssh-keygen -t dsa

Then we copy the public key to the remote server and append it to the authorized_keys file:

No Format

 [root@hostname ~]# cat id_dsa.pub >> /home/sarevok/.ssh/authorized_keys
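
If the ssh-copy-id helper is available on your system, the copying and appending can be done in one step; here user@server stands for the remote account and host, and the key is assumed to be in the default location ~/.ssh/id_dsa.pub:

No Format

 [root@hostname ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub user@server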

...

Here is a simple shell script to run the iperf test:

Code Block
shell

 #!/bin/sh
 file_size=41
 dst_path=/home/stas/iperf_results
 script_path=/root
 curr_date=`date +%m-%d-%y-%H-%M-%S`
 serwer="10.0.1.1"
 user="root"
 test_time=60
 interval=1
 mss=1460
 window=1000000
 min_threads=1
 max_threads=128

 for threads in 1 2 4 8 16 32 64 80 96 112 128 ; do
 	ssh $user@$serwer $script_path/run_iperf.sh -s -w ${window} -M $mss &
 	ssh $user@$serwer $script_path/run_vmstat.sh 1 vmstat-$window-$threads-$mss-$curr_date &
 	vmstat 1 > $dst_path/vmstat-$window-$threads-$mss-$curr_date &

 	iperf -c $serwer -M $mss -P $threads -w ${window} -i $interval -t $test_time >> $dst_path/iperf-$window-$threads-$mss-$curr_date

 	ps ax | grep vmstat | awk '{print $1}' | xargs -i kill {} >/dev/null 2>&1
 	ssh  $user@$serwer $script_path/kill_iperf_vmstat.sh
 done

The run_iperf.sh script can look like this:

No Format

 #!/bin/sh
 # pass all arguments through to iperf (the caller supplies "-s -w <window> -M <mss>")
 iperf "$@" &

The run_vmstat.sh script can contain:

No Format

 #!/bin/sh
 vmstat $1 > $2 &
  

kill_iperf_vmstat.sh may look like this:

No Format

 #!/bin/sh
 ps -elf | egrep "iperf" | egrep -v "egrep" | awk '{print $4}' | xargs -i kill -9 {}
 ps -elf | egrep "vmstat" | egrep -v "egrep" | awk '{print $4}' | xargs -i kill -9 {}

To start the test script so that it ignores hangup signals, you can use the nohup command.

No Format

 [stas@worm ~]$ nohup script.sh &

...

Here we present how to create a software RAID structure using the Linux md tool. To create a simple RAID-0 array from the devices sda1, sda2, sda3 and sda4, you should use the following command:

No Format

 mdadm --create --verbose /dev/md1 --spare-devices=0 --level=0 --raid-devices=4 /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4
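
To have the array assembled automatically at boot, its definition can additionally be recorded in the mdadm configuration file (the path may be /etc/mdadm.conf or /etc/mdadm/mdadm.conf, depending on the distribution):

No Format

 mdadm --detail --scan >> /etc/mdadm.conf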

...

When you create a RAID structure you should be able to see RAID details similar to the information shown below:

No Format

 [root@sarevok bin]# mdadm --detail /dev/md1
 /dev/md1:
         Version : 00.90.03
   Creation Time : Mon Apr  6 17:41:43 2009
      Raid Level : raid0
      Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
    Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 3
     Persistence : Superblock is persistent

     Update Time : Mon Apr  6 17:41:43 2009
           State : clean
  Active Devices : 4
 Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0

      Chunk Size : 64K

  Rebuild Status : 10% complete

            UUID : 19450624:f6490625:aa77982e:0d41d013
          Events : 0.1

     Number   Major   Minor   RaidDevice State
        0      65       16        0      active sync   /dev/sda1
        1      65       32        1      active sync   /dev/sda2
        2      65       48        2      active sync   /dev/sda3
        3      65       64        3      active sync   /dev/sda4

...

To create a file system on the md device and then mount it in some directory, we use commands like these:

No Format

 mkfs.ext3 /dev/md1
 mount /dev/md1 /mnt/md1
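
To confirm that the new filesystem is mounted and to see its usable size, you can use df:

No Format

 df -h /mnt/md1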

...

The idea of dd is to copy a file from the 'if' location to the 'of' location. Using this tool to measure disk devices requires a small trick. To measure the write speed you read data from /dev/zero and write it to a file on the tested device. To measure the read performance you read the data from a file on the tested device and write it to /dev/zero.
In this way we avoid measuring more than one storage system at a time. To measure the time of reading or writing the file we use the time tool. Example commands to test writing and reading 32 GB of data are:

for writing performance (please note the use of the sync command before and at the end of the timed run, so you are not measuring your operating system's cache performance):

No Format

 [root@sarevok ~]# sync; time (dd if=/dev/zero of=/mnt/md1/test_file.txt bs=1024M count=32; sync)

and for reading performance:

No Format

 [root@sarevok ~]# time dd if=/mnt/md1/test_file.txt of=/dev/zero bs=1024M count=32
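
Before the read test it is also worth dropping the page cache, otherwise part of the file may be served from memory rather than from the tested device (this assumes a Linux kernel exposing /proc/sys/vm/drop_caches):

No Format

 [root@sarevok ~]# sync; echo 3 > /proc/sys/vm/drop_caches
 [root@sarevok ~]# time dd if=/mnt/md1/test_file.txt of=/dev/zero bs=1024M count=32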

...

To perform one round of the test we can use a command like this:

No Format

 iozone -T -t $threads -r ${blocksize}k -s ${file_size}G -i 0 -i 1 -i 2 -c -e
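
Here -T -t selects the multi-threaded throughput mode with the given number of threads, -r and -s set the record and file size, -i 0, -i 1 and -i 2 select the write/rewrite, read/reread and random read/write tests, and -c and -e include close() and flush times in the results. For example, a single run with 4 threads, a 32 kB record size and a 2 GB file per thread (hypothetical values) would be:

No Format

 iozone -T -t 4 -r 32k -s 2G -i 0 -i 1 -i 2 -c -e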

...

To automate the testing we can write a simple shell script like this:

No Format

 #!/bin/sh
 dst_path=/home/sarevok/wyniki_test_iozone
 curr_date=`date +%m-%d-%y-%H-%M-%S`

 file_size=128
 min_blocksize=1
 max_blocksize=32

 min_queuedepth=1
 max_queuedepth=16

 mkdir $dst_path
 cd /mnt/sdaw/

 blocksize=$min_blocksize
 while [ $blocksize -le $max_blocksize ]; do
 	queuedepth=$min_queuedepth
 	while [ $queuedepth -le $max_queuedepth ]; do

 		vmstat 1 > $dst_path/vmstat-$blocksize-$queuedepth-$curr_date &

 		/root/iozone -T -t $queuedepth -r ${blocksize}k -s ${file_size}G -i 0 -i 1 -c -e > $dst_path/iozone-$blocksize-$queuedepth-$curr_date

 		ps ax | grep vmstat | awk '{print $1}' | xargs -i kill {} >/dev/null 2>&1

 		queuedepth=`expr $queuedepth \* 2`
 		file_size=`expr $file_size / 2`
 	done
 	blocksize=`expr $blocksize \* 2`
 done

...

Links:

Practical information:

Real life benchmark requirements in RFPs:

One of the most common uses of storage benchmarking is making sure that the storage system you buy meets your requirements. As always, there are practical limits on how complex a benchmark can be. This section lists benchmark procedures actually used in tenders.

CESNET - ~400TB disk array for HSM system using both FC and SATA disks (2011)

Brief disk array requirements: 

  • Two types of disks in one disk array; no automatic tiering within the array was required (there was an HSM system doing this at the file level)
  • Tier 1 - FC, SAS or SCSI drives, min. 15k RPM, in total min. 50TB, consisting of 120x 600GB drives + 3 hot spares
  • Tier 2 - SATA drives, min. 7.2k RPM, in total min. 300TB: min. 375x 1TB drives + 10 hot spares OR 188x 2TB drives + 5 hot spares

Performance requirements: 

  • Sequential: there will be a 10TB cluster filesystem on the disk array using RAID5 or RAID6; this filesystem will be part of the HSM system. The filesystem will be connected to one of the front-end servers (the technical solution of the connection is up to the candidates, e.g. MPIO, number of FC channels, etc., but the solution must be identical to what is used in the proposal). The following benchmark will be run using iozone v3.347:

    iozone -Mce -t200 -s15g -r512k -i0 -i1 -F path_to_files

    The result of the test is the average value over three runs of the above command, reported as „Children see throughput for 200 initial writers” and „Children see throughput for 200 readers”, respectively.
    Minimum read speed 1600MB/s, minimum write speed 1200MB/s.
  • Random: 

    Same setup of the volume as in the sequential test, but for this test, it will be connected without any filesystem (on a block level). The following test will be run on the connected LUN using fio v1.4.1 with this test definition:

    No Format
    [global]
    description=CESNET_test
    [cesnet]
    # change it to name of the block device used
    filename=XXXX
    rw=randrw
    # 70% rand read, 30% rand write
    rwmixread=70
    size=10000g
    ioengine=libaio
    bs=2k
    runtime=8h
    time_based
    numjobs=32
    group_reporting
    # --- end ---
    

    The result of the test is the sum of write and read IO operations divided by the total elapsed time of the test in seconds.

    Minimum required performance: 9000 IOPS.
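
    For illustration with hypothetical numbers: if the 8-hour (28,800 s) run completed 170,000,000 read and 90,000,000 write operations, the result would be (170,000,000 + 90,000,000) / 28,800 ≈ 9,028 IOPS, which meets the 9000 IOPS requirement.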

File system benchmarking examples:

...