KVM Autotest Performance Regression Testing

KVM Autotest Performance Regression Testing (unfinished)

Goal

Automate all performance testing to save human effort
Prepare the environment automatically to reduce human error
Make results more stable by using netperf demo mode and repeating tests
Process raw results into a standard format so they are easy to store and compare
Use statistical methods (e.g. t-test) to compute averages and p-values so results can be compared precisely

Performance subtests

network

netperf (Linux; Windows support is requested)
ntttcp (Windows)

block

iozone (Linux & Windows) (iozone has its own result analysis module)
iometer (Windows) (not pushed upstream)
ffsb (Linux)
qemu_io (host) (not pushed upstream)

Environment setup

Framework support

Autotest already supports preparing the environment for performance testing; the guest and host need to be rebooted after setup.

setup script (redhat style)

Autotest supports NUMA pinning. Set "numanode=-1" in tests.cfg, and the vcpu threads, vhost_net threads, and VM memory will be pinned to the last NUMA node.

If you want to pin other processes to a NUMA node, you can use numactl and taskset.

memory: numactl -m $n $cmdline
cpu: taskset -p $node_mask $thread_id
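
taskset reads its mask argument as a hexadecimal bitmask in which bit N selects CPU N. A minimal sketch of building such a mask from a list of CPU indices is shown below; the helper name `cpu_mask` is hypothetical and is not part of Autotest.

```python
def cpu_mask(cpus):
    """Return a hexadecimal affinity mask (no 0x prefix) for the given CPU indices.

    Hypothetical helper for illustration only: bit N of the mask selects CPU N,
    which is the format taskset expects.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


print(cpu_mask([6]))     # 40
print(cpu_mask([7]))     # 80
print(cpu_mask([0, 1]))  # 3
```

The resulting string can be passed directly as the mask, for example `taskset -p 40 $thread_id` to pin a thread to CPU 6.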

Manual guide

You don't need this if you use the Autotest framework.

1. First-level pinning is to use NUMA pinning when starting the guest.
e.g. numactl -c 1 -m 1 qemu-kvm  -smp 2 -m 4G <> (pins guest memory and CPUs to NUMA node 1)

2. For a single-instance test, try a one-to-one mapping of vcpus to physical cores.
e.g.
Get the guest vcpu thread IDs, then:
# taskset -p 40 $vcpus1  (pins the vcpu1 thread to physical cpu #6)
# taskset -p 80 $vcpus2  (pins the vcpu2 thread to physical cpu #7)

3. To pin vhost on the host, get the vhost PID and then use taskset to pin it to the same socket.
e.g.
taskset -p 20 $vhost (pins the vhost thread to physical cpu #5)

4. In the guest, pin the IRQ to one core and netperf to another.
1) make sure irqbalance is off - `service irqbalance stop`
2) find the interrupts - `cat /proc/interrupts`
3) find the affinity mask for the interrupt(s) - `cat /proc/irq/<irq#>/smp_affinity`
4) change the value to match the proper core. Make sure the value is a CPU mask.
e.g. pin the IRQ to the first core (keep the space before `>`; otherwise the shell treats `01` as a file descriptor number and nothing is written).
   echo 01 > /proc/irq/<virtio0-input irq#>/smp_affinity
   echo 01 > /proc/irq/<virtio0-output irq#>/smp_affinity
5) pin the netserver to another core.
e.g.
taskset -p 02 $(pidof -s netserver)

5. For the host-to-guest scenario, to get maximum performance, run netperf on different cores on the same NUMA node as the guest.
e.g.
numactl  -m 1 netperf -T 4 (pins netperf to physical cpu #4)
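
Step 4 above looks up the virtio IRQ numbers by hand in /proc/interrupts. The sketch below shows one way that lookup could be scripted. The sample text, the `virtio0-` name prefix, and both helper names are assumptions made for illustration; real /proc/interrupts contents vary by guest.

```python
# Hypothetical excerpt of /proc/interrupts from a guest with one virtio NIC.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
  0:        120          0   IO-APIC-edge      timer
 24:       5321          0   PCI-MSI-edge      virtio0-input.0
 25:       4410          0   PCI-MSI-edge      virtio0-output.0
NMI:          0          0   Non-maskable interrupts
"""


def virtio_irqs(interrupts_text, prefix="virtio0-"):
    """Map interrupt names starting with `prefix` to their IRQ numbers."""
    irqs = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue  # header line or blank line
        number, name = fields[0][:-1], fields[-1]
        if number.isdigit() and name.startswith(prefix):
            irqs[name] = int(number)
    return irqs


def affinity_commands(irqs, cpu):
    """Build the shell commands that pin every listed IRQ to one CPU."""
    mask = format(1 << cpu, "x")
    return ["echo %s > /proc/irq/%d/smp_affinity" % (mask, irq)
            for _name, irq in sorted(irqs.items())]


for command in affinity_commands(virtio_irqs(SAMPLE_INTERRUPTS), cpu=0):
    print(command)
```

The sketch only prints the commands, so they can be reviewed before being run as root in the guest.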

Execute testing

Submit the job on the Autotest server, executing only netperf.guest_exhost three times.

tests.cfg:

only netperf.guest_exhost
variants:
    - repeat1:
    - repeat2:
    - repeat3:
# vbr0 has a static ip: 192.168.100.16
bridge=vbr0
# virbr0 is created by libvirtd, guest nic2 get ip by dhcp
bridge_nic2 = virbr0
# guest nic1 static ip
ip_nic1 = 192.168.100.21
# external host static ip:
client = 192.168.100.15

Result files:

# cd /usr/local/autotest/results/8-debug_user/192.168.122.1/
# find .|grep RHS
kvm.repeat1.r61.virtio_blk.smp2.virtio_net.RHEL.6.1.x86_64.netperf.exhost_guest/results/netperf-result.RHS
kvm.repeat2.r61.virtio_blk.smp2.virtio_net.RHEL.6.1.x86_64.netperf.exhost_guest/results/netperf-result.RHS
kvm.repeat3.r61.virtio_blk.smp2.virtio_net.RHEL.6.1.x86_64.netperf.exhost_guest/results/netperf-result.RHS

Submit the same job in another environment (different packages) with the same configuration.

Result files:

# cd /usr/local/autotest/results/9-debug_user/192.168.122.1/
# find .|grep RHS
kvm.repeat1.r61.virtio_blk.smp2.virtio_net.RHEL.6.1.x86_64.netperf.exhost_guest/results/netperf-result.RHS
kvm.repeat2.r61.virtio_blk.smp2.virtio_net.RHEL.6.1.x86_64.netperf.exhost_guest/results/netperf-result.RHS
kvm.repeat3.r61.virtio_blk.smp2.virtio_net.RHEL.6.1.x86_64.netperf.exhost_guest/results/netperf-result.RHS
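
Collecting the result files from a job directory, as done with `find .|grep RHS` above, can be sketched in Python as follows. The function name `find_result_files` is hypothetical, and matching the pattern against the whole relative path is an assumption; regression.py may apply its pattern differently.

```python
import os
import re


def find_result_files(results_dir, pattern=r".*.RHS"):
    """Return sorted paths under results_dir whose relative path fully matches pattern.

    Illustrative helper only; the default pattern mirrors the .RHS result
    files listed above.
    """
    regex = re.compile(pattern)
    matches = []
    for root, _dirs, files in os.walk(results_dir):
        for name in files:
            path = os.path.join(root, name)
            relative = os.path.relpath(path, results_dir)
            if regex.fullmatch(relative):
                matches.append(path)
    return sorted(matches)
```

For example, `find_result_files("/usr/local/autotest/results/8-debug_user/192.168.122.1/")` would be expected to return the three netperf-result.RHS paths listed for that job.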

Analyze results

Config file: perf.conf

[ntttcp]
result_file_pattern = .*.RHS
ignore_col = 1
avg_update =

[netperf] # testname
result_file_pattern = .*.RHS # pattern is used to match result files
ignore_col = 2 # some result is the configuration (eg. packet size), we don't need to compute the average of them
avg_update = 4,2,3|14,5,12|15,6,13 # update col results after computing averages

[iozone]
result_file_pattern =
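
A minimal sketch of reading a file in this format with Python's standard `configparser` module is shown below. Treating `#` as an inline comment marker and splitting `avg_update` into integer groups are assumptions made for illustration. The helper names are hypothetical, the meaning of each group is not interpreted here, and regression.py may read the file differently.

```python
import configparser

PERF_CONF = """\
[ntttcp]
result_file_pattern = .*.RHS
ignore_col = 1
avg_update =

[netperf] # testname
result_file_pattern = .*.RHS # pattern is used to match result files
ignore_col = 2 # columns that hold configuration values
avg_update = 4,2,3|14,5,12|15,6,13 # update col results after computing averages

[iozone]
result_file_pattern =
"""


def load_perf_conf(text):
    """Parse perf.conf text, stripping '#' inline comments (assumed syntax)."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def parse_avg_update(value):
    """Split 'a,b,c|d,e,f' into [(a, b, c), (d, e, f)]; an empty value gives []."""
    return [tuple(int(n) for n in group.split(","))
            for group in value.split("|") if group.strip()]


conf = load_perf_conf(PERF_CONF)
print(conf["netperf"]["result_file_pattern"])
print(parse_avg_update(conf["netperf"]["avg_update"]))
```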

Execute regression.py to compare two results:

Log in to the Autotest server via ssh
# cd /usr/local/autotest/client/tools
1) compare using log files
# python regression.py netperf file /usr/local/autotest/results/8-debug_user/192.168.122.1/ /usr/local/autotest/results/9-debug_user/192.168.122.1/
2) compare using results in the database
# python regression.py netperf db 8 9

T-test:

scipy: http://www.scipy.org/
t-test: http://en.wikipedia.org/wiki/Student's_t-test
Two Python modules (scipy and numpy) are needed.
Script to install numpy/scipy on RHEL6 automatically:
  http://kongove.fedorapeople.org/autotest/scripts/install-numpy-scipy.sh

The t-test uses Student's t-distribution to compute the probability that a difference exists.

An unpaired t-test is used to compare the significance of two samples (same configuration); the user can check the p-value (p-value = 1 - significance) to judge whether a regression exists. If the difference between the two samples is considered statistically significant (p <= 0.05), a '+' or '-' is added before the p-value ('+': avg_sample1 < avg_sample2, '-': avg_sample1 > avg_sample2).

A paired t-test is used to compute the significance across all averages (all configurations).
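
To make the unpaired comparison concrete, here is a minimal standard-library sketch of Student's t statistic with pooled variance, together with the '+'/'-' marker rule described above. The function names are hypothetical, and the choice of the pooled (equal-variance) form is an assumption; regression.py may compute this differently. Turning the statistic into a p-value requires the t-distribution, for which scipy provides `scipy.stats.ttest_ind` (unpaired) and `scipy.stats.ttest_rel` (paired).

```python
import math
import statistics


def unpaired_t(sample1, sample2):
    """Return (t, degrees_of_freedom) for Student's unpaired t-test.

    Uses the pooled-variance form, which assumes both samples have
    equal variance (an assumption of this sketch).
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof
    t = (mean1 - mean2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, dof


def significance_marker(avg1, avg2, p_value, alpha=0.05):
    """Return '+' or '-' when p_value <= alpha, otherwise an empty string."""
    if p_value > alpha:
        return ""
    return "+" if avg1 < avg2 else "-"


t, dof = unpaired_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t, 4), dof)
```

With only three repetitions per sample, as in the job above, the test has little power, so small differences will usually not reach p <= 0.05.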

Regression results:

netperf.html

- Every Avg line represents the average value based on *$n* repetitions of the same test,
  and the following SD line represents the standard deviation between the *$n* repetitions.
- The standard deviation is displayed as a percentage of the average.
- The significance of the difference between the two averages is calculated using an unpaired t-test that
  takes into account the SD of the averages.
- The paired t-test is computed for the averages of the same category.
- Only results with over 95% confidence are marked with "+/-" in the "Significance" part. "+" for cpu-usage means regression, "+" for throughput means improvement.
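
The Avg and SD lines described above can be illustrated with a short sketch. The helper name is hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption; the report may use a different estimator.

```python
import statistics


def avg_and_sd_percent(values):
    """Return (average, standard deviation as a percentage of the average)."""
    avg = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption of this sketch
    return avg, 100.0 * sd / avg


# Three hypothetical throughput readings from repeated runs of one test.
avg, sd_percent = avg_and_sd_percent([90.0, 100.0, 110.0])
print(avg, sd_percent)
```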

netperf.avg.html

- Raw data that the averages are based on.
