
[Xen-devel] Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT



Ingo Molnar wrote:
> * Ingo Molnar <mingo@xxxxxxx> wrote:

>>> Times are in nanoseconds for lmbench, I believe; either way, lower is better.
>>>
>>> non pv   AVG=464.22 STD=5.56
>>> paravirt AVG=502.87 STD=7.36
>>>
>>> That's nearly a 10% performance drop, which is quite a bit... hopefully people are testing the speed of their PV implementations against non-PV bare metal :)
>>
>> Ouch, that looks unacceptably expensive. All the major distros turn CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the express promise of having no measurable runtime overhead.

> Here are some more precise stats, done via hardware counters on a perfcounters kernel using 'timec', running a modified version of the 'mmap performance stress-test' app I made years ago.

> The MM benchmark app can be downloaded from:
>
>    http://redhat.com/~mingo/misc/mmap-perf.c
>
> timec.c can be picked up from:
>
>    http://redhat.com/~mingo/perfcounters/timec.c

> mmap-perf conducts 1 million mmap()/munmap()/mremap() calls, and also touches the mapped area with a certain probability. The access patterns are pseudo-random, and the random seed is initialized to the same value, so repeated runs produce the exact same mmap sequence.
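
[ For illustration, a minimal sketch of the kind of driver loop described above. The pool size, operation mix and touch probability below are made up; the real mmap-perf.c at the URL above is the authoritative version: ]

#define _GNU_SOURCE              /* for mremap() and MAP_ANONYMOUS */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define POOL  128                /* concurrent mappings                 */
#define ITERS 1000000            /* total mmap()/munmap()/mremap() ops  */
#define PGSZ  4096

int main(void)
{
    static void  *map[POOL];
    static size_t len[POOL];
    long i;

    srandom(1);                  /* fixed seed: repeated runs replay the
                                    exact same call sequence            */

    for (i = 0; i < ITERS; i++) {
        int    slot = random() % POOL;
        size_t size = ((random() % 16) + 1) * PGSZ;

        if (!map[slot]) {                        /* empty slot: map it  */
            void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                continue;
            map[slot] = p;
            len[slot] = size;
        } else if (random() % 4 == 0) {          /* sometimes: remap    */
            void *p = mremap(map[slot], len[slot], size, MREMAP_MAYMOVE);
            if (p != MAP_FAILED) {
                map[slot] = p;
                len[slot] = size;
            }
        } else {                                 /* otherwise: unmap    */
            munmap(map[slot], len[slot]);
            map[slot] = NULL;
            continue;
        }

        if (random() % 2)        /* touch the mapped area with a chance */
            memset(map[slot], 0, len[slot]);
    }
    return 0;
}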

> I ran the test with a single thread, bound to a single core:
>
>   # taskset 2 timec -e -5,-4,-3,0,1,2,3 ./mmap-perf 1
>
> [ I ran it as root, so that kernel-space hardware-counter statistics are included as well. ]
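
[ On today's kernels the perfcounters prototype and 'timec' have been superseded by perf; assuming the same benchmark binary, a roughly equivalent run (taskset 2 means CPU mask 0x2, i.e. core 1) would be something like:

   # perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,cache-references,cache-misses taskset -c 1 ./mmap-perf 1
]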

> The results speak quite candidly about the true cost of paravirt_ops on a native kernel (CONFIG_PARAVIRT=y):

> -----------------------------------------------
> | Performance counter stats for './mmap-perf' |
> -----------------------------------------------
> |
> |  x86-defconfig |   PARAVIRT=y
> |------------------------------------------------------------------
> |
> |    1311.554526 |  1360.624932  task clock ticks (msecs)    +3.74%
> |                |
> |              1 |            1  CPU migrations
> |             91 |           79  context switches
> |          55945 |        55943  pagefaults
> | ............................................
> |     3781392474 |   3918777174  CPU cycles                  +3.63%
> |     1957153827 |   2161280486  instructions               +10.43%

!!

> |       50234816 |     51303520  cache references            +2.12%
> |        5428258 |      5583728  cache misses                +2.86%

Is this I-cache or D-cache, or combined?

> |                |
> |    1314.782469 |  1363.694447  time elapsed (msecs)        +3.72%
> |                |
> -----------------------------------------------

> The most surprising element is that in the paravirt_ops case we run 204 million more instructions, out of ~2000 million instructions total.
> That's an increase of over 10%!

Yow! That's pretty awful. We knew the static instruction count was up, but we wouldn't have thought it would hit the dynamic instruction count so much...

I think there are some immediate tweaks we can make to the code generated for each call site, which will help to an extent.
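
[ For context, a minimal sketch of the pattern under discussion. The names below are made up for illustration; the real kernel uses the paravirt_ops/pv_*_ops structures and runtime patching. With CONFIG_PARAVIRT=y, operations that a native kernel inlines as one or two instructions instead go through a patchable indirect call, and the call itself plus the caller-saved register clobbers it forces are what show up as extra dynamic instructions: ]

/* Hypothetical sketch, not the kernel's actual paravirt_ops layout. */
struct pv_mmu_ops_sketch {
    void (*flush_tlb)(void);            /* backend: native or hypervisor */
    unsigned long (*read_cr2)(void);
};

/* Native backends: what a !CONFIG_PARAVIRT kernel would emit inline. */
static void native_flush_tlb_sketch(void)
{
    /* native case: a %cr3 reload, one or two instructions */
}

static unsigned long native_read_cr2_sketch(void)
{
    return 0;                           /* native case: 'mov %cr2, reg' */
}

static struct pv_mmu_ops_sketch pv_mmu_ops_sketch = {
    .flush_tlb = native_flush_tlb_sketch,
    .read_cr2  = native_read_cr2_sketch,
};

/* Every call site pays for the indirect call and its register
 * clobbers unless it gets patched back to the native instruction;
 * that per-site cost is what the tweaks mentioned above would attack. */
static inline void flush_tlb_sketch(void)
{
    pv_mmu_ops_sketch.flush_tlb();
}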

   J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

