[Xen-devel] Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
Ingo Molnar wrote:
> * Ingo Molnar <mingo@xxxxxxx> wrote:
>
> > > Times I believe are in nanoseconds for lmbench, anyway lower is
> > > better.
> >
> > Ouch, that looks unacceptably expensive. All the major distros turn
> > CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the
> > express promise to have no measurable runtime overhead.
> >
> > > non pv   AVG=464.22 STD=5.56
> > > paravirt AVG=502.87 STD=7.36
> > >
> > > Nearly 10% performance drop here, which is quite a bit... hopefully
> > > people are testing the speed of their PV implementations against
> > > non-PV bare metal :)
>
> Here are some more precise stats done via hw counters on a
> perfcounters kernel using 'timec', running a modified version of the
> 'mmap performance stress-test' app I made years ago.
>
> The MM benchmark app can be downloaded from:
>
>    http://redhat.com/~mingo/misc/mmap-perf.c
>
> timec.c can be picked up from:
>
>    http://redhat.com/~mingo/perfcounters/timec.c
>
> mmap-perf conducts 1 million mmap()/munmap()/mremap() calls, and
> touches the mapped area as well with a certain chance. The patterns
> are pseudo-random and the random seed is initialized to the same
> value so repeated runs produce the exact same mmap sequence.
>
> I ran the test with a single thread and bound to a single core:
>
>    # taskset 2 timec -e -5,-4,-3,0,1,2,3 ./mmap-perf 1
>
> [ I ran it as root - so that kernel-space hardware-counter statistics
>   are included as well. ]
>
> The results are quite surprisingly candid about the true costs of
> paravirt_ops on the native kernel's overhead (CONFIG_PARAVIRT=y):
>
>  -----------------------------------------------
>  | Performance counter stats for './mmap-perf' |
>  -----------------------------------------------
>  |
>  |   x86-defconfig |    PARAVIRT=y
>  |------------------------------------------------------------------
>  |
>  |     1311.554526 |   1360.624932  task clock ticks (msecs)  +3.74%
>  |
>  |               1 |             1  CPU migrations
>  |              91 |            79  context switches
>  |           55945 |         55943  pagefaults
>  |   ............................................
>  |      3781392474 |    3918777174  CPU cycles                +3.63%
>  |      1957153827 |    2161280486  instructions             +10.43% !!
>  |        50234816 |      51303520  cache references          +2.12%
>  |         5428258 |       5583728  cache misses              +2.86%

Is this I or D, or combined?

>  |
>  |     1314.782469 |   1363.694447  time elapsed (msecs)      +3.72%
>  |
>  -----------------------------------
>
> The most surprising element is that in the paravirt_ops case we run
> 204 million more instructions - out of the ~2000 million instructions
> total.
>
> That's an increase of over 10%!

Yow! That's pretty awful. We knew that static instruction count was up,
but wouldn't have thought that it would hit the dynamic instruction
count so much...

I think there are some immediate tweaks we can make to the code
generated for each call site, which will help to an extent.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
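
For readers without access to mmap-perf.c, the workload described in the quoted message (a fixed-seed pseudo-random sequence of mmap()/munmap()/mremap() calls that touches the mapped area with some probability) might look roughly like the sketch below. This is not the original benchmark: the function name `run_mmap_stress`, the `SLOTS` and `MAX_PAGES` limits, the small LCG, and the 1-in-4 touch chance are all assumptions made for illustration, and the code is Linux/glibc-specific because of mremap().

```c
#define _GNU_SOURCE /* exposes mremap() in <sys/mman.h> on Linux/glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical tuning knobs; the real mmap-perf.c may use other values. */
#define SLOTS     64 /* number of mappings tracked at once */
#define MAX_PAGES 16 /* largest mapping, in pages */

/* Small LCG: the same seed always yields the same sequence. */
static unsigned int next_rand(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/*
 * Perform `iterations` pseudo-random mmap()/mremap()/munmap() calls and
 * occasionally touch the mapped pages. Returns the number of touch
 * passes performed, or -1 if a system call failed.
 */
long run_mmap_stress(long iterations, unsigned int seed)
{
    long page = sysconf(_SC_PAGESIZE);
    void *addr[SLOTS] = { NULL };
    size_t len[SLOTS] = { 0 };
    unsigned int state = seed;
    long touches = 0;
    int failed = 0;

    assert(page > 0);

    for (long i = 0; i < iterations; i++) {
        unsigned int slot = next_rand(&state) % SLOTS;
        size_t new_len =
            (size_t)(1 + next_rand(&state) % MAX_PAGES) * (size_t)page;
        unsigned int coin = next_rand(&state);

        if (addr[slot] == NULL) {
            /* Empty slot: create a fresh anonymous mapping. */
            void *p = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) { failed = 1; break; }
            addr[slot] = p;
            len[slot] = new_len;
        } else if (coin % 2 == 0) {
            /* Occupied slot: resize it, letting the kernel move it. */
            void *p = mremap(addr[slot], len[slot], new_len, MREMAP_MAYMOVE);
            if (p == MAP_FAILED) { failed = 1; break; }
            addr[slot] = p;
            len[slot] = new_len;
        } else {
            /* Occupied slot: tear the mapping down. */
            if (munmap(addr[slot], len[slot]) != 0) { failed = 1; break; }
            addr[slot] = NULL;
            len[slot] = 0;
        }

        /* With a 1-in-4 chance, write one byte per page to fault it in. */
        if (addr[slot] != NULL && next_rand(&state) % 4 == 0) {
            volatile char *mem = addr[slot];
            for (size_t off = 0; off < len[slot]; off += (size_t)page)
                mem[off] = 1;
            touches++;
        }
    }

    /* Release whatever is still mapped so repeated runs start clean. */
    for (int s = 0; s < SLOTS; s++)
        if (addr[s] != NULL)
            munmap(addr[s], len[s]);

    return failed ? -1 : touches;
}
```

Calling `run_mmap_stress(1000000, 1)` from a small `main()` and pinning the binary with `taskset`, as in the quoted command line, would give a comparable single-core workload. Because the seed fixes the whole call sequence, two kernels (for example with and without CONFIG_PARAVIRT) execute the same system calls, so counter differences can be attributed to the kernel rather than to the workload.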