[Xen-devel] Re: Performance overhead of paravirt_ops on native identified



On Wed, 2009-05-13 at 18:10 -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > So, what's the fix?
> > 
> > Paravirt patching turns all the pvops calls into direct calls, so
> > _spin_lock etc do end up having direct calls.  For example, the compiler
> > generated code for paravirtualized _spin_lock is:
> > 
> > <_spin_lock+0>:     mov    %gs:0xb4c8,%rax
> > <_spin_lock+9>:     incl   0xffffffffffffe044(%rax)
> > <_spin_lock+15>:    callq  *0xffffffff805a5b30
> > <_spin_lock+22>:    retq
> > 
> > The indirect call will get patched to:
> > <_spin_lock+0>:     mov    %gs:0xb4c8,%rax
> > <_spin_lock+9>:     incl   0xffffffffffffe044(%rax)
> > <_spin_lock+15>:    callq  <__ticket_spin_lock>
> > <_spin_lock+20>:    nop; nop                /* or whatever 2-byte nop */
> > <_spin_lock+22>:    retq
> > 
> > One possibility is to inline _spin_lock, etc, when building an
> > optimised kernel (ie, when there's no spinlock/preempt
> > instrumentation/debugging enabled).  That will remove the outer
> > call/return pair, returning the instruction stream to a single
> > call/return, which will presumably execute the same as the non-pvops
> > case.  The downsides are: 1) it will replicate the
> > preempt_disable/enable code at each lock/unlock callsite; this code is
> > fairly small, but not nothing; and 2) the spinlock definitions are
> > already a very heavily tangled mass of #ifdefs and other preprocessor
> > magic, and making any changes will be non-trivial.
> > 
> 
> The other obvious option, it would seem to me, would be to eliminate the
> *inner* call/return pair, i.e. merging the _spin_lock setup code in with
> the internals of each available implementation (in the case above,
> __ticket_spin_lock).  This is effectively what happens on native.  The
> one problem with that is that every callsite now becomes a patching target.
> 
> That brings me to a somewhat half-arsed thought I have been walking
> around with for a while.
> 
> Consider a paravirt -- or for that matter any other call which is
> runtime-static; this isn't just limited to paravirt -- function which
> looks to the C compiler just like any other external function -- no
> indirection.  We can point it by default to a function which is really
> just an indirect jump to the appropriate handler, that handles the
> prepatching case.  However, a linktime pass over vmlinux.o can find all
> the points where this function is called, and turn it into a list of
> patch sites(*).  The advantages are:
> 
> 1. [minor] no additional nop padding due to indirect function calls.
> 2. [major] no need for a ton of wrapper macros manifest in the code.
> 
> paravirt_ops that turn into pure inline code in the native case is
> obviously another ball of wax entirely; there inline assembly wrappers
> are simply unavoidable.
> 
>       -hpa
> 
> (*) if patching code on SMP was cheaper, we could actually do this
> lazily, and wouldn't have to store a list of patch sites.  I don't feel
> brave enough to go down that route.

This sounds remarkably like what the dynamic function call tracer does.
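
For anyone following along, here is a rough userspace sketch of the
patch-site-list idea quoted above.  It does not patch any instructions:
each call site is modelled as a function-pointer slot standing in for the
operand of a call instruction, and the list that a link-time pass over
vmlinux.o would produce is written out by hand.  All names below
(lock_trampoline, patch_sites, patch_all_sites, and so on) are made up
for illustration and are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend; stands in for something like __ticket_spin_lock. */
static int lock_calls;
static void native_lock(void) { lock_calls++; }

/* Runtime-static handler, chosen once (assumption for this sketch). */
static void (*lock_op)(void) = native_lock;

/* Default target before patching: nothing but an indirect jump to the
 * current handler.  The counter only exists so the effect is visible. */
static int trampoline_hits;
static void lock_trampoline(void) { trampoline_hits++; lock_op(); }

/* A call site is modelled as a slot holding its current call target. */
typedef void (*call_target_t)(void);

static call_target_t site_a = lock_trampoline;
static call_target_t site_b = lock_trampoline;

/* The list of patch sites; in the quoted proposal this would come from
 * a link-time pass, here it is simply hand-written. */
static call_target_t *const patch_sites[] = { &site_a, &site_b };

/* Rewrite every recorded site to call the handler directly.
 * Returns the number of sites patched. */
size_t patch_all_sites(void)
{
    size_t n = sizeof(patch_sites) / sizeof(patch_sites[0]);

    for (size_t i = 0; i < n; i++)
        *patch_sites[i] = lock_op;
    return n;
}

/* Two callers, each going through its own call site. */
void caller_a(void) { site_a(); }
void caller_b(void) { site_b(); }
```

Calling caller_a() and caller_b() before patch_all_sites() goes through
lock_trampoline each time; after patching, the same callers reach
native_lock directly and trampoline_hits stops increasing.  The callers
themselves contain no wrapper macros, which is the "major" advantage
claimed above.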


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

