xen-devel

[Xen-devel] Re: Performance overhead of paravirt_ops on native identified

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: [Xen-devel] Re: Performance overhead of paravirt_ops on native identified
From: "H. Peter Anvin" <hpa@xxxxxxxxx>
Date: Wed, 13 May 2009 18:10:52 -0700
Cc: Nick Piggin <npiggin@xxxxxxx>, "Xin, Xiaohui" <xiaohui.xin@xxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, "Li, Xin" <xin.li@xxxxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, Ingo Molnar <mingo@xxxxxxx>
In-reply-to: <4A0B62F7.5030802@xxxxxxxx>
Jeremy Fitzhardinge wrote:
> 
> So, what's the fix?
> 
> Paravirt patching turns all the pvops calls into direct calls, so
> _spin_lock etc do end up having direct calls.  For example, the compiler
> generated code for paravirtualized _spin_lock is:
> 
> <_spin_lock+0>:   mov    %gs:0xb4c8,%rax
> <_spin_lock+9>:   incl   0xffffffffffffe044(%rax)
> <_spin_lock+15>:  callq  *0xffffffff805a5b30
> <_spin_lock+22>:  retq
> 
> The indirect call will get patched to:
> <_spin_lock+0>:   mov    %gs:0xb4c8,%rax
> <_spin_lock+9>:   incl   0xffffffffffffe044(%rax)
> <_spin_lock+15>:  callq  <__ticket_spin_lock>
> <_spin_lock+20>:  nop; nop                /* or whatever 2-byte nop */
> <_spin_lock+22>:  retq
> 
> One possibility is to inline _spin_lock, etc, when building an
> optimised kernel (ie, when there's no spinlock/preempt
> instrumentation/debugging enabled).  That will remove the outer
> call/return pair, returning the instruction stream to a single
> call/return, which will presumably execute the same as the non-pvops
> case.  The downsides are: 1) it will replicate the
> preempt_disable/enable code at each lock/unlock callsite; this code is
> fairly small, but not nothing; and 2) the spinlock definitions are
> already a very heavily tangled mass of #ifdefs and other preprocessor
> magic, and making any changes will be non-trivial.
> 

The other obvious option, it would seem to me, would be to eliminate the
*inner* call/return pair, i.e. merging the _spin_lock setup code in with
the internals of each available implementation (in the case above,
__ticket_spin_lock).  This is effectively what happens on native.  The
one problem with that is that every callsite now becomes a patching target.
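
As a rough illustration of the difference between the two layouts, here
is a minimal user-space sketch in C.  Everything in it is a hypothetical
stand-in (struct ticket_lock, preempt_count, pv_spin_lock and the
function names are assumptions for the example, not the kernel's real
definitions).  It only shows that folding the outer setup into the
implementation leaves the same state behind while removing one level of
call/return:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not the kernel's real definitions. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

static int preempt_count;  /* stand-in for the per-CPU preempt counter */

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Inner implementation: take a ticket, spin until it is served. */
static void ticket_spin_lock(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != me)
        ;  /* spin */
}

/* pvops-style layout: the outer function does the setup, then calls
 * through an ops pointer (which boot-time patching would turn into a
 * direct call).  Two call/return pairs per lock operation. */
static void (*pv_spin_lock)(struct ticket_lock *) = ticket_spin_lock;

static void spin_lock_outer(struct ticket_lock *l)
{
    preempt_count++;
    pv_spin_lock(l);
}

/* Merged layout: the setup is folded into the implementation itself,
 * so a caller pays for a single call/return pair. */
static void ticket_spin_lock_merged(struct ticket_lock *l)
{
    unsigned int me;

    preempt_count++;
    me = atomic_fetch_add(&l->next, 1);
    while (atomic_load(&l->owner) != me)
        ;  /* spin */
}

static void spin_unlock(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
    preempt_count--;
}

/* Self-check: both layouts leave the lock and the counter in the same state. */
static void check_equivalence(void)
{
    struct ticket_lock a, b;

    ticket_lock_init(&a);
    ticket_lock_init(&b);
    spin_lock_outer(&a);
    ticket_spin_lock_merged(&b);
    assert(atomic_load(&a.next) == atomic_load(&b.next));
    spin_unlock(&a);
    spin_unlock(&b);
}
```

In this model the cost of the merged layout is that each caller must
name the specific implementation (ticket_spin_lock_merged here), which
is why every callsite would need to become a patching target.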

That brings me to a somewhat half-arsed thought I have been walking
around with for a while.

Consider a paravirt function -- or, for that matter, any other function
whose target is runtime-static; this isn't limited to paravirt -- which
looks to the C compiler just like any other external function, with no
indirection.  By default we point it at a stub which is really just an
indirect jump to the appropriate handler; that handles the pre-patching
case.  A link-time pass over vmlinux.o can then find every point where
this function is called and turn them into a list of patch sites(*).
The advantages are:

1. [minor] no additional nop padding due to indirect function calls.
2. [major] no need for a ton of wrapper macros manifest in the code.
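
The scheme can be modelled in plain C, purely as a sketch under stated
assumptions.  A real implementation would rewrite the displacement of
call instructions in machine code; here each call site is modelled as a
function-pointer slot, and the table that a link-time pass would emit is
written out by hand.  All names (op_trampoline, patch_sites,
apply_patches, native_op, xen_op) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_fn)(int);

/* Two hypothetical handlers; which one is used is fixed at boot. */
static int native_op(int x) { return x + 1; }
static int xen_op(int x)    { return x + 100; }

static op_fn active_op = native_op;  /* the runtime-static choice */

/* Default target: an indirect jump to the active handler.  This is
 * what callers reach before any patching has taken place. */
static int trampoline_hits;

static int op_trampoline(int x)
{
    trampoline_hits++;
    return active_op(x);
}

/* Each "call site" is modelled as a slot holding its call target. */
static op_fn site_a = op_trampoline;
static op_fn site_b = op_trampoline;

static int caller_a(int x) { return site_a(x); }
static int caller_b(int x) { return site_b(x); }

/* The list of patch sites a link-time pass would generate. */
static op_fn *const patch_sites[] = { &site_a, &site_b };

/* Walk the list and retarget every site directly at the handler.
 * Returns the number of sites patched. */
static size_t apply_patches(void)
{
    size_t i, n = sizeof(patch_sites) / sizeof(patch_sites[0]);

    for (i = 0; i < n; i++)
        *patch_sites[i] = active_op;
    return n;
}

/* Self-check: results are the same before and after patching, but the
 * trampoline is no longer on the path afterwards. */
static void check_patching(void)
{
    int before = caller_a(1);
    int hits;

    apply_patches();
    hits = trampoline_hits;
    assert(caller_a(1) == before);
    assert(trampoline_hits == hits);
    (void)xen_op;  /* the alternative handler is not selected in this sketch */
}
```

Callers work correctly both before and after apply_patches(); patching
only removes the extra hop through the trampoline, and the callers
themselves contain no wrapper macros.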

paravirt_ops that turn into pure inline code in the native case are
obviously another ball of wax entirely; there, inline assembly wrappers
are simply unavoidable.

        -hpa

(*) If patching code on SMP were cheaper, we could actually do this
lazily, and wouldn't have to store a list of patch sites.  I don't feel
brave enough to go down that route.

