WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

[Xen-devel] Re: Performance overhead of paravirt_ops on native identified

To: "H. Peter Anvin" <hpa@xxxxxxxxx>
Subject: [Xen-devel] Re: Performance overhead of paravirt_ops on native identified
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu, 14 May 2009 10:25:06 +0200
Cc: Nick Piggin <npiggin@xxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Xin, Xiaohui" <xiaohui.xin@xxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, rostedt <rostedt@xxxxxxxxxxxxxxxx>, "Li, Xin" <xin.li@xxxxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, Ingo Molnar <mingo@xxxxxxx>
Delivery-date: Fri, 15 May 2009 06:31:24 -0700
In-reply-to: <4A0B6F9C.4060405@xxxxxxxxx>
References: <4A0B62F7.5030802@xxxxxxxx> <4A0B6F9C.4060405@xxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

On Wed, 2009-05-13 at 18:10 -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > So, what's the fix?
> > 
> > Paravirt patching turns all the pvops calls into direct calls, so
> > _spin_lock etc do end up having direct calls.  For example, the compiler
> > generated code for paravirtualized _spin_lock is:
> > 
> > <_spin_lock+0>:     mov    %gs:0xb4c8,%rax
> > <_spin_lock+9>:     incl   0xffffffffffffe044(%rax)
> > <_spin_lock+15>:    callq  *0xffffffff805a5b30
> > <_spin_lock+22>:    retq
> > 
> > The indirect call will get patched to:
> > 
> > <_spin_lock+0>:     mov    %gs:0xb4c8,%rax
> > <_spin_lock+9>:     incl   0xffffffffffffe044(%rax)
> > <_spin_lock+15>:    callq  <__ticket_spin_lock>
> > <_spin_lock+20>:    nop; nop                /* or whatever 2-byte nop */
> > <_spin_lock+22>:    retq
> > 
> > One possibility is to inline _spin_lock, etc, when building an
> > optimised kernel (ie, when there's no spinlock/preempt
> > instrumentation/debugging enabled).  That will remove the outer
> > call/return pair, returning the instruction stream to a single
> > call/return, which will presumably execute the same as the non-pvops
> > case.  The downsides are: 1) it will replicate the
> > preempt_disable/enable code at each lock/unlock callsite; this code is
> > fairly small, but not nothing; and 2) the spinlock definitions are
> > already a very heavily tangled mass of #ifdefs and other preprocessor
> > magic, and making any changes will be non-trivial.
> > 
> 
> The other obvious option, it would seem to me, would be to eliminate the
> *inner* call/return pair, i.e. merging the _spin_lock setup code in with
> the internals of each available implementation (in the case above,
> __ticket_spin_lock).  This is effectively what happens on native.  The
> one problem with that is that every callsite now becomes a patching target.
> 
> That brings me to a somewhat half-arsed thought I have been walking
> around with for a while.
> 
> Consider a function that is runtime-static (paravirt is one example,
> but this is not limited to paravirt) and that looks to the C compiler
> just like any other external function, with no indirection.  By default
> we point it at a stub which is really just an indirect jump to the
> appropriate handler; that covers the pre-patching case.  A link-time
> pass over vmlinux.o can then find all the points where this function
> is called and turn them into a list of patch sites(*).  The advantages are:
> 
> 1. [minor] no additional nop padding due to indirect function calls.
> 2. [major] no need for a ton of wrapper macros manifest in the code.
> 
> paravirt_ops that turn into pure inline code in the native case are
> obviously another ball of wax entirely; there, inline assembly wrappers
> are simply unavoidable.
> 
>       -hpa
> 
> (*) if patching code on SMP was cheaper, we could actually do this
> lazily, and wouldn't have to store a list of patch sites.  I don't feel
> brave enough to go down that route.

This sounds remarkably like what the dynamic function call tracer does.
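
For readers following along, here is a rough userspace model of the
patch-site idea quoted above. It is only a sketch under stated
assumptions, not kernel code: portable C cannot rewrite call
instructions, so each call site is modelled as a function-pointer slot,
and every name in it (demo_lock, native_lock, hv_lock, lock_trampoline,
site_a, site_b, patch_sites, apply_patches, demo) is hypothetical. The
table stands in for what a link-time pass over vmlinux.o might emit;
dynamic ftrace similarly records its mcount call sites at build time
and patches them at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock type and two candidate implementations. */
struct demo_lock { int owner; };

static void native_lock(struct demo_lock *l) { l->owner = 1; }
static void hv_lock(struct demo_lock *l)     { l->owner = 2; }

typedef void (*lock_fn)(struct demo_lock *);

/* Handler chosen once at boot; "runtime-static" from then on. */
static lock_fn chosen_lock = native_lock;

/* Default target: forwards through the pointer (the pre-patching case). */
static void lock_trampoline(struct demo_lock *l) { chosen_lock(l); }

/*
 * Each call site is modelled as a slot holding its call target.  In a
 * real kernel this would be the operand of a call instruction.
 */
static lock_fn site_a = lock_trampoline;
static lock_fn site_b = lock_trampoline;

/* Stand-in for the patch-site list a link-time pass would produce. */
static lock_fn *const patch_sites[] = { &site_a, &site_b };

/* Retarget every still-unpatched site; returns how many were patched. */
static size_t apply_patches(lock_fn target)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof(patch_sites) / sizeof(patch_sites[0]); i++) {
        if (*patch_sites[i] == lock_trampoline) {
            *patch_sites[i] = target;
            n++;
        }
    }
    return n;
}

/* Two callers, each owning one call site. */
static int caller_a(void) { struct demo_lock l = { 0 }; site_a(&l); return l.owner; }
static int caller_b(void) { struct demo_lock l = { 0 }; site_b(&l); return l.owner; }

/* Walk through the scenario: choose a handler, call, patch, call again. */
static int demo(void)
{
    chosen_lock = hv_lock;                /* boot-time decision */
    assert(caller_a() == 2);              /* pre-patch: goes via the trampoline */
    assert(apply_patches(chosen_lock) == 2);
    assert(caller_b() == 2);              /* post-patch: direct call to hv_lock */
    return 0;
}
```

Calling demo() from a main() exercises both paths. The point of the
model is that callers contain nothing but an ordinary call, and the
only per-site bookkeeping is one table entry.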

