WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

Re: [Xen-devel] [PATCH] x86: add SSE-based copy_page()

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, "Cui, Dexuan" <dexuan.cui@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] x86: add SSE-based copy_page()
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Tue, 13 Jan 2009 08:13:27 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1590c7eb-4c3f-4712-a4ab-c61ce305096e@default>
On 12/01/2009 23:29, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:

> I finally got around to measuring this.  On my two machines,
> an Intel "Weybridge" box and an Intel TBD quadcore box,
> the new SSE2 code was at best nearly the same for cold cache
> and much worse for warm cache.
> 
> I can't explain the sampling variation as I have interrupts off,
> a lock held, and pre-warmed TLB... I suppose maybe another
> processor could be causing rare TLB misses?  But in any case
> the min number is probably best for comparison.
> 
> I'm guessing the gcc optimizer for the memcpy code was tuned
> for an Intel pipeline... Jan, were you measuring on an
> AMD processor?
> 
> I've included the raw data and measurement code below.
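
The raw data and measurement code mentioned above are not reproduced in this reply. As a rough illustration of the method described (best-of-N timing, with separate cold-cache and warm-cache runs), here is a minimal user-space sketch in C. It assumes POSIX clock_gettime(CLOCK_MONOTONIC) instead of the TSC, does not disable interrupts or take a lock as the original measurements did, and approximates a cold cache by walking a buffer assumed to be larger than the last-level cache. The names copy_page_memcpy, evict_caches and min_copy_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE 4096

typedef void (*copy_fn)(void *dst, const void *src);

/* Illustrative routine under test; substitute any copy_page() variant. */
static void copy_page_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_SIZE);
}

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Approximate a cold cache: write one byte per 64-byte line across a
 * buffer assumed (hypothetically) to be larger than the last-level cache. */
static void evict_caches(volatile unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i += 64)
        buf[i]++;
}

/* Run `samples` timed copies and return the fastest, in nanoseconds.
 * cold != 0: evict caches before each timed copy.
 * cold == 0: pre-warm by doing one untimed copy before each timed copy. */
static uint64_t min_copy_ns(copy_fn fn, void *dst, const void *src,
                            int samples, int cold,
                            unsigned char *evict, size_t evict_len)
{
    uint64_t best = UINT64_MAX;

    assert(samples > 0);
    for (int i = 0; i < samples; i++) {
        if (cold)
            evict_caches(evict, evict_len);
        else
            fn(dst, src);

        uint64_t t0 = now_ns();
        fn(dst, src);
        uint64_t dt = now_ns() - t0;

        if (dt < best)
            best = dt;
    }
    return best;
}
```

For example, min_copy_ns(copy_page_memcpy, dst, src, 1000, 0, evict, sizeof evict) gives a warm-cache figure and passing 1 for cold gives the cold-cache one. Reporting the minimum rather than the mean follows the reasoning quoted above: outside disturbances such as stray TLB misses can only ever make a sample slower.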

Seems like unless we dynamically choose the copy routine, we're better off
without the SSE2 alternative. Shall I revert it then?

 -- Keir
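
One way to read "dynamically choose the copy routine" is to benchmark the candidates once at start-up and then call the winner through a function pointer. The C sketch below does that in user space; it is an assumption about what such a scheme could look like, not code from Xen or from the patch. copy_page_words is a hypothetical portable stand-in for an alternative routine such as the SSE2 one, and select_copy_page, init_copy_page and copy_page_fn are likewise hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE 4096

typedef void (*copy_fn)(void *dst, const void *src);

/* Candidate 1: rely on the compiler / C library memcpy. */
static void copy_page_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_SIZE);
}

/* Candidate 2 (hypothetical stand-in for an alternative such as SSE2):
 * a plain loop over 64-bit words. Buffers must be 8-byte aligned. */
static void copy_page_words(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;

    for (size_t i = 0; i < PAGE_SIZE / sizeof(uint64_t); i++)
        d[i] = s[i];
}

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time each candidate on warm buffers, keep its best-of-N time,
 * and return the candidate with the lowest such time. */
static copy_fn select_copy_page(const copy_fn *cands, size_t n, int samples)
{
    static uint64_t src[PAGE_SIZE / sizeof(uint64_t)];
    static uint64_t dst[PAGE_SIZE / sizeof(uint64_t)];
    copy_fn best_fn = cands[0];
    uint64_t best_ns = UINT64_MAX;

    assert(n > 0 && samples > 0);
    for (size_t c = 0; c < n; c++) {
        uint64_t min_ns = UINT64_MAX;

        cands[c](dst, src);             /* warm caches and TLB first */
        for (int i = 0; i < samples; i++) {
            uint64_t t0 = now_ns();

            cands[c](dst, src);
            uint64_t dt = now_ns() - t0;
            if (dt < min_ns)
                min_ns = dt;
        }
        if (min_ns < best_ns) {
            best_ns = min_ns;
            best_fn = cands[c];
        }
    }
    return best_fn;
}

/* Routine used by callers; defaults to memcpy until init runs. */
static copy_fn copy_page_fn = copy_page_memcpy;

static void init_copy_page(void)
{
    static const copy_fn cands[] = { copy_page_memcpy, copy_page_words };

    copy_page_fn = select_copy_page(cands, sizeof(cands) / sizeof(cands[0]), 100);
}
```

After init_copy_page() runs, callers use copy_page_fn(dst, src). Selecting by measurement rather than by CPU vendor covers the case raised above, where the faster routine may differ between Intel and AMD parts; a real implementation would also need to weigh cold-cache behaviour, which this warm-cache timing ignores.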



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel