
To: Jan Beulich <jbeulich@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] [PATCH] x86: add SSE-based copy_page()
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Wed, 12 Nov 2008 17:17:08 +0000 (GMT)
In-reply-to: <491AFDC2.76E4.0078.0@xxxxxxxxxx>
> From: Jan Beulich [mailto:jbeulich@xxxxxxxxxx]
> 
> >>> Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> 12.11.08 15:51 >>>
> >I assume the 12% faster is on a benchmark...
> 
> It's the win for an application doing nothing but dirtying private
> mappings of a file. That seemed like the least-overhead test that
> wouldn't require any special testing code in kernel or hypervisor.
> 
> >Have you measured how much faster the copy_page_sse2
> >routine (standalone) is than the memcpy?  Is it a
> >factor of 2?
> 
> No, I didn't.
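The test program behind the 12% figure is not included in the thread. A minimal sketch of a workload of that kind, assuming Linux/POSIX `mmap`, is below; `dirty_private_mapping` is a hypothetical helper, not code from the patch. Each first write to a `MAP_PRIVATE` page of a file forces a copy-on-write, so the run time should be made up largely of page copies plus fault overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `npages` pages of the file at `path` with
 * MAP_PRIVATE and write to one byte in each page.  The first write to
 * each page triggers a copy-on-write fault, so the kernel must copy the
 * page.  Returns the number of pages dirtied, or -1 on error.  The file
 * must be at least npages pages long, or the writes raise SIGBUS. */
static long dirty_private_mapping(const char *path, size_t npages)
{
    assert(path != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    /* A read-only fd is enough: MAP_PRIVATE writes never reach the file. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < npages; i++)
        p[i * page] ^= 1; /* first write to page i -> private copy */

    munmap(p, len);
    return (long)npages;
}
```

Timing a call with a large `npages` (with an ordinary wall clock) under hypervisor builds with and without the patch would give a comparison in the spirit of the figure quoted above.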

Hmmm... I'm working on a project that does extensive page copying,
so I was eager to give it a spin on two test machines: one a Core 2
Duo ("Weybridge"), the other an as-yet-unreleased Intel box.  I
measured the routine with rdtsc, took many thousands of samples, and
looked at the smallest measurement.  The hypervisor measured is
64-bit, so "cpu_has_xmm2" appears to always be true.
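The measurement method described here (rdtsc around each call, many samples, keep the minimum) can be sketched roughly as follows. It assumes GCC or Clang on x86, where `<x86intrin.h>` provides `__rdtsc()`; `min_cycles` and `copy_page_memcpy` are hypothetical names, and the exact harness used for these numbers is not given in the message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h> /* __rdtsc() on GCC/Clang, x86 only */

#define PAGE_SIZE 4096

typedef void (*copy_fn)(void *dst, const void *src);

/* Baseline: a plain memcpy of one page. */
static void copy_page_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_SIZE);
}

/* Call `fn` `samples` times, reading the TSC before and after each
 * call, and return the smallest delta.  Interrupts, preemption and
 * other noise only ever add cycles, so the minimum is the least
 * disturbed sample.  rdtsc is not serializing; for short intervals a
 * fence or rdtscp is sometimes added around the reads. */
static uint64_t min_cycles(copy_fn fn, void *dst, const void *src,
                           int samples)
{
    uint64_t best = UINT64_MAX;
    assert(samples > 0);
    for (int i = 0; i < samples; i++) {
        uint64_t t0 = __rdtsc();
        fn(dst, src);
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Because the same two buffers are reused on every iteration, the minimum reflects a cache-hot copy; a copy routine tuned for cold or cache-bypassing cases may compare differently under such a harness than in a real workload.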

On the first machine, the change to use SSE2 instructions
made no difference.  On the second machine, using SSE2 actually
made copy_page() *worse* (by 30-40%).
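The patch body is not quoted in this message. One common way to write an SSE2 page copy, shown here only as a sketch, is aligned 16-byte loads with non-temporal stores (`movntdq`, via `_mm_stream_si128`) followed by an `sfence`. Whether the patch uses these exact instructions is an assumption, and `copy_page_sse2_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_sfence */

#define PAGE_SIZE 4096

/* Hypothetical SSE2 page copy: aligned 16-byte loads, non-temporal
 * (cache-bypassing) 16-byte stores, unrolled four times, and a final
 * sfence so the weakly ordered streaming stores are globally visible
 * before the function returns.  Both pointers must be 16-byte aligned. */
static void copy_page_sse2_sketch(void *dst, const void *src)
{
    __m128i *d = dst;
    const __m128i *s = src;

    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);

    for (size_t i = 0; i < PAGE_SIZE / sizeof(__m128i); i += 4) {
        __m128i a = _mm_load_si128(s + i);
        __m128i b = _mm_load_si128(s + i + 1);
        __m128i c = _mm_load_si128(s + i + 2);
        __m128i e = _mm_load_si128(s + i + 3);
        _mm_stream_si128(d + i, a);
        _mm_stream_si128(d + i + 1, b);
        _mm_stream_si128(d + i + 2, c);
        _mm_stream_si128(d + i + 3, e);
    }
    _mm_sfence();
}
```

Non-temporal stores write around the cache, which avoids evicting other data but means a destination page that is read soon afterward has to come back from memory. How that trade-off plays out depends on the workload and the microarchitecture, which may be one factor behind machine-to-machine differences like those reported here; that has not been verified in this thread.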

I don't know the x86 instruction set well enough to explain my
results, but thought I would report them.  I'm not doubting that
you saw improvements on your box, just noting that YMMV.

Perhaps someone from Intel familiar with the microarchitectures
might be able to explain (and can query me offlist to identify
the as-yet-unreleased box).

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel