
RE: [Xen-devel] [PATCH] x86: add SSE-based copy_page()



> From: Jan Beulich [mailto:jbeulich@xxxxxxxxxx]
> 
> >>> Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> 12.11.08 15:51 >>>
> >I assume the 12% faster is on a benchmark...
> 
> It's the win for an application doing nothing but dirtying private mappings
> of a file. That seemed like the least overhead test that wouldn't require any
> special testing code in kernel or hypervisor.
> 
> >Have you measured how much faster the copy_page_sse2
> >routine (standalone) is than the memcpy?  Is it a
> >factor of 2?
> 
> No, I didn't.

Hmmm... I'm working on a project that does extensive page-copying,
so I was eager to give it a spin on two test machines, one a Core 2 Duo
("Weybridge"), the other an as-yet-unreleased Intel box.  I measured
the routine with rdtsc, took many thousands of samples, and
looked at the smallest measurement.  The hypervisor measured is
64-bit, so "cpu_has_xmm2" appears to always be true.
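
For anyone who wants to repeat this kind of measurement, here is a
minimal userspace sketch of the approach described above: read the
timestamp counter around each copy and keep the smallest delta.  It is
not the code that ran in the hypervisor.  The names read_ticks(),
min_ticks(), copy_page_memcpy() and PAGE_BYTES are made up for
illustration, and memcpy() stands in for whichever copy routine is
under test.  On non-x86 builds it falls back to clock_gettime(), so
the units are then nanoseconds rather than TSC ticks.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the timestamp counter (the rdtsc instruction). */
static inline uint64_t read_ticks(void)
{
    return __rdtsc();
}
#else
/* Fallback for non-x86 builds: monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

#define PAGE_BYTES 4096  /* assumed page size; hypothetical name */

typedef void (*copy_fn)(void *dst, const void *src);

/* Baseline copy routine; swap in the routine you want to compare. */
static void copy_page_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

/*
 * Time `samples` copies and return the smallest delta seen.
 * Taking the minimum discards samples inflated by interrupts,
 * cache misses on the first pass, and similar noise.
 */
static uint64_t min_ticks(copy_fn fn, void *dst, const void *src, int samples)
{
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < samples; i++) {
        uint64_t t0 = read_ticks();
        fn(dst, src);
        uint64_t t1 = read_ticks();

        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling min_ticks() once per candidate routine with the same pair of
page-sized buffers gives two numbers that can be compared directly.
Note that rdtsc is not a serializing instruction, so very short
intervals carry some measurement error.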

On the first machine, the change to use sse2 instructions
made no difference.  On the second machine, using sse2 actually
made copy_page() *worse* (by 30-40%).

I'm poor enough with the x86 instruction set that I can't explain
my results, but thought I would report them.  I'm not doubting that
you saw improvements on your box, just noting that YMMV.

Perhaps someone from Intel familiar with the microarchitectures
might be able to explain (and can query me offlist to identify
the as-yet-unreleased box).

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

