Re: [XenPPC] copy_page speedup using dcbz on target
If you really want to explore mem/page copy for XenPPC, then you have to
understand that since we run without an MMU, profiling code with the MMU on,
_including_ RMA, is not helpful because the access is guarded (G=1, I=0).
For more information see 970FX UM Sections:
  6.3.8.4 Loads in Real Mode
  6.3.9.4 Stores in Real Mode

You will probably find that grouping by cache line (as Hollis suggests) will
be much better, but you should also prefetch the next line somehow.

Please run your experiments _in_ Xen, and use timebase (ticks) or NOW()
(nanosecs) to model it.

On Dec 15, 2006, at 6:31 PM, Hollis Blanchard wrote:

> On Fri, 2006-12-15 at 17:50 -0500, poff wrote:
> > 3) Useful when PPC must do page copies in place of 'page flipping'.
>
> So you're saying we should worry about it later?
>
> > For the future, copy_page using dcbz:
> >
> > diff -r 7669fca80bfc xen/arch/powerpc/mm.c
> > --- a/xen/arch/powerpc/mm.c Mon Dec 04 11:46:53 2006 -0500
> > +++ b/xen/arch/powerpc/mm.c Fri Dec 15 17:52:58 2006 -0500
> > @@ -280,7 +280,8 @@ extern void copy_page(void *dp, void *sp
> >      if (on_systemsim()) {
> >          systemsim_memcpy(dp, sp, PAGE_SIZE);
> >      } else {
> > -        memcpy(dp, sp, PAGE_SIZE);
> > +        clear_page(dp);
> > +        __copy_page(dp, sp);
> >      }
> >  }
> > diff -r 7669fca80bfc xen/include/asm-powerpc/page.h
> > --- a/xen/include/asm-powerpc/page.h Mon Dec 04 11:46:53 2006 -0500
> > +++ b/xen/include/asm-powerpc/page.h Fri Dec 15 17:52:58 2006 -0500
> > @@ -90,6 +90,25 @@ 1: dcbz 0,%0\n\
> >  extern void copy_page(void *dp, void *sp);
> > +static __inline__ void __copy_page(void *dp, void *sp)
> > +{
> > +    ulong dwords, dword_size;
> > +
> > +    dword_size = 8;
> > +    dwords = (PAGE_SIZE / dword_size) - 1;
> > +
> > +    __asm__ __volatile__(
> > +        "mtctr %2 # copy_page\n\
> > +        ld %2,0(%1)\n\
> > +        std %2,0(%0)\n\
> > +1:      ldu %2,8(%1)\n\
> > +        stdu %2,8(%0)\n\
> > +        bdnz 1b"
> > +        : /* no result */
> > +        : "r" (dp), "r" (sp), "r" (dwords)
> > +        : "%ctr", "memory");
> > +}
> > +
>
> I'd rather have copy_page() do
>     dcbz; stdu; stdu; stdu; ... stdu;
> in each loop iteration.
> It would also be nice to improve memcpy, though that one is certainly
> more difficult due to alignment, varying lengths, etc.

Our current memcpy() comes from memcpy.S, which is straight from Linux. It's
not the best, but prolly good enuff.

> Perhaps we can borrow code from
> http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html

This tunes for usermode. I don't think its performance is relevant.

-JX

_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel
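
For illustration only, here is a rough C sketch of the loop shape Hollis
describes above: one dcbz per destination cache line, followed by the
doubleword stores that fill that line. It is not the code from the patch or
from Xen. The names copy_page_by_line and zero_line are hypothetical, and the
4096-byte page and 128-byte cache line are assumptions (128 bytes matches
970-class parts). Off PowerPC, zero_line falls back to memset so the sketch
still builds. It does no prefetching and says nothing about real-mode
performance, which per the advice above has to be measured inside Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE        4096u  /* assumption: 4 KiB pages */
#define CACHE_LINE_SIZE  128u   /* assumption: 970-class data cache line */
#define DWORDS_PER_LINE  (CACHE_LINE_SIZE / sizeof(uint64_t))

/* Hypothetical helper: establish one destination line without reading it. */
static inline void zero_line(void *line)
{
#if defined(__powerpc64__)
    /* dcbz zeroes the whole cache block that contains the address. */
    __asm__ __volatile__("dcbz 0,%0" : : "r" (line) : "memory");
#else
    memset(line, 0, CACHE_LINE_SIZE);   /* portable stand-in for dcbz */
#endif
}

/*
 * Hypothetical page copy grouped by cache line: zero a destination line,
 * then store every doubleword of that line before moving to the next one.
 * Both pages must be page aligned and must not overlap.
 */
static void copy_page_by_line(void *restrict dp, const void *restrict sp)
{
    uint64_t *d = dp;
    const uint64_t *s = sp;

    /* dcbz works on whole blocks, so the destination must be line aligned. */
    assert(((uintptr_t)dp % CACHE_LINE_SIZE) == 0);

    for (size_t line = 0; line < PAGE_SIZE / CACHE_LINE_SIZE; line++) {
        zero_line(d);
        for (size_t i = 0; i < DWORDS_PER_LINE; i++)
            d[i] = s[i];
        d += DWORDS_PER_LINE;
        s += DWORDS_PER_LINE;
    }
}
```

Compared with clear_page() followed by a full-page copy, this shape touches
each destination line once, right after it has been established in the cache.
A hand-written version would presumably unroll the inner loop into the
straight run of stdu instructions mentioned above.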