Re: [XenPPC] copy_page speedup using dcbz on target
On Sat, 16 Dec 2006 11:34, Jimi Xenidis wrote:
> If you really want to explore mem/page copy for XenPPC then you have
> to understand that since we run without an MMU, profiling code with
> MMU on, _including_ RMA, is not helpful because the access is guarded
...
> Please run your experiments _in_ Xen
...

Timing code has been included in Xen (setup.c); however, the results
match the prior timings in userspace:

JS20:
  elapsed time:            0x000000000000a8f5
  elapsed time using dcbz: 0x0000000000005410
  elapsed time:            0x000000000000a987
  elapsed time using dcbz: 0x0000000000005361

JS21:
  elapsed time:            0x0000000000000862
  elapsed time using dcbz: 0x0000000000000420
  elapsed time:            0x0000000000000859
  elapsed time using dcbz: 0x0000000000000424

...............................................

> You will probably find that grouping (as Hollis suggests) by cache
> line will be much better. but also prefetch the next line somehow.

Somewhat better... (The following observations were made running in
user space.)

Unrolling the copy loop by cache line improves performance by a few
percent. (I did not record the times; also, the unrolled loop still
used the same number of registers and did no touching.)

However, including dcbz at the beginning of the loop slowed things
down. Perhaps we need to dcbz a couple of lines ahead of usage?

_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel
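
For concreteness, here is a rough sketch of what "dcbz a couple of lines ahead" could look like when the copy is grouped by cache line. This is not the Xen copy_page and has not been timed on JS20/JS21. The function name, the 128-byte line size (as on the PPC970) and the look-ahead distance of two lines are all assumptions that would need tuning. On non-PowerPC hosts the dcbz helper compiles to nothing, so the code still builds and copies correctly there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define CACHE_LINE  128u  /* assumption: PPC970 (JS20/JS21) cache line size */
#define DCBZ_AHEAD  2u    /* assumption: look-ahead distance in lines, untuned */

/* Establish a zeroed destination line in the data cache so the stores that
 * follow do not have to fetch the old contents from memory.
 * Expands to nothing when not compiling for PowerPC. */
static inline void dcbz_line(void *p)
{
#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
    __asm__ __volatile__("dcbz 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Hypothetical helper: copy one page, one cache line per iteration, issuing
 * dcbz DCBZ_AHEAD lines ahead of the line being written.
 * dst must be page aligned and must not overlap src. */
static void copy_page_dcbz_ahead(void *dst, const void *src)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t off;

    /* Prime the first DCBZ_AHEAD destination lines. */
    for (off = 0; off < DCBZ_AHEAD * CACHE_LINE; off += CACHE_LINE)
        dcbz_line(d + off);

    for (off = 0; off < PAGE_SIZE; off += CACHE_LINE) {
        size_t ahead = off + DCBZ_AHEAD * CACHE_LINE;

        /* Zero a line we will reach a few iterations from now. */
        if (ahead < PAGE_SIZE)
            dcbz_line(d + ahead);

        /* Fixed-size copy of one line; compilers typically expand this
         * into straight-line loads and stores. */
        memcpy(d + off, s + off, CACHE_LINE);
    }
}
```

Every line that gets a dcbz is fully overwritten afterwards, so the result is a plain page copy whatever the look-ahead distance is. Whether any distance actually beats dcbz at the top of the loop would have to be measured in Xen.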