
[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.



On Wed, 2010-11-17 at 11:36 -0500, Andres Lagar-Cavilla wrote:
> I'll throw an idea there and you educate me why it's lame.
> 
> Going back to the primary issue of dropping zero-copy, you want the block 
> backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, because 
> you run into all sorts of quirkiness otherwise: magical VM_FOREIGN 
> incantations to back granted mfn's with fake page structs that make 
> get_user_pages happy, quirky grant PTEs, etc.
> 
> Ok, so how about something along the lines of GNTTABOP_swap? Eerily 
> reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
> 
> The observation is that for a blkfront read, you could do the read all along 
> on a regular dom0 frame, and when stuffing the response into the ring, swap 
> the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then 
> the algorithm falls out:
> 
> 1. Block backend, instead of get_empty_pages_and_pagevec at init time, 
> creates a pool of reserved regular pages via get_free_page(s). These pages 
> have their refcount pumped; no one in dom0 will ever touch them.
> 
> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap 
> immediately. One of the backend-reserved mfns is swapped with the domU mfn. 
> Pfns and page structs on both ends remain untouched.
> 
> 3. For blkfront reads, call swap when stuffing the response back into the ring.
> 
> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after the swap, much 
> like balloon and others do, without fear of races. More importantly, b) you 
> don't have a weirdo granted PTE, nor work with a frame from another domain. It's 
> your page all along, dom0.
> 
> 5. One assumption for domU is that pages allocated as blkfront buffers won't 
> be touched by anybody, so a) it's safe for them to swap async with another 
> frame with undefined contents, and b) domU can fix its p2m (and kvaddr) when 
> pulling responses from the ring (the new mfn should be put in the response by 
> dom0 directly, or through an opaque handle).
> 
> 6. Scatter-gather vectors in ring requests give you natural multicall 
> batching for these GNTTABOP_swaps. I.e. these hypercalls won't happen as 
> often, or at as fine a granularity, as skbufs demanded for GNTTABOP_transfer.
> 
> 7. Potentially domU may want to use the contents of a blkfront write buffer 
> later for something else. So it's not really zero-copy. But the approach 
> opens a window for async memcpy: from the point of the swap when pulling the req 
> to the point of pushing the response, you can do the memcpy at any time. Don't 
> know how practical that is, though.
> 
> Problems at first glance:
> 1. To support GNTTABOP_swap you need to add more if(version) to blkfront and 
> blkback.
> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like 
> balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken 
> care of. domU will probably need to neuter its kvaddr before granting, and 
> then re-establish it when the response arrives. Weren't all these hypercalls 
> ultimately more expensive than a memcpy for GNTTABOP_transfer in netback?
> 3. Managing the pool of backend reserved pages may be a problem?

I guess GNTTABOP_transfer for network I/O died because of the double-ended
TLB fallout?

Still, I like the general direction. Nice shot.

Cheers,
Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

