
[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.



On Wed, 2010-11-17 at 11:36 -0500, Andres Lagar-Cavilla wrote:
> I'll throw an idea there and you educate me why it's lame.
> 
> Going back to the primary issue of dropping zero-copy, you want the block 
> backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, because 
> you run into all sorts of quirkiness otherwise: magical VM_FOREIGN 
> incantations to back granted mfn's with fake page structs that make 
> get_user_pages happy, quirky grant PTEs, etc.
> 
> Ok, so how about something along the lines of GNTTABOP_swap? Eerily 
> reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
> 
> The observation is that for a blkfront read, you could do the read all along 
> on a regular dom0 frame, and when stuffing the response into the ring, swap 
> the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then 
> the algorithm falls out:
> 
> 1. Block backend, instead of get_empty_pages_and_pagevec at init time, 
> creates a pool of reserved regular pages via get_free_page(s). These pages 
> have their refcount pumped; no one in dom0 will ever touch them.
> 
> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap 
> immediately. One of the backend-reserved mfns is swapped with the domU mfn. 
> Pfns and page structs on both ends remain untouched.
> 
> 3. For blkfront reads, call swap when stuffing the response back into the ring.
> 
> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after the swap, much 
> like balloon and others do, without fear of races. More importantly, b) you 
> don't have a weirdo granted PTE, nor work with a frame from another domain. It's 
> your page all along, dom0.
> 
> 5. One assumption for domU is that pages allocated as blkfront buffers won't 
> be touched by anybody, so a) it's safe for them to swap async with another 
> frame with undefined contents, and b) domU can fix its p2m (and kvaddr) when 
> pulling responses from the ring (the new mfn should be put in the response by 
> dom0 directly, or through an opaque handle).
> 
> 6. Scatter-gather vectors in ring requests give you natural multicall 
> batching for these GNTTABOP_swaps. I.e. these hypercalls won't happen as 
> often, or at as fine a granularity, as skbufs demanded for GNTTABOP_transfer.
> 
> 7. Potentially domU may want to use the contents of a blkfront write buffer 
> later for something else. So it's not really zero-copy. But the approach 
> opens a window for async memcpy: from the point of the swap when pulling the req 
> to the point of pushing the response, you can do the memcpy at any time. Don't 
> know how practical that is, though.
> 
> Problems at first glance:
> 1. To support GNTTABOP_swap you need to add more if(version) to blkfront and 
> blkback.
> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like 
> balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken 
> care of. domU will probably need to neuter its kvaddr before granting, and 
> then re-establish it when the response arrives. Weren't all these hypercalls 
> ultimately more expensive than a memcpy for GNTTABOP_transfer in netback?
> 3. Managing the pool of backend reserved pages may be a problem?

I guess GNTTABOP_transfer for network I/O died because of the double-ended
TLB fallout?

Still, I like the general direction. Nice shot.

Cheers,
Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

