[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.


  • To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
  • From: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
  • Date: Wed, 17 Nov 2010 14:47:47 -0500
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Daniel Stodden <daniel.stodden@xxxxxxxxxx>
  • Delivery-date: Thu, 18 Nov 2010 02:02:35 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

So, swapping mfns for write requests is a definite no-no. One could still live 
with copying write buffers and swapping read buffers at the end of the request. 
That still yields some benefit. 
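A minimal sketch of that hybrid policy, with made-up names rather than real blkback code: write payloads get copied into a dom0-owned frame up front, while read buffers keep zero-copy semantics by being mfn-swapped when the response is pushed.

```c
/* Illustrative sketch of the hybrid scheme above: copy write buffers,
 * swap read buffers at completion. Names are invented, not blkback's. */
enum blkif_op { BLKIF_OP_READ, BLKIF_OP_WRITE };

/* A write's payload must be copied into a dom0-owned frame up front,
 * since swapping mfns for writes is ruled out. */
static int needs_upfront_copy(enum blkif_op op)
{
    return op == BLKIF_OP_WRITE;
}

/* A read can do the I/O into a reserved dom0 frame, then swap that
 * mfn with domU's buffer frame when the response goes into the ring. */
static int swap_on_completion(enum blkif_op op)
{
    return op == BLKIF_OP_READ;
}
```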

As for kernel mappings, I thought a solution would be to provide the hypervisor 
with both pte pointers. After all, pte pointers are already provided for mapping 
grants in user-space. But that's a little too much to handle for the current 
interface.

Thanks for the feedback
Andres
On Nov 17, 2010, at 12:52 PM, Jeremy Fitzhardinge wrote:

> On 11/17/2010 08:36 AM, Andres Lagar-Cavilla wrote:
>> I'll throw an idea out there and you educate me why it's lame.
>> 
>> Going back to the primary issue of dropping zero-copy, you want the block 
>> backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, 
>> because you run into all sorts of quirkiness otherwise: magical VM_FOREIGN 
>> incantations to back granted mfns with fake page structs that make 
>> get_user_pages happy, quirky grant PTEs, etc.
>> 
>> Ok, so how about something along the lines of GNTTABOP_swap? Eerily 
>> reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
>> 
>> The observation is that for a blkfront read, you could do the read all along 
>> on a regular dom0 frame, and when stuffing the response into the ring, swap 
>> the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then 
>> the algorithm unfolds:
>> 
>> 1. The block backend, instead of calling get_empty_pages_and_pagevec at init 
>> time, creates a pool of reserved regular pages via get_free_page(s). These 
>> pages have their refcounts pumped; no one in dom0 will ever touch them.
>> 
>> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap 
>> immediately. One of the backend-reserved mfns is swapped with the domU mfn. 
>> Pfns and page structs on both ends remain untouched.
> 
> Would GNTTABOP_swap also require the domU to have already unmapped the
> page from its own pagetables?  Presumably it would fail if it didn't,
> otherwise you'd end up with a domU mapping the same mfn as a
> dom0-private page.
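For concreteness, a purely hypothetical sketch of what the argument struct for such an op might look like, loosely modeled on gnttab_transfer. GNTTABOP_swap does not exist in the grant-table ABI; every name and field here is an assumption. Per Jeremy's point, the hypervisor would be expected to fail the op with a bad status if the granting domain still has the frame mapped.

```c
#include <stdint.h>

/* Hypothetical: GNTTABOP_swap is not a real grant-table op. This only
 * sketches what its argument struct might look like, loosely modeled
 * on gnttab_transfer. All names are invented for illustration. */
typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;
typedef uint64_t xen_pfn_t;

struct gnttab_swap {
    /* IN: grant reference naming the domU buffer frame */
    grant_ref_t ref;
    /* IN: the granting domain */
    domid_t domid;
    /* IN: dom0's reserved mfn to give away;
     * OUT: the mfn taken from the grant, now dom0's */
    xen_pfn_t mfn;
    /* OUT: 0 on success; an error if, e.g., the granting domain
     * still has the frame mapped in its own pagetables */
    int16_t status;
};
```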
> 
>> 3. For blkfront reads, call swap when stuffing the response back into the 
>> ring.
>> 
>> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after the swap, 
>> much like balloon and others do, without fear of races. More importantly, b) 
>> you don't have a weirdo granted PTE, or work with a frame from another 
>> domain. It's your page all along, dom0.
>> 
>> 5. One assumption for domU is that pages allocated as blkfront buffers won't 
>> be touched by anybody, so a) it's safe for them to be swapped asynchronously 
>> with another frame with undefined contents, and b) domU can fix its p2m (and 
>> kvaddr) when pulling responses from the ring (the new mfn should be put in 
>> the response by dom0, directly or through an opaque handle).
>> 
>> 6. Scatter-gather vectors in ring requests give you natural multicall 
>> batching for these GNTTABOP_swaps. I.e., all these hypercalls won't happen 
>> as often, or at as fine a granularity, as skbuffs demanded for 
>> GNTTABOP_transfer.
>> 
>> 7. Potentially domU may want to use the contents of a blkfront write buffer 
>> later for something else. So it's not really zero-copy. But the approach 
>> opens a window to async memcpy. From the point of the swap when pulling the 
>> req to the point of pushing the response, you can do the memcpy at any time. 
>> Don't know how practical that is, though.
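A toy model of the post-swap bookkeeping in steps 4-5, with each domain's p2m reduced to a plain array. In real Linux/Xen code this would go through set_phys_to_machine() plus a kernel vaddr fixup on each side; everything here is a stand-in for illustration.

```c
/* Toy model of steps 4-5: after the hypervisor swaps the two mfns,
 * each side patches its own p2m so its pfn maps the mfn it received.
 * The p2m is reduced to a plain array; in Linux this would go through
 * set_phys_to_machine() plus a kvaddr fixup. */
#define P2M_SIZE 16

static unsigned long dom0_p2m[P2M_SIZE];
static unsigned long domU_p2m[P2M_SIZE];

/* The hypervisor-side swap plus both domains' p2m fixups, collapsed
 * into one function purely for illustration. */
static void gnttab_swap_fixup(unsigned long dom0_pfn, unsigned long domU_pfn)
{
    unsigned long reserved_mfn = dom0_p2m[dom0_pfn];
    /* dom0 takes ownership of the frame holding the data */
    dom0_p2m[dom0_pfn] = domU_p2m[domU_pfn];
    /* domU's buffer pfn now points at dom0's old reserved frame,
     * with undefined contents until the response is pulled (step 5a) */
    domU_p2m[domU_pfn] = reserved_mfn;
}
```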
> 
> I think that will be the common case - the kernel will always attempt to
> write dirty pagecache pages to make clean ones, and it will still want
> them around to access.  So it can't really give up the page altogether;
> if it hands it over to dom0, it needs to make a local copy first.
> 
>> Problems at first glance:
>> 1. To support GNTTABOP_swap you need to add more if(version) checks to 
>> blkfront and blkback.
>> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like 
>> balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken 
>> care of. domU will probably need to neuter its kvaddr before granting, and 
>> then re-establish it when the response arrives. Weren't all these hypercalls 
>> ultimately more expensive than memcpy for GNTTABOP_transfer in netback?
>> 3. Managing the pool of backend reserved pages may be a problem?
>> 
>> So in the end, perhaps more of an academic exercise than a palatable answer, 
>> but nonetheless I'd like to hear other problems people may find with this 
>> approach.
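On problem 3, the pool from step 1 could plausibly be a simple free-list of reserved frames. A toy sketch, with frame numbers standing in for pages that real code would obtain from get_free_page() and pin:

```c
/* Toy sketch of the reserved-page pool from step 1 / problem 3:
 * a fixed stack of frame numbers dom0 sets aside at init and
 * recycles as requests complete. Real code would pull these from
 * get_free_page() and pump their refcounts; here they're numbers. */
#define POOL_SIZE 8

static unsigned long pool[POOL_SIZE];
static int pool_top;

static void pool_init(unsigned long first_mfn)
{
    for (pool_top = 0; pool_top < POOL_SIZE; pool_top++)
        pool[pool_top] = first_mfn + pool_top;
}

/* Returns 0 when the pool is empty: the backend must then stall
 * the request or fall back to plain copying. */
static int pool_get(unsigned long *mfn)
{
    if (pool_top == 0)
        return 0;
    *mfn = pool[--pool_top];
    return 1;
}

static void pool_put(unsigned long mfn)
{
    if (pool_top < POOL_SIZE)
        pool[pool_top++] = mfn;
}
```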
> 
> It's not clear to me that it's any improvement over just directly copying
> the data up front.
> 
>    J
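The baseline Jeremy is weighing the scheme against is, in full, one memcpy per 4 KiB segment with no hypercalls and no p2m or kvaddr fixups on either side. A sketch, with plain buffers standing in for a granted frame and a dom0-private one (PAGE_SIZE matching x86):

```c
#include <stdint.h>
#include <string.h>

/* The entire "copy up front" path Jeremy refers to: one memcpy per
 * 4 KiB segment, no hypercalls, no p2m or kvaddr fixups on either
 * side. The buffers stand in for a granted frame and a dom0 page. */
#define PAGE_SIZE 4096

static uint8_t domU_frame[PAGE_SIZE];
static uint8_t dom0_frame[PAGE_SIZE];

static void copy_segment(void)
{
    memcpy(dom0_frame, domU_frame, PAGE_SIZE);
}
```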


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

