
Re: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.



On Thu, 2010-11-18 at 08:56 -0500, Ian Campbell wrote:
> On Wed, 2010-11-17 at 19:27 +0000, Daniel Stodden wrote:
> > On Wed, 2010-11-17 at 12:04 -0500, Ian Campbell wrote:
> > > On Tue, 2010-11-16 at 21:28 +0000, Daniel Stodden wrote:
> > > > 
> > > > > > Also, I was absolutely certain I once saw VM_FOREIGN support
> > > > > > in gntdev.. Can't find it now, what happened? Without,
> > > > > > there's presently still no zero-copy.
> > > > > 
> > > > > gntdev doesn't need VM_FOREIGN any more - it uses the
> > > > > (relatively new-ish) mmu notifier infrastructure which is
> > > > > intended to allow a device to sync an external MMU with
> > > > > usermode mappings.  We're not using it in precisely that way,
> > > > > but it allows us to wrangle grant mappings before the generic
> > > > > code tries to do normal pte ops on them.
> > > > 
> > > > The mmu notifiers were for safe teardown only. They are not
> > > > sufficient for DIO, which wants gup() to work. If you want zcopy
> > > > on gntdev, we'll need to back those VMAs with page structs.  Or
> > > > bounce again (gulp, just mentioning it). As with the blktap2
> > > > patches, note there is no difference in the dom0 memory bill, it
> > > > takes page frames.
> > > 
> > > I thought the VM_FOREIGN stuff which blktap[12] hooks into gup() was
> > > there in order to avoid some deadlock arising from having a single
> > > struct page taking part in nested block I/O. i.e. one I/O created by
> > > blkback and the second from the tapdisk process deadlock against each
> > > other.
> > 
> > > I thought that in order to work around this blktap creates a second range
> > > of struct pages (the struct vm_foreign_map array in
> > > vma->vm_private_data) which aliases the pages under I/O and then it
> > > hooks gup() to do the switcheroo when necessary.
> > > 
> > > If the blkback interface is running in userspace (whether in a tapdisk
> > > or qemu process) then there is no nesting of I/O and the only I/O is
> > > from process context and therefore this particular issue is no longer a
> > > problem because we can use a properly struct page backed page without
> > > needing a magic VM_FOREIGN shadow of it?
> > > 
> > > Have I misunderstood something about the reason for VM_FOREIGN?
> > 
> > VM_FOREIGN is a special case of grant mapping frames into a user's VMA.
> > 
> > This stuff is primarily there to make gup() grab a pagevec from the VMA
> > struct, instead of relying on follow_page in order to map user-virtual
> > to pseudophysical, which is what gup normally does.
> > 
> > In brief, it's hacking gup() to keep DIO working. 
> > 
> > In other words: Userland can't I/O on some VM_PFNMAP or VM_IO or
> > similar. If you ask for DMA, the kernel quite urgently wants this to
> > look like normal memory.
> > 
> > Your description of the aliasing is correct, but that's yet another
> > thing. It implies redoing the grantmap and p2m entries privately.
> > 
> > To make this clearer: If the aliasing weren't necessary, blktap2 would
> > just have had to grab blkback's prepared page struct from the request SG
> > vector, and .mmap that to userland, but still with VM_FOREIGN and some
> > pagevec pointer in the VMA.
> > 
> > Instead, there's blkback-pagemap, specifically to map that SG page entry
> > *back* to the original gref from a table, and *redo* the entire gntmap +
> > p2m thing another time, privately.
> 
> In the userland blkback case do you need to redo the mapping? Isn't the
> original mapping (including pte update) of the granted mfn into the
> struct page associated with the user address sufficient?

It's completely sufficient. Running the sring in tapdisk straight away
implies mapping once and for all. The present aliasing happens because
of the two arrows in the following request submission chain: blkback ->
tapdev -> disk, each arrow designating I/O submission to a blk request
queue. It's just there for the stacking.

> Is the reason gup() doesn't work is that it tries to go from a pte_entry
> to a struct page (via a pfn, e.g. in vm_normal_page) which involves a
> m2p translation of the pte value, and this is not valid for a foreign
> page (since the m2p entry is for the remote domain)?

Exactly. So either it's VM_FOREIGN, or maybe we go for a somewhat
cleaner solution by teaching mfn_to_pfn new tricks, such as going
through a private mfn->gntpfn lookup on the m2p failure path. See the
related thread with Jeremy. I guess you'd have additional thoughts on
that.

> Or is the issue that we try and dynamically fault in those addresses and
> that doesn't work because you need a special set_pte variant (aka the
> gnttab map hypercall) to map a granted page?

Not a real issue. Blktap presently vm_insert()s, but demand paging makes
no difference: gup() completely relies on follow_page. That means a pte
miss falls through to handle_mm_fault, then retries.

Thx,
Daniel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
