
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.



Hi, 

At 07:24 +0000 on 12 Dec (1418365491), Tian, Kevin wrote:
> > I'm afraid not.  There's nothing worrying per se in a backend knowing
> > the MFNs of the pages -- the worry is that the backend can pass the
> > MFNs to hardware.  If the check happens only at lookup time, then XenGT
> > can (either through a bug or a security breach) just pass _any_ MFN to
> > the GPU for DMA.
> > 
> > But even without considering the security aspects, this model has bugs
> > that may be impossible for XenGT itself to even detect.  E.g.:
> >  1. Guest asks its virtual GPU to DMA to a frame of memory;
> >  2. XenGT looks up the GFN->MFN mapping;
> >  3. Guest balloons out the page;
> >  4. Xen allocates the page to a different guest;
> >  5. XenGT passes the MFN to the GPU, which DMAs to it.
> > 
> > Whereas if stage 2 is a _mapping_ operation, Xen can refcount the
> > underlying memory and make sure it doesn't get reallocated until XenGT
> > is finished with it.
> 
> yes, I see your point. Currently we can't support ballooning in the VM
> for the above reason, and a refcount is required to close that gap.
> 
> But just to confirm one point: from my understanding, whether it's a
> mapping operation doesn't really matter. We could invent an interface
> to get the p2m mapping and then increase the refcount; the key is the
> refcount here. When XenGT constructs a shadow GPU page table, it
> creates a reference to a guest memory page, so the refcount must be
> increased. :-)

True. :)  But Xen does need to remember all the refcounts that were
created (so it can tidy up if the domain crashes).  If Xen is already
doing that it might as well do it in the IOMMU tables since that
solves other problems.
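
Roughly the kind of bookkeeping I have in mind, as a toy sketch only
(none of these names or structures are real Xen code; in Xen the
get_page()/put_page() calls and IOMMU-table updates would go where the
comments say):

    #include <stdlib.h>

    /* One record per mapping the backend has created, kept per backend
     * domain so everything can be torn down if that domain crashes. */
    struct mapping_record {
        unsigned long bfn;               /* frame the device DMAs to */
        unsigned long mfn;               /* machine frame backing it */
        struct mapping_record *next;
    };

    struct backend_state {
        struct mapping_record *mappings; /* all live mappings */
    };

    /* Map: record (and refcount) the page before the BFN is handed out. */
    static int record_map(struct backend_state *be,
                          unsigned long bfn, unsigned long mfn)
    {
        struct mapping_record *r = malloc(sizeof(*r));
        if ( !r )
            return -1;
        /* in Xen: get_page() on the MFN and insert the IOMMU entry here */
        r->bfn = bfn;
        r->mfn = mfn;
        r->next = be->mappings;
        be->mappings = r;
        return 0;
    }

    /* Unmap / domain destruction: drop every reference still held. */
    static void record_teardown(struct backend_state *be)
    {
        while ( be->mappings )
        {
            struct mapping_record *r = be->mappings;
            be->mappings = r->next;
            /* in Xen: remove the IOMMU entry and put_page() here */
            free(r);
        }
    }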

> > [First some hopefully-helpful diagrams to explain my thinking.  I'll
> >  borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the
> >  addresses that devices issue their DMAs in:
> 
> what's 'BFN' short for? Bus Frame Number?

Yes, I think so.

> > If we replace that lookup with a _map_ hypercall, either with Xen
> > choosing the BFN (as happens in the PV grant map operation) or with
> > the guest choosing an unused address (as happens in the HVM/PVH
> > grant map operation), then:
> >  - the only extra code in XenGT itself is that you need to unmap
> >    when you change the GTT;
> >  - Xen can track and control exactly which MFNs XenGT/the GPU can access;
> >  - running XenGT in a driver domain or PVH dom0 ought to work; and
> >  - we fix the race condition I described above.
> 
> OK, I see your point here. It does sound like a better design to meet
> the Xen hypervisor's security requirements, and it can also work with
> a PVH Dom0 or a driver domain. Previously, even when we said an MFN
> was required, it was actually a BFN because of the IOMMU; it only
> worked because we have a 1:1 identity mapping in place. And by finding
> a BFN
> 
> some follow-up thoughts here:
> 
> - one extra unmap call will have some performance impact, especially
> for media processing workloads where GPU page table modifications are
> frequent. But I suppose this can be optimized with batched requests.

Yep.  In general I'd hope that the extra overhead of unmap is small
compared with the trap + emulate + ioreq + schedule that's just
happened.  Though I know that IOTLB shootdowns are potentially rather
expensive right now so it might want some measurement.
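
For batching, the sort of thing I'd imagine on the XenGT side (a sketch
only; the xengt_unmap_batch() helper is invented, standing in for
whatever batched hypercall we end up with):

    #include <stddef.h>
    #include <stdint.h>

    #define UNMAP_BATCH 64

    /* Assumed wrapper around a batched unmap hypercall, so that one
     * IOTLB flush can cover many entries. */
    extern void xengt_unmap_batch(const uint64_t *bfns, size_t n);

    struct unmap_batch {
        uint64_t bfns[UNMAP_BATCH];
        size_t   count;
    };

    static void batch_add(struct unmap_batch *b, uint64_t bfn)
    {
        b->bfns[b->count++] = bfn;
        if ( b->count == UNMAP_BATCH )      /* flush when full... */
        {
            xengt_unmap_batch(b->bfns, b->count);
            b->count = 0;
        }
    }

    /* ...or at the end of a run of GTT updates. */
    static void batch_flush(struct unmap_batch *b)
    {
        if ( b->count )
        {
            xengt_unmap_batch(b->bfns, b->count);
            b->count = 0;
        }
    }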

> - is there an existing _map_ call for this purpose, to your knowledge,
> or is a new one required? If the latter, what additional logic would
> need to be implemented there?

For PVH, the XENMEM_add_to_physmap (gmfn_foreign) path ought to do
what you need, I think.  For PV, I think we probably need a new map
operation with sensible semantics.  My inclination would be to have it
follow the grant-map semantics (i.e. caller supplies domid + gfn,
hypervisor supplies BFN and success/failure code). 

Malcolm might have opinions about this -- it starts looking like the
sort of PV IOMMU interface he's suggested before. 
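
Just to illustrate the shape I mean for the PV case -- none of these
names or structures exist today, this is only a sketch of
grant-map-style semantics:

    #include <stdint.h>

    typedef uint16_t domid_t;   /* as in Xen's public headers */

    /* Hypothetical PV IOMMU map op: caller supplies domid + gfn,
     * hypervisor picks the BFN and returns a status code. */
    struct pv_iommu_map {
        /* IN */
        domid_t  domid;    /* guest whose page is being mapped */
        uint64_t gfn;      /* frame in that guest's address space */
        /* OUT */
        uint64_t bfn;      /* bus frame chosen by Xen (== MFN in practice) */
        int16_t  status;   /* success/failure, grant-op style */
    };

    /* Matching unmap op: pass back the BFN returned at map time. */
    struct pv_iommu_unmap {
        uint64_t bfn;      /* IN: e.g. retrieved from the shadow GTT */
        int16_t  status;   /* OUT */
    };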

> - when you say _map_, do you expect this to be mapped into dom0's
> virtual address space, or just into guest physical space?

For PVH, I mean into guest physical address space (and iommu tables,
since those are the same).  For PV, I mean just the IOMMU tables --
since the guest controls its own PFN space entirely, there's nothing
Xen can do to map things into it.

> - how is the BFN or unused address (what do you mean by address here?)
> allocated? Does it need to be present in guest physical memory at boot
> time, or is it just a matter of finding some holes?

That's really a question for the Xen maintainers in the Linux kernel.
I presume that whatever bookkeeping they currently do for grant-mapped
memory would suffice here just as well.

> - graphics memory can be large. Starting from BDW, there will be a
> 64-bit page table format. Do you see any limitation here on finding a
> BFN or address?

Not really.  The IOMMU tables are also 64-bit so there must be enough
addresses to map all of RAM.  There shouldn't be any need for these
mappings to be _contiguous_, btw.  You just need to have one free
address for each mapping.  Again, following how grant maps work, I'd
imagine that PVH guests will allocate an unused GFN for each mapping
and do enough bookkeeping to make sure they don't clash with other GFN
users (grant mapping, ballooning, &c).  PV guests will probably be
given a BFN by the hypervisor at map time (which will be == MFN in
practice) and just need to pass the same BFN to the unmap call later
(they can store it in the GTT meanwhile).
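
To show where that BFN lives in practice, here's a sketch of what one
GTT-entry update in the backend might look like (helper names invented,
standing in for the map/unmap operations above):

    #include <stdint.h>

    #define INVALID_BFN (~0ULL)

    /* Assumed wrappers around the map/unmap operations discussed above. */
    extern uint64_t xengt_map(uint16_t guest_domid, uint64_t gfn);
    extern void     xengt_unmap(uint64_t bfn);

    /* Called when the guest writes a new GFN into one GTT entry. */
    static void shadow_gtt_update(uint64_t *shadow_gtt, unsigned int idx,
                                  uint16_t guest_domid, uint64_t new_gfn)
    {
        if ( shadow_gtt[idx] != INVALID_BFN )
            xengt_unmap(shadow_gtt[idx]);    /* release the old mapping */

        /* Xen refcounts the underlying MFN at map time; we keep the BFN
         * in the shadow GTT so we can pass it back to unmap later. */
        shadow_gtt[idx] = xengt_map(guest_domid, new_gfn);
    }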

> > The default policy I'm suggesting is that the XenGT backend domain
> > should be marked IS_PRIV_FOR (or similar) over the XenGT client VMs,
> > which will need a small extension in Xen since at the moment struct
> > domain has only one "target" field.
> 
> Is that connection set up by the toolstack or by the hypervisor today?

It's set up by the toolstack using XEN_DOMCTL_set_target.  Extending
that to something like XEN_DOMCTL_set_target_list would be OK, I
think, along with some sort of lookup call.  Or maybe an
add_target/remove_target pair would be easier?
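
For concreteness, an add_target/remove_target pair might look something
like this -- entirely hypothetical, since today there is only
XEN_DOMCTL_set_target with a single target field:

    #include <stdint.h>

    typedef uint16_t domid_t;   /* as in Xen's public headers */

    /* Invented domctl numbers, for illustration only. */
    #define XEN_DOMCTL_add_target     1001
    #define XEN_DOMCTL_remove_target  1002

    /* Payload for either op: the client VM the backend domain is
     * allowed to control (the backend is the domctl's target domain). */
    struct xen_domctl_add_remove_target {
        domid_t target;
    };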

Tim.
