
Re: [Xen-devel] [RFC] Overview of work required to implement mem_access for PV guests

>>> The mem_access APIs only work with HVM guests that run on Intel
>>> hardware with EPT support. This effort is to enable it for PV guests that run
>>> with shadow page tables. To facilitate this, the following will be done:
>>> 1. A magic page will be created for the mem_access (mem_event) ring
>>> buffer during the PV domain creation.
>> As Andrew pointed out, you might have to be careful about this -- if the page
>> is owned by the domain itself, and it can find out (or guess) its MFN, it can
>> map and write to it.  You might need to allocate an anonymous page for this?
> Do you mean allocate an anonymous page in dom0 and use that? Won't we run 
> into the problem Andres was mentioning a while back?
> http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
> Or were you meaning something else?
> I was planning on doing exactly what we do in the mem_access listener for HVM 
> guests. The magic page is mapped in and then removed from physmap of the 
> guest.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

Once the page is removed from the physmap, an HVM guest has no way of indexing 
that page and thus mapping it -- even though it's a page that belongs to the 
guest, and is threaded on its list of owned pages.

With PV, you have an additional means of indexing, which is the raw MFN. The PV 
guest will be able to get at the page because it owns it, if it knows the MFN. 
No PFN/GFN required. This is how, for example, things like the grant table are 
mapped in classic PV domains.

I don't know how realistic the concern is about the domain guessing the MFN for 
the page. But if it can, and it maps it and mucks with the ring, the thing to 
evaluate is: can the guest throw dom0/host into a tailspin? The answer is 
likely "no", because guests can't reasonably do this with other rings they have 
access to, like PV driver backends. But a flaw on the consumer side of mem 
events could yield a vector for DoS.
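To illustrate that consumer-side concern, here is a minimal, self-contained sketch of a ring consumer that refuses to trust a producer index the guest may have corrupted. This is not Xen's actual ring.h machinery; the names (demo_ring, consume) and layout are made up for illustration, but the defensive pattern (snapshot the shared index once, bounds-check it, mask every slot access) is the same one Xen's shared-ring macros rely on.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64u                /* must be a power of two */

struct demo_ring {
    uint32_t prod;                   /* written by the untrusted producer */
    uint32_t cons;                   /* private to the consumer */
    uint32_t slots[RING_SIZE];
};

/* Consume pending entries; return how many were consumed, or -1 if the
 * shared producer index is implausible (possible guest tampering). */
static int consume(struct demo_ring *r)
{
    uint32_t prod = r->prod;         /* snapshot the shared index once */
    int n = 0;

    if (prod - r->cons > RING_SIZE)  /* more pending than can fit: bogus */
        return -1;

    while (r->cons != prod) {
        uint32_t slot = r->cons & (RING_SIZE - 1);  /* masked, in-bounds */
        (void)r->slots[slot];        /* process the entry here */
        r->cons++;
        n++;
    }
    return n;
}
```

A consumer written this way turns a corrupted index into a recoverable error rather than an out-of-bounds read, which is the difference between a broken guest and a dom0 DoS.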

If, instead, the page is a xen-owned page (alloc_xenheap_pages), then there is 
no way for the PV domain to map it.

>> From my reading of xc_domain_decrease_reservation_exact(), I think it will 
>> also work for PV guests. Or am I missing something here? 
>>> 2. Most of the mem_event / mem_access functions and variable name are
>>> HVM-specific. Given that I am enabling it for PV, I will change the
>>> names to something more generic. This also holds for the mem_access
>>> hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to
>>> make them a memory op or a domctl.
>> Sure.
>>> 3. A new shadow option will be added called PG_mem_access. This mode
>>> is basic shadow mode with the addition of a table that will track the
>>> access permissions of each page in the guest.
>>> mem_access_tracker[gmfn] = access_type. If there is a place where I can
>>> stash this in an existing structure, please point me at it.
>> My suggestion was that you should make another implementation of the
>> p2m.h interface, which is already called in all the right places.  You might 
>> want
>> to borrow the tree-building code from the existing p2m-pt.c, though there's
>> no reason why your table should be structured as a pagetable.  The important
>> detail is that you should be using memory from the shadow pool to hold this
>> datastructure.
> OK, I will go down the path. I agree that my table needn't be structured as a 
> pagetable. The other thing I was thinking about is stashing the access 
> information in the per mfn page_info structures. Or is that memory overhead 
> too much of an overkill?

Well, the page/MFN could conceivably be mapped by many domains. There are ample 
bits to play with in the type flag, for example. But as long as you don't care 
about mem_event on pages shared across two or more PV domains, then that should 
be fine. I wouldn't blame you if you didn't care :)

OTOH, all you need is a byte per pfn, and the great thing is that in PV 
domains the physmap is bounded and contiguous, unlike HVM, whose PCI holes 
etc. demand a sparse tree structure. So you can allocate an easily 
indexable array, notwithstanding superpage concerns (I think/hope).
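A sketch of that flat byte-per-pfn array, for illustration only: the type names (access_t, access_tracker, tracker_*) are hypothetical, loosely modeled on the XENMEM_access_* values rather than on any actual Xen interface, and the real thing would allocate from the shadow pool rather than malloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical access types, in the spirit of XENMEM_access_*. */
typedef enum { ACCESS_N, ACCESS_R, ACCESS_W, ACCESS_RW, ACCESS_RWX } access_t;

struct access_tracker {
    unsigned long max_pfn;  /* the PV physmap is bounded and contiguous */
    uint8_t *table;         /* one byte per pfn */
};

static int tracker_init(struct access_tracker *t, unsigned long max_pfn,
                        access_t def)
{
    t->table = malloc(max_pfn);
    if (!t->table)
        return -1;
    memset(t->table, def, max_pfn);   /* every pfn starts at the default */
    t->max_pfn = max_pfn;
    return 0;
}

static int tracker_set(struct access_tracker *t, unsigned long pfn, access_t a)
{
    if (pfn >= t->max_pfn)            /* reject pfns outside the physmap */
        return -1;
    t->table[pfn] = (uint8_t)a;
    return 0;
}

static access_t tracker_get(const struct access_tracker *t, unsigned long pfn)
{
    return pfn < t->max_pfn ? (access_t)t->table[pfn] : ACCESS_N;
}
```

Because the physmap is contiguous, lookup is a single bounds check plus an array index, with none of the tree walks the HVM p2m needs.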

>>> 6. xc_(hvm)_set_mem_access(): This API has two modes. If the start
>>> pfn/gmfn is ~0ull, it is taken as a request to set the default access.
>>> Here we will call shadow_blow_tables() after recording the default
>>> access type for the domain. In the mode where it is setting mem_access
>>> type for individual gmfns, we will call a function that will drop the
>>> shadow for that individual gmfn. I am not sure which function to call.
>>> Will sh_remove_all_mappings(gmfn) do the trick?
>> Yes, sh_remove_all_mappings() is the one you want.
>>> The other issue here is that in the HVM case we could use
>>> xc_hvm_set_mem_access(gfn, nr) and the permissions for the range gfn
>>> to gfn+nr would be set. This won't be possible in the PV case as we
>>> are actually dealing with mfns and mfn to mfn+nr need not belong to
>>> the same guest. But given that setting *all* page access permissions
>>> is done implicitly when setting the default access, I think we can live
>>> with setting page permissions one at a time as they are faulted in.
>> Seems OK to me.
>>> 8. In sh_page_fault() perform access checks similar to
>>> ept_handle_violation() / hvm_hap_nested_page_fault().
>> Yep.
>>> 9. Hook into _sh_propagate() and set up the L1 entries based on
>>> access permissions. This will be similar to ept_p2m_type_to_flags(). I
>>> think I might also have to hook into the code that emulates page
>>> table writes to ensure access permissions are honored there too.
>> I guess you might; again, the p2m interface will help here, and probably the
>> existing tidy-up code in emulate_gva_to_mfn will be the place to hook.
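The _sh_propagate() hook can mirror ept_p2m_type_to_flags(): start from the flags the guest intended for the entry and strip whatever the access type forbids. A minimal sketch, with made-up F_* bits standing in for the real shadow _PAGE_* flags (x86 paging can't express write-without-read, so a pure-W type would need the same clamping as RW):

```c
#include <stdint.h>

/* Stand-in PTE bits; real code would use the shadow _PAGE_* flags. */
#define F_PRESENT (1u << 0)
#define F_RW      (1u << 1)
#define F_NX      (1u << 2)

typedef enum { ACCESS_N, ACCESS_R, ACCESS_RW, ACCESS_RX, ACCESS_RWX } access_t;

/* Clamp guest-intended flags to what the mem_access type permits. */
static uint32_t apply_access(uint32_t flags, access_t a)
{
    switch (a) {
    case ACCESS_N:
        return flags & ~F_PRESENT;        /* any use faults */
    case ACCESS_R:
        return (flags & ~F_RW) | F_NX;    /* read-only, no execute */
    case ACCESS_RW:
        return flags | F_NX;              /* read/write, no execute */
    case ACCESS_RX:
        return flags & ~F_RW;             /* read/execute, no write */
    case ACCESS_RWX:
    default:
        return flags;                     /* unrestricted */
    }
}
```

Flags can only ever be removed relative to what the guest asked for, so the shadow entry is always at least as restrictive as both the guest page table and the mem_access policy.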
> Thanks so much for the feedback.
> Aravindh
