Xen project Mailing List

Re: [Xen-devel] [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access

To: Aravindh Puthiyaparambil <aravindp@xxxxxxxxx>

From: Tim Deegan <tim@xxxxxxx>

Date: Thu, 1 May 2014 16:39:20 +0200

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Keir Fraser <keir@xxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Thu, 01 May 2014 14:39:28 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi, Thanks for your patch. I have to say, this looks a lot less terrifying than I thought it would. :) At 21:45 -0700 on 28 Apr (1398717902), Aravindh Puthiyaparambil wrote: > Shadow mem_access mode > ---------------------- > Add a new shadow mode for mem_access. This should only be enabled by a > mem_access listener when it calls XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE > for a PV domain. Do we need a specific mode for this? If the default behaviour is to allow all accesses until told otherwise, then it can be enabled at all times. > P2M changes > ----------- > Add a new p2m implementation for mem_access. Central to this is the > access lookup table. This is an array of mfns. Each mfn is a page > allocated from the domain's shadow memory. The pages hold the > p2m_access_t values for each guest gmfn. p2m_mem_access_set_entry() > sets the access value of the mfn given as input and blows the shadow > entries for the mfn. p2m_mem_access_get_entry() returns the access > value of the mfn given as input. I think this array needs to be a trie, at least; it may make sense to reuse the p2m_pt implementation and just restrict/ignore the types and MFNs. That way you won't have to reinvent the wheel, esp. around avoiding long-running operations. > +/* Convert access restrictions to page table flags */ > +void p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access) > +{ > + > + /* Restrict with access permissions */ > + switch (access) > + { > + case p2m_access_r: > + *flags &= ~(_PAGE_RW); > + *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT); IIUC this function is called to add further restrictions to an existing set of flags. In that case, it should never set the _PAGE_PRESENT bit. Rather, it should only ever _reduce_ permissions (i.e. set _NX or clear other bits). Then... > + break; > + case p2m_access_rx: > + case p2m_access_rx2rw: > + *flags &= ~(_PAGE_NX_BIT|_PAGE_RW); > + *flags |= _PAGE_PRESENT; > + break; > + case p2m_access_rw: > + *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT); > + break; > + case p2m_access_rwx: > + default: > + *flags &= ~(_PAGE_NX_BIT); > + *flags |= (_PAGE_RW|_PAGE_PRESENT); > + break; > + } > + > + // Allow more restrictive guest flags to be propagated instead of access > + // permissions > + if ( !(gflags & _PAGE_RW) ) > + *flags &= ~(_PAGE_RW); > + if ( gflags & _PAGE_NX_BIT ) > + *flags |= _PAGE_NX_BIT; ..you won't need these either. > +} > + > +/* > + * Set the page permission of the mfn. This in effect removes all shadow > + * mappings of that mfn. The access type of that mfn is stored in the access > + * lookup table. > + */ > +static int > +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t > mfn, > + unsigned int page_order, p2m_type_t p2mt, > + p2m_access_t p2ma) > +{ > + struct domain *d = p2m->domain; > + mfn_t *access_lookup_table = p2m->access_lookup_table; > + uint table_idx; > + uint page_idx; > + uint8_t *access_table_page; > + > + ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL); > + > + /* For PV domains we only support rw, rx, rx2rw, rwx access permissions > */ > + if ( unlikely(p2ma != p2m_access_r && > + p2ma != p2m_access_rw && > + p2ma != p2m_access_rx && > + p2ma != p2m_access_rwx && > + p2ma != p2m_access_rx2rw) ) > + return -EINVAL; > + > + if ( page_get_owner(mfn_to_page(mfn)) != d ) > + return -ENOENT; If we're only ever setting access permissions for a guest's own pages (i.e. not setting them for grant mappings or foreign mappings) then the access permissions lookup table could potentially be shared among all PV VMs. It might even be possible to fold the access-type into the shadow_flags field in struct page_info and avoid having a lookup table at all. If that's possible, I think it will make some thing a _lot_ easier. (Based on this, and the worries about gfn-space being sparse, I'm not going to give detailed feedback about the implementation of the lookup table in this version; Jan's already pointed out some flaws there.) > + gfn = get_gpfn_from_mfn(mfn_x(mfn)); Beware: the guest can set this m2p entry to any value it likes... > + /* > + * Values with the MSB set denote MFNs that aren't really part of the > + * domain's pseudo-physical memory map (e.g., the shared info frame). > + * Nothing to do here. > + */ > + if ( unlikely(!VALID_M2P(gfn)) ) > + return 0; ...effectively preventing the tools from setting the access type. That's also relevant below where you use _rw access for any frame with a bogus gfn. > + if ( gfn > (d->tot_pages - 1) ) > + return -EINVAL; Yeah, as others have pointed out, that's not a good check even if the guest is well-behaved. > @@ -3179,8 +3183,14 @@ static int shadow_one_bit_enable(struct domain *d, u32 > mode) > { > ASSERT(paging_locked_by_me(d)); > > - /* Sanity check the call */ > - if ( d == current->domain || (d->arch.paging.mode & mode) == mode ) > + /* > + * Sanity check the call. > + * Do not allow shadow mem_access mode to be enabled when > + * log-dirty or translate mode is enabled. Why not log-dirty? That would mean you can't use mem-access and live migration at the same time. > --- a/xen/arch/x86/mm/shadow/multi.c > +++ b/xen/arch/x86/mm/shadow/multi.c > @@ -38,6 +38,8 @@ > #include <asm/hvm/cacheattr.h> > #include <asm/mtrr.h> > #include <asm/guest_pt.h> > +#include <asm/mem_event.h> > +#include <asm/mem_access.h> > #include <public/sched.h> > #include "private.h" > #include "types.h" > @@ -625,6 +627,20 @@ _sh_propagate(struct vcpu *v, > } > } > > + /* Propagate access permissions */ > + if ( unlikely((level == 1) && > + mem_event_check_ring(&d->mem_event->access) && > + !sh_mfn_is_a_page_table(target_mfn)) ) > + { > + struct p2m_domain *p2m = p2m_get_hostp2m(d); > + p2m_access_t a; > + p2m_type_t t; > + mfn_t mfn; > + mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL); Ah, no. This access lookup ought to happen as part of the original p2m lookup that produced target_mfn. Having a second lookup here is unpleasant and confusing -- this code keeps the distinction between gfn and mfn pretty strictly, so passing an mfn into a p2m lookup is wrong, even though you happen to know they're the same namespace in this PV guest. Of course, if you can hide the access type in the struct page_info, then it's correct to recover it here. > + /* > + * Do not police writes to guest memory emanating from the Xen > + * kernel. Trying to do so will cause the same pagefault to occur > + * over and over again with an event being sent to the access > + * listener for each fault. If the access listener's vcpu is not > + * scheduled during this time, the violation is never resolved > and > + * will eventually end with the host crashing. > + */ > + if ( (violation && access_w) && > + (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) ) > + { > + violation = 0; > + rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K, > + p2m_ram_rw, p2m_access_rw); > + } That's exploitable, surely: the guest just needs to make a hypercall with a chosen buffer address to give itself _rw access to a page. (Which I guess is OK for your intended use of rw<->rx but not in general.) Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.