Re: [Xen-devel] [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access
>Thanks for your patch. I have to say, this looks a lot less terrifying than I
>thought it would. :)

I am glad to hear that. I too feared the worst :-) On that note, many thanks
to you and Jan for taking the time to give valuable feedback.

>> Shadow mem_access mode
>> ----------------------
>> Add a new shadow mode for mem_access. This should only be enabled by a
>> mem_access listener when it calls
>> XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE
>> for a PV domain.
>
>Do we need a specific mode for this? If the default behaviour is to allow all
>accesses until told otherwise, then it can be enabled at all times.

You might be right. I will look into getting rid of it.

>> P2M changes
>> -----------
>> Add a new p2m implementation for mem_access. Central to this is the
>> access lookup table. This is an array of mfns. Each mfn is a page
>> allocated from the domain's shadow memory. The pages hold the
>> p2m_access_t values for each guest gmfn. p2m_mem_access_set_entry()
>> sets the access value of the mfn given as input and blows the shadow
>> entries for the mfn. p2m_mem_access_get_entry() returns the access
>> value of the mfn given as input.
>
>I think this array needs to be a trie, at least; it may make sense to reuse the
>p2m_pt implementation and just restrict/ignore the types and MFNs.
>That way you won't have to reinvent the wheel, esp. around avoiding long-
>running operations.

I like the idea you mentioned below of stashing the access type in the
shadow_flags field of struct page_info. If that does not work out, I will look
into reusing the p2m_pt implementation.

>> +/* Convert access restrictions to page table flags */
>> +void
>> +p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access)
>> +{
>> +
>> +    /* Restrict with access permissions */
>> +    switch (access)
>> +    {
>> +    case p2m_access_r:
>> +        *flags &= ~(_PAGE_RW);
>> +        *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT);
>
>IIUC this function is called to add further restrictions to an existing set of
>flags. In that case, it should never set the _PAGE_PRESENT bit. Rather, it
>should only ever _reduce_ permissions (i.e. set _NX or clear other bits).
>Then...
>
>> +        break;
>> +    case p2m_access_rx:
>> +    case p2m_access_rx2rw:
>> +        *flags &= ~(_PAGE_NX_BIT|_PAGE_RW);
>> +        *flags |= _PAGE_PRESENT;
>> +        break;
>> +    case p2m_access_rw:
>> +        *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT);
>> +        break;
>> +    case p2m_access_rwx:
>> +    default:
>> +        *flags &= ~(_PAGE_NX_BIT);
>> +        *flags |= (_PAGE_RW|_PAGE_PRESENT);
>> +        break;
>> +    }
>> +
>> +    // Allow more restrictive guest flags to be propagated instead of access
>> +    // permissions
>> +    if ( !(gflags & _PAGE_RW) )
>> +        *flags &= ~(_PAGE_RW);
>> +    if ( gflags & _PAGE_NX_BIT )
>> +        *flags |= _PAGE_NX_BIT;
>
>..you won't need these either.

That makes sense. I will follow your suggestion.
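Just to check I have understood: after the rework the function would only
ever take permissions away, something like this (untested sketch; I have
dropped the gflags parameter since the propagation at the end goes away):

/*
 * Restrict an existing set of flags according to the access type.
 * Never adds permissions and never touches _PAGE_PRESENT.
 */
static void p2m_access_to_flags(u32 *flags, p2m_access_t access)
{
    switch ( access )
    {
    case p2m_access_r:
        *flags &= ~_PAGE_RW;      /* no write */
        *flags |= _PAGE_NX_BIT;   /* no execute */
        break;
    case p2m_access_rx:
    case p2m_access_rx2rw:
        *flags &= ~_PAGE_RW;      /* no write; leave NX as the guest set it */
        break;
    case p2m_access_rw:
        *flags |= _PAGE_NX_BIT;   /* no execute */
        break;
    case p2m_access_rwx:
    default:
        break;                    /* nothing to restrict */
    }
}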
>> +}
>> +
>> +/*
>> + * Set the page permission of the mfn. This in effect removes all shadow
>> + * mappings of that mfn. The access type of that mfn is stored in the access
>> + * lookup table.
>> + */
>> +static int
>> +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>> +                         unsigned int page_order, p2m_type_t p2mt,
>> +                         p2m_access_t p2ma)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
>> +    uint table_idx;
>> +    uint page_idx;
>> +    uint8_t *access_table_page;
>> +
>> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
>> +
>> +    /* For PV domains we only support rw, rx, rx2rw, rwx access permissions */
>> +    if ( unlikely(p2ma != p2m_access_r &&
>> +                  p2ma != p2m_access_rw &&
>> +                  p2ma != p2m_access_rx &&
>> +                  p2ma != p2m_access_rwx &&
>> +                  p2ma != p2m_access_rx2rw) )
>> +        return -EINVAL;
>> +
>> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
>> +        return -ENOENT;
>
>If we're only ever setting access permissions for a guest's own pages (i.e. not
>setting them for grant mappings or foreign mappings) then the access
>permissions lookup table could potentially be shared among all PV VMs. It
>might even be possible to fold the access-type into the shadow_flags field in
>struct page_info and avoid having a lookup table at all. If that's possible, I
>think it will make some things a _lot_ easier.

This idea looks very promising. I remember mentioning this in some of my
earlier emails, but I got sidetracked thinking PV memory is contiguous and
that I could use a simple indexed lookup table. So I will resubmit this patch
series after I implement a new version of p2m-ma using the shadow_flags, or
after reusing the p2m-pt implementation.

>> -    /* Sanity check the call */
>> -    if ( d == current->domain || (d->arch.paging.mode & mode) == mode )
>> +    /*
>> +     * Sanity check the call.
>> +     * Do not allow shadow mem_access mode to be enabled when
>> +     * log-dirty or translate mode is enabled.
>
>Why not log-dirty? That would mean you can't use mem-access and live
>migration at the same time.

I was afraid they would conflict and prevent log-dirty from working correctly,
but I forgot about it being used during live migration. Let me see if I can get
both of them to live together.

>> +    /* Propagate access permissions */
>> +    if ( unlikely((level == 1) &&
>> +                  mem_event_check_ring(&d->mem_event->access) &&
>> +                  !sh_mfn_is_a_page_table(target_mfn)) )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +        p2m_access_t a;
>> +        p2m_type_t t;
>> +        mfn_t mfn;
>> +        mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0,
>> +                             NULL);
>
>Ah, no. This access lookup ought to happen as part of the original p2m lookup
>that produced target_mfn. Having a second lookup here is unpleasant and
>confusing -- this code keeps the distinction between gfn and mfn pretty
>strictly, so passing an mfn into a p2m lookup is wrong, even though you
>happen to know they're the same namespace in this PV guest.
>
>Of course, if you can hide the access type in the struct page_info, then it's
>correct to recover it here.

OK, I am hoping to do that here.
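Very roughly, and assuming a few free bits can be carved out of shadow_flags
for the access type (the shift and mask below are placeholders, not actual
values), I am picturing helpers along these lines:

/*
 * Rough sketch (placeholder bit positions): stash the p2m_access_t of a
 * page in otherwise-unused bits of page_info->shadow_flags, so no
 * separate lookup table is needed.
 */
#define SHF_access_shift  28
#define SHF_access_mask   (0xfu << SHF_access_shift)

static inline void page_set_access(struct page_info *pg, p2m_access_t a)
{
    pg->shadow_flags = (pg->shadow_flags & ~SHF_access_mask) |
                       (((u32)a << SHF_access_shift) & SHF_access_mask);
}

static inline p2m_access_t page_get_access(const struct page_info *pg)
{
    return (p2m_access_t)((pg->shadow_flags & SHF_access_mask) >>
                          SHF_access_shift);
}

With that, the second lookup in _sh_propagate() would collapse to
a = page_get_access(mfn_to_page(target_mfn)); with no p2m call at all.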
>> +            /*
>> +             * Do not police writes to guest memory emanating from the Xen
>> +             * kernel. Trying to do so will cause the same pagefault to occur
>> +             * over and over again with an event being sent to the access
>> +             * listener for each fault. If the access listener's vcpu is not
>> +             * scheduled during this time, the violation is never resolved and
>> +             * will eventually end with the host crashing.
>> +             */
>> +            if ( (violation && access_w) &&
>> +                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
>> +            {
>> +                violation = 0;
>> +                rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
>> +                                    p2m_ram_rw, p2m_access_rw);
>> +            }
>
>That's exploitable, surely: the guest just needs to make a hypercall with a
>chosen buffer address to give itself _rw access to a page.
>(Which I guess is OK for your intended use of rw<->rx but not in general.)

I agree, but I could not come up with a way to get it to work. I have sent a
separate email regarding this issue.
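To make the attack concrete for the archives, I believe something as simple
as the following guest-kernel snippet would do it (illustrative only;
protected_page_va is a stand-in for any page the listener has restricted):

/*
 * Illustrative guest-side snippet, not part of the patch: hand a
 * hypercall a buffer inside a restricted page. Xen's write-back to
 * the buffer faults with EIP in [XEN_VIRT_START, XEN_VIRT_END], so
 * the check above suppresses the event and re-grants rw.
 */
char *buf = (char *)protected_page_va;            /* hypothetical address */
HYPERVISOR_xen_version(XENVER_extraversion, buf); /* Xen writes to buf */
/* The page backing buf is now p2m_ram_rw / p2m_access_rw again. */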
Cheers,
Aravindh