[Xen-devel] [PATCH RFC 12/13] x86/mm: split PV MMU code to pv/mm.c
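For context before the patch description: do_mmu_update, the first of the hypercalls moved below, implements the batched PV page-table update interface in which a guest passes a list of (ptr, val) request pairs. What follows is a minimal, hypothetical guest-side sketch of one such request; it is not part of this patch. HYPERVISOR_mmu_update, MMU_NORMAL_PT_UPDATE, struct mmu_update and DOMID_SELF are the stock public-interface names as used by a Linux-style PV guest, while set_pte_machine is an illustrative helper name only.

/*
 * Illustrative only, not part of this patch.  A PV guest kernel
 * (Linux pvops style) updating a single PTE via do_mmu_update.
 */
#include <xen/interface/xen.h>   /* struct mmu_update, MMU_NORMAL_PT_UPDATE */
#include <asm/xen/hypercall.h>   /* HYPERVISOR_mmu_update() */

/* Hypothetical helper: write new_val into the PTE at machine address
 * pte_machine_addr, i.e. perform the "*ptr = val" operation. */
static int set_pte_machine(uint64_t pte_machine_addr, uint64_t new_val)
{
    struct mmu_update req = {
        /* The low 2 bits of .ptr encode the command; the remaining
         * bits are the machine address of the PTE to update. */
        .ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE,
        .val = new_val,
    };

    /* One request, no success count wanted, our own page tables. */
    return HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);
}

In practice a guest batches many such requests into one hypercall; each entry is validated by the hypervisor against the reference-counting rules described in the page table API comment this patch relocates.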
Move the following PV specific code to that new file:

1. Three hypercalls that are only available to PV guests:
   1. do_mmu_update
   2. do_update_va_mapping
   3. do_update_va_mapping_otherdomain
2. PV MMIO emulation code
3. PV writable page table emulation code
4. PV grant table creation / destruction code
5. Other supporting code for the above

Move everything in one patch because they share a lot of code.

Also move the PV page table API comment and delete trailing white
spaces.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 xen/arch/x86/mm.c        | 1918 +--------------------------------------------
 xen/arch/x86/pv/Makefile |    1 +
 xen/arch/x86/pv/mm.c     | 1902 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h |    5 +
 4 files changed, 1935 insertions(+), 1891 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 92e79d7fb6..0119cacc43 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * A description of the x86 page table API:
- *
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- *
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- *
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- *
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- *
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- *
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
- * In normal use, this simplification shouldn't be a problem.
- * However, the logic can be added if required.
- *
- * One more note on read-only page mappings:
- * -----------------------------------------
- * We want domains to be able to map pages for read-only access.  The
- * main reason is that page tables and directories should be readable
- * by a domain, but it would not be safe for them to be writable.
- * However, domains have free access to rings 1 & 2 of the Intel
- * privilege model. In terms of page protection, these are considered
- * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
- * read-only restrictions are respected in supervisor mode -- if the
- * bit is clear then any mapped page is writable.
- *
- * We get round this by always setting the WP bit and disallowing
- * updates to it. This is very unlikely to cause a problem for guest
- * OS's, which will generally use the WP bit to simplify copy-on-write
- * implementation (in that case, OS wants a fault when it writes to
- * an application-supplied buffer).
- */
-
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -127,14 +62,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-/*
- * PTE updates can be done with ordinary writes except:
- * 1. Debug builds get extra checking by using CMPXCHG[8B].
- */
-#if !defined(NDEBUG)
-#define PTE_UPDATE_WITH_CMPXCHG
-#endif
-
 paddr_t __read_mostly mem_hotplug;
 
 /* Private domain structs for DOMID_XEN and DOMID_IO. */
@@ -520,67 +447,6 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/* Get a mapping of a PV guest's l1e for this virtual address. */
-static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
-{
-    l2_pgentry_t l2e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) )
-        return NULL;
-
-    /* Find this l1e and its enclosing l1mfn in the linear map. */
-    if ( __copy_from_user(&l2e,
-                          &__linear_l2_table[l2_linear_offset(addr)],
-                          sizeof(l2_pgentry_t)) )
-        return NULL;
-
-    /* Check flags that it will be safe to read the l1e. */
-    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
-        return NULL;
-
-    *gl1mfn = l2e_get_pfn(l2e);
-
-    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
-           l1_table_offset(addr);
-}
-
-/* Pull down the mapping we got from guest_map_l1e(). */
-static inline void guest_unmap_l1e(void *p)
-{
-    unmap_domain_page(p);
-}
-
-/* Read a PV guest's l1e that maps this virtual address. */
-static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
-{
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) ||
-         __copy_from_user(eff_l1e,
-                          &__linear_l1_table[l1_linear_offset(addr)],
-                          sizeof(l1_pgentry_t)) )
-        *eff_l1e = l1e_empty();
-}
-
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
-                                          void *eff_l1e)
-{
-    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    TOGGLE_MODE();
-    guest_get_eff_l1e(addr, eff_l1e);
-    TOGGLE_MODE();
-}
-
 const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
     zero_page[PAGE_SIZE];
 
@@ -635,49 +501,6 @@ static int alloc_segdesc_page(struct page_info *page)
     return i == 512 ? 0 : -EINVAL;
 }
 
-
-/* Map shadow page at offset @off.
*/ -int map_ldt_shadow_page(unsigned int off) -{ - struct vcpu *v = current; - struct domain *d = v->domain; - unsigned long gmfn; - struct page_info *page; - l1_pgentry_t l1e, nl1e; - unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT); - int okay; - - BUG_ON(unlikely(in_irq())); - - if ( is_pv_32bit_domain(d) ) - gva = (u32)gva; - guest_get_eff_kern_l1e(v, gva, &l1e); - if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) ) - return 0; - - gmfn = l1e_get_pfn(l1e); - page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); - if ( unlikely(!page) ) - return 0; - - okay = get_page_type(page, PGT_seg_desc_page); - if ( unlikely(!okay) ) - { - put_page(page); - return 0; - } - - nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW); - - spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); - l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e); - v->arch.pv_vcpu.shadow_ldt_mapcnt++; - spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); - - return 1; -} - - int get_page_from_pagenr(unsigned long page_nr, struct domain *d) { struct page_info *page = mfn_to_page(page_nr); @@ -1744,344 +1567,6 @@ void page_unlock(struct page_info *page) } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x ); } -/* How to write an entry to the guest pagetables. - * Returns 0 for failure (pointer not valid), 1 for success. */ -static inline int update_intpte(intpte_t *p, - intpte_t old, - intpte_t new, - unsigned long mfn, - struct vcpu *v, - int preserve_ad) -{ - int rv = 1; -#ifndef PTE_UPDATE_WITH_CMPXCHG - if ( !preserve_ad ) - { - rv = paging_write_guest_entry(v, p, new, _mfn(mfn)); - } - else -#endif - { - intpte_t t = old; - for ( ; ; ) - { - intpte_t _new = new; - if ( preserve_ad ) - _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY); - - rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn)); - if ( unlikely(rv == 0) ) - { - MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte - ": saw %" PRIpte, old, _new, t); - break; - } - - if ( t == old ) - break; - - /* Allowed to change in Accessed/Dirty flags only. */ - BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); - - old = t; - } - } - return rv; -} - -/* Macro that wraps the appropriate type-changes around update_intpte(). - * Arguments are: type, ptr, old, new, mfn, vcpu */ -#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad) \ - update_intpte(&_t ## e_get_intpte(*(_p)), \ - _t ## e_get_intpte(_o), _t ## e_get_intpte(_n), \ - (_m), (_v), (_ad)) - -/* - * PTE flags that a guest may change without re-validating the PTE. - * All other bits affect translation, caching, or Xen's safety. - */ -#define FASTPATH_FLAG_WHITELIST \ - (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \ - _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER) - -/* Update the L1 entry at pl1e to new value nl1e. */ -static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, - unsigned long gl1mfn, int preserve_ad, - struct vcpu *pt_vcpu, struct domain *pg_dom) -{ - l1_pgentry_t ol1e; - struct domain *pt_dom = pt_vcpu->domain; - int rc = 0; - - if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) ) - return -EFAULT; - - if ( unlikely(paging_mode_refcounts(pt_dom)) ) - { - if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) ) - return 0; - return -EBUSY; - } - - if ( l1e_get_flags(nl1e) & _PAGE_PRESENT ) - { - /* Translate foreign guest addresses. 
*/ - struct page_info *page = NULL; - - if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) ) - { - MEM_LOG("Bad L1 flags %x", - l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)); - return -EINVAL; - } - - if ( paging_mode_translate(pg_dom) ) - { - page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC); - if ( !page ) - return -EINVAL; - nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e)); - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l1e(nl1e, pt_dom); - rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, - preserve_ad); - if ( page ) - put_page(page); - return rc ? 0 : -EBUSY; - } - - switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) ) - { - default: - if ( page ) - put_page(page); - return rc; - case 0: - break; - case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: - ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); - l1e_flip_flags(nl1e, rc); - rc = 0; - break; - } - if ( page ) - put_page(page); - - adjust_guest_l1e(nl1e, pt_dom); - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, - preserve_ad)) ) - { - ol1e = nl1e; - rc = -EBUSY; - } - } - else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, - preserve_ad)) ) - { - return -EBUSY; - } - - put_page_from_l1e(ol1e, pt_dom); - return rc; -} - - -/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */ -static int mod_l2_entry(l2_pgentry_t *pl2e, - l2_pgentry_t nl2e, - unsigned long pfn, - int preserve_ad, - struct vcpu *vcpu) -{ - l2_pgentry_t ol2e; - struct domain *d = vcpu->domain; - struct page_info *l2pg = mfn_to_page(pfn); - unsigned long type = l2pg->u.inuse.type_info; - int rc = 0; - - if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) ) - { - MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e); - return -EPERM; - } - - if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) ) - return -EFAULT; - - if ( l2e_get_flags(nl2e) & _PAGE_PRESENT ) - { - if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) ) - { - MEM_LOG("Bad L2 flags %x", - l2e_get_flags(nl2e) & L2_DISALLOW_MASK); - return -EINVAL; - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l2e(nl2e, d); - if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) ) - return 0; - return -EBUSY; - } - - if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) ) - return rc; - - adjust_guest_l2e(nl2e, d); - if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, - preserve_ad)) ) - { - ol2e = nl2e; - rc = -EBUSY; - } - } - else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, - preserve_ad)) ) - { - return -EBUSY; - } - - put_page_from_l2e(ol2e, pfn); - return rc; -} - -/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */ -static int mod_l3_entry(l3_pgentry_t *pl3e, - l3_pgentry_t nl3e, - unsigned long pfn, - int preserve_ad, - struct vcpu *vcpu) -{ - l3_pgentry_t ol3e; - struct domain *d = vcpu->domain; - int rc = 0; - - if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) ) - { - MEM_LOG("Illegal L3 update attempt in Xen-private area %p", pl3e); - return -EINVAL; - } - - /* - * Disallow updates to final L3 slot. It contains Xen mappings, and it - * would be a pain to ensure they remain continuously valid throughout. 
- */ - if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) ) - return -EINVAL; - - if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) ) - return -EFAULT; - - if ( l3e_get_flags(nl3e) & _PAGE_PRESENT ) - { - if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) ) - { - MEM_LOG("Bad L3 flags %x", - l3e_get_flags(nl3e) & l3_disallow_mask(d)); - return -EINVAL; - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l3e(nl3e, d); - rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad); - return rc ? 0 : -EFAULT; - } - - rc = get_page_from_l3e(nl3e, pfn, d, 0); - if ( unlikely(rc < 0) ) - return rc; - rc = 0; - - adjust_guest_l3e(nl3e, d); - if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, - preserve_ad)) ) - { - ol3e = nl3e; - rc = -EFAULT; - } - } - else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, - preserve_ad)) ) - { - return -EFAULT; - } - - if ( likely(rc == 0) ) - if ( !create_pae_xen_mappings(d, pl3e) ) - BUG(); - - put_page_from_l3e(ol3e, pfn, 0, 1); - return rc; -} - -/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */ -static int mod_l4_entry(l4_pgentry_t *pl4e, - l4_pgentry_t nl4e, - unsigned long pfn, - int preserve_ad, - struct vcpu *vcpu) -{ - struct domain *d = vcpu->domain; - l4_pgentry_t ol4e; - int rc = 0; - - if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) ) - { - MEM_LOG("Illegal L4 update attempt in Xen-private area %p", pl4e); - return -EINVAL; - } - - if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) ) - return -EFAULT; - - if ( l4e_get_flags(nl4e) & _PAGE_PRESENT ) - { - if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) ) - { - MEM_LOG("Bad L4 flags %x", - l4e_get_flags(nl4e) & L4_DISALLOW_MASK); - return -EINVAL; - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l4e(nl4e, d); - rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad); - return rc ? 0 : -EFAULT; - } - - rc = get_page_from_l4e(nl4e, pfn, d, 0); - if ( unlikely(rc < 0) ) - return rc; - rc = 0; - - adjust_guest_l4e(nl4e, d); - if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, - preserve_ad)) ) - { - ol4e = nl4e; - rc = -EFAULT; - } - } - else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, - preserve_ad)) ) - { - return -EFAULT; - } - - put_page_from_l4e(ol4e, pfn, 0, 1); - return rc; -} - static int cleanup_page_cacheattr(struct page_info *page) { unsigned int cacheattr = @@ -2849,125 +2334,23 @@ int vcpu_destroy_pagetables(struct vcpu *v) return rc != -EINTR ? rc : -ERESTART; } -int new_guest_cr3(unsigned long mfn) +struct domain *mm_get_pg_owner(domid_t domid) { - struct vcpu *curr = current; - struct domain *d = curr->domain; - int rc; - unsigned long old_base_mfn; + struct domain *pg_owner = NULL, *curr = current->domain; - if ( is_pv_32bit_domain(d) ) + if ( likely(domid == DOMID_SELF) ) { - unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table); - l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn)); - - rc = paging_mode_refcounts(d) - ? -EINVAL /* Old code was broken, but what should it be? 
*/ - : mod_l4_entry( - pl4e, - l4e_from_pfn( - mfn, - (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)), - gt_mfn, 0, curr); - unmap_domain_page(pl4e); - switch ( rc ) - { - case 0: - break; - case -EINTR: - case -ERESTART: - return -ERESTART; - default: - MEM_LOG("Error while installing new compat baseptr %lx", mfn); - return rc; - } - - invalidate_shadow_ldt(curr, 0); - write_ptbase(curr); - - return 0; + pg_owner = rcu_lock_current_domain(); + goto out; } - rc = put_old_guest_table(curr); - if ( unlikely(rc) ) - return rc; - - old_base_mfn = pagetable_get_pfn(curr->arch.guest_table); - /* - * This is particularly important when getting restarted after the - * previous attempt got preempted in the put-old-MFN phase. - */ - if ( old_base_mfn == mfn ) + if ( unlikely(domid == curr->domain_id) ) { - write_ptbase(curr); - return 0; + MEM_LOG("Cannot specify itself as foreign domain"); + goto out; } - rc = paging_mode_refcounts(d) - ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL) - : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1); - switch ( rc ) - { - case 0: - break; - case -EINTR: - case -ERESTART: - return -ERESTART; - default: - MEM_LOG("Error while installing new baseptr %lx", mfn); - return rc; - } - - invalidate_shadow_ldt(curr, 0); - - if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) - fill_ro_mpt(mfn); - curr->arch.guest_table = pagetable_from_pfn(mfn); - update_cr3(curr); - - write_ptbase(curr); - - if ( likely(old_base_mfn != 0) ) - { - struct page_info *page = mfn_to_page(old_base_mfn); - - if ( paging_mode_refcounts(d) ) - put_page(page); - else - switch ( rc = put_page_and_type_preemptible(page) ) - { - case -EINTR: - rc = -ERESTART; - /* fallthrough */ - case -ERESTART: - curr->arch.old_guest_table = page; - break; - default: - BUG_ON(rc); - break; - } - } - - return rc; -} - -struct domain *mm_get_pg_owner(domid_t domid) -{ - struct domain *pg_owner = NULL, *curr = current->domain; - - if ( likely(domid == DOMID_SELF) ) - { - pg_owner = rcu_lock_current_domain(); - goto out; - } - - if ( unlikely(domid == curr->domain_id) ) - { - MEM_LOG("Cannot specify itself as foreign domain"); - goto out; - } - - if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) ) + if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) ) { MEM_LOG("Cannot mix foreign mappings with translated domains"); goto out; @@ -3581,572 +2964,6 @@ long do_mmuext_op( return rc; } -long do_mmu_update( - XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs, - unsigned int count, - XEN_GUEST_HANDLE_PARAM(uint) pdone, - unsigned int foreigndom) -{ - struct mmu_update req; - void *va; - unsigned long gpfn, gmfn, mfn; - struct page_info *page; - unsigned int cmd, i = 0, done = 0, pt_dom; - struct vcpu *curr = current, *v = curr; - struct domain *d = v->domain, *pt_owner = d, *pg_owner; - struct domain_mmap_cache mapcache; - uint32_t xsm_needed = 0; - uint32_t xsm_checked = 0; - int rc = put_old_guest_table(curr); - - if ( unlikely(rc) ) - { - if ( likely(rc == -ERESTART) ) - rc = hypercall_create_continuation( - __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone, - foreigndom); - return rc; - } - - if ( unlikely(count == MMU_UPDATE_PREEMPTED) && - likely(guest_handle_is_null(ureqs)) ) - { - /* See the curr->arch.old_guest_table related - * hypercall_create_continuation() below. 
*/ - return (int)foreigndom; - } - - if ( unlikely(count & MMU_UPDATE_PREEMPTED) ) - { - count &= ~MMU_UPDATE_PREEMPTED; - if ( unlikely(!guest_handle_is_null(pdone)) ) - (void)copy_from_guest(&done, pdone, 1); - } - else - perfc_incr(calls_to_mmu_update); - - if ( unlikely(!guest_handle_okay(ureqs, count)) ) - return -EFAULT; - - if ( (pt_dom = foreigndom >> 16) != 0 ) - { - /* Pagetables belong to a foreign domain (PFD). */ - if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL ) - return -ESRCH; - - if ( pt_owner == d ) - rcu_unlock_domain(pt_owner); - else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL ) - { - rc = -EINVAL; - goto out; - } - } - - if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL ) - { - rc = -ESRCH; - goto out; - } - - domain_mmap_cache_init(&mapcache); - - for ( i = 0; i < count; i++ ) - { - if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) ) - { - rc = -ERESTART; - break; - } - - if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) ) - { - MEM_LOG("Bad __copy_from_guest"); - rc = -EFAULT; - break; - } - - cmd = req.ptr & (sizeof(l1_pgentry_t)-1); - - switch ( cmd ) - { - /* - * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table. - * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR) - * current A/D bits. - */ - case MMU_NORMAL_PT_UPDATE: - case MMU_PT_UPDATE_PRESERVE_AD: - { - p2m_type_t p2mt; - - rc = -EOPNOTSUPP; - if ( unlikely(paging_mode_refcounts(pt_owner)) ) - break; - - xsm_needed |= XSM_MMU_NORMAL_UPDATE; - if ( get_pte_flags(req.val) & _PAGE_PRESENT ) - { - xsm_needed |= XSM_MMU_UPDATE_READ; - if ( get_pte_flags(req.val) & _PAGE_RW ) - xsm_needed |= XSM_MMU_UPDATE_WRITE; - } - if ( xsm_needed != xsm_checked ) - { - rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed); - if ( rc ) - break; - xsm_checked = xsm_needed; - } - rc = -EINVAL; - - req.ptr -= cmd; - gmfn = req.ptr >> PAGE_SHIFT; - page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC); - - if ( p2m_is_paged(p2mt) ) - { - ASSERT(!page); - p2m_mem_paging_populate(pg_owner, gmfn); - rc = -ENOENT; - break; - } - - if ( unlikely(!page) ) - { - MEM_LOG("Could not get page for normal update"); - break; - } - - mfn = page_to_mfn(page); - va = map_domain_page_with_cache(mfn, &mapcache); - va = (void *)((unsigned long)va + - (unsigned long)(req.ptr & ~PAGE_MASK)); - - if ( page_lock(page) ) - { - switch ( page->u.inuse.type_info & PGT_type_mask ) - { - case PGT_l1_page_table: - { - l1_pgentry_t l1e = l1e_from_intpte(req.val); - p2m_type_t l1e_p2mt = p2m_ram_rw; - struct page_info *target = NULL; - p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ? - P2M_UNSHARE : P2M_ALLOC; - - if ( paging_mode_translate(pg_owner) ) - target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e), - &l1e_p2mt, q); - - if ( p2m_is_paged(l1e_p2mt) ) - { - if ( target ) - put_page(target); - p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in == l1e_p2mt && !target ) - { - rc = -ENOENT; - break; - } - /* If we tried to unshare and failed */ - else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) ) - { - /* We could not have obtained a page ref. */ - ASSERT(target == NULL); - /* And mem_sharing_notify has already been called. 
*/ - rc = -ENOMEM; - break; - } - - rc = mod_l1_entry(va, l1e, mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v, - pg_owner); - if ( target ) - put_page(target); - } - break; - case PGT_l2_page_table: - rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - break; - case PGT_l3_page_table: - rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - break; - case PGT_l4_page_table: - rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - break; - case PGT_writable_page: - perfc_incr(writable_mmu_updates); - if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) - rc = 0; - break; - } - page_unlock(page); - if ( rc == -EINTR ) - rc = -ERESTART; - } - else if ( get_page_type(page, PGT_writable_page) ) - { - perfc_incr(writable_mmu_updates); - if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) - rc = 0; - put_page_type(page); - } - - unmap_domain_page_with_cache(va, &mapcache); - put_page(page); - } - break; - - case MMU_MACHPHYS_UPDATE: - if ( unlikely(d != pt_owner) ) - { - rc = -EPERM; - break; - } - - if ( unlikely(paging_mode_translate(pg_owner)) ) - { - rc = -EINVAL; - break; - } - - mfn = req.ptr >> PAGE_SHIFT; - gpfn = req.val; - - xsm_needed |= XSM_MMU_MACHPHYS_UPDATE; - if ( xsm_needed != xsm_checked ) - { - rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed); - if ( rc ) - break; - xsm_checked = xsm_needed; - } - - if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) ) - { - MEM_LOG("Could not get page for mach->phys update"); - rc = -EINVAL; - break; - } - - set_gpfn_from_mfn(mfn, gpfn); - - paging_mark_dirty(pg_owner, _mfn(mfn)); - - put_page(mfn_to_page(mfn)); - break; - - default: - MEM_LOG("Invalid page update command %x", cmd); - rc = -ENOSYS; - break; - } - - if ( unlikely(rc) ) - break; - - guest_handle_add_offset(ureqs, 1); - } - - if ( rc == -ERESTART ) - { - ASSERT(i < count); - rc = hypercall_create_continuation( - __HYPERVISOR_mmu_update, "hihi", - ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom); - } - else if ( curr->arch.old_guest_table ) - { - XEN_GUEST_HANDLE_PARAM(void) null; - - ASSERT(rc || i == count); - set_xen_guest_handle(null, NULL); - /* - * In order to have a way to communicate the final return value to - * our continuation, we pass this in place of "foreigndom", building - * on the fact that this argument isn't needed anymore. - */ - rc = hypercall_create_continuation( - __HYPERVISOR_mmu_update, "hihi", null, - MMU_UPDATE_PREEMPTED, null, rc); - } - - mm_put_pg_owner(pg_owner); - - domain_mmap_cache_destroy(&mapcache); - - perfc_add(num_page_updates, i); - - out: - if ( pt_owner != d ) - rcu_unlock_domain(pt_owner); - - /* Add incremental work we have done to the @done output parameter. 
*/ - if ( unlikely(!guest_handle_is_null(pdone)) ) - { - done += i; - copy_to_guest(pdone, &done, 1); - } - - return rc; -} - - -static int create_grant_pte_mapping( - uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v) -{ - int rc = GNTST_okay; - void *va; - unsigned long gmfn, mfn; - struct page_info *page; - l1_pgentry_t ol1e; - struct domain *d = v->domain; - - adjust_guest_l1e(nl1e, d); - - gmfn = pte_addr >> PAGE_SHIFT; - page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); - - if ( unlikely(!page) ) - { - MEM_LOG("Could not get page for normal update"); - return GNTST_general_error; - } - - mfn = page_to_mfn(page); - va = map_domain_page(_mfn(mfn)); - va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK)); - - if ( !page_lock(page) ) - { - rc = GNTST_general_error; - goto failed; - } - - if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(page); - rc = GNTST_general_error; - goto failed; - } - - ol1e = *(l1_pgentry_t *)va; - if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) ) - { - page_unlock(page); - rc = GNTST_general_error; - goto failed; - } - - page_unlock(page); - - if ( !paging_mode_refcounts(d) ) - put_page_from_l1e(ol1e, d); - - failed: - unmap_domain_page(va); - put_page(page); - - return rc; -} - -static int destroy_grant_pte_mapping( - uint64_t addr, unsigned long frame, struct domain *d) -{ - int rc = GNTST_okay; - void *va; - unsigned long gmfn, mfn; - struct page_info *page; - l1_pgentry_t ol1e; - - gmfn = addr >> PAGE_SHIFT; - page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); - - if ( unlikely(!page) ) - { - MEM_LOG("Could not get page for normal update"); - return GNTST_general_error; - } - - mfn = page_to_mfn(page); - va = map_domain_page(_mfn(mfn)); - va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK)); - - if ( !page_lock(page) ) - { - rc = GNTST_general_error; - goto failed; - } - - if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(page); - rc = GNTST_general_error; - goto failed; - } - - ol1e = *(l1_pgentry_t *)va; - - /* Check that the virtual address supplied is actually mapped to frame. */ - if ( unlikely(l1e_get_pfn(ol1e) != frame) ) - { - page_unlock(page); - MEM_LOG("PTE entry %lx for address %"PRIx64" doesn't match frame %lx", - (unsigned long)l1e_get_intpte(ol1e), addr, frame); - rc = GNTST_general_error; - goto failed; - } - - /* Delete pagetable entry. */ - if ( unlikely(!UPDATE_ENTRY - (l1, - (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, - d->vcpu[0] /* Change if we go to per-vcpu shadows. 
*/, - 0)) ) - { - page_unlock(page); - MEM_LOG("Cannot delete PTE entry at %p", va); - rc = GNTST_general_error; - goto failed; - } - - page_unlock(page); - - failed: - unmap_domain_page(va); - put_page(page); - return rc; -} - - -static int create_grant_va_mapping( - unsigned long va, l1_pgentry_t nl1e, struct vcpu *v) -{ - l1_pgentry_t *pl1e, ol1e; - struct domain *d = v->domain; - unsigned long gl1mfn; - struct page_info *l1pg; - int okay; - - adjust_guest_l1e(nl1e, d); - - pl1e = guest_map_l1e(va, &gl1mfn); - if ( !pl1e ) - { - MEM_LOG("Could not find L1 PTE for address %lx", va); - return GNTST_general_error; - } - - if ( !get_page_from_pagenr(gl1mfn, current->domain) ) - { - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - l1pg = mfn_to_page(gl1mfn); - if ( !page_lock(l1pg) ) - { - put_page(l1pg); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - ol1e = *pl1e; - okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0); - - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); - - if ( okay && !paging_mode_refcounts(d) ) - put_page_from_l1e(ol1e, d); - - return okay ? GNTST_okay : GNTST_general_error; -} - -static int replace_grant_va_mapping( - unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v) -{ - l1_pgentry_t *pl1e, ol1e; - unsigned long gl1mfn; - struct page_info *l1pg; - int rc = 0; - - pl1e = guest_map_l1e(addr, &gl1mfn); - if ( !pl1e ) - { - MEM_LOG("Could not find L1 PTE for address %lx", addr); - return GNTST_general_error; - } - - if ( !get_page_from_pagenr(gl1mfn, current->domain) ) - { - rc = GNTST_general_error; - goto out; - } - - l1pg = mfn_to_page(gl1mfn); - if ( !page_lock(l1pg) ) - { - rc = GNTST_general_error; - put_page(l1pg); - goto out; - } - - if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - rc = GNTST_general_error; - goto unlock_and_out; - } - - ol1e = *pl1e; - - /* Check that the virtual address supplied is actually mapped to frame. */ - if ( unlikely(l1e_get_pfn(ol1e) != frame) ) - { - MEM_LOG("PTE entry %lx for address %lx doesn't match frame %lx", - l1e_get_pfn(ol1e), addr, frame); - rc = GNTST_general_error; - goto unlock_and_out; - } - - /* Delete pagetable entry. 
*/ - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) ) - { - MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); - rc = GNTST_general_error; - goto unlock_and_out; - } - - unlock_and_out: - page_unlock(l1pg); - put_page(l1pg); - out: - guest_unmap_l1e(pl1e); - return rc; -} - -static int destroy_grant_va_mapping( - unsigned long addr, unsigned long frame, struct vcpu *v) -{ - return replace_grant_va_mapping(addr, frame, l1e_empty(), v); -} - static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame, unsigned int flags, unsigned int cache_flags) @@ -4170,140 +2987,38 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame, return GNTST_okay; } -static int create_grant_pv_mapping(uint64_t addr, unsigned long frame, - unsigned int flags, unsigned int cache_flags) -{ - l1_pgentry_t pte; - uint32_t grant_pte_flags; - - grant_pte_flags = - _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB; - if ( cpu_has_nx ) - grant_pte_flags |= _PAGE_NX_BIT; - - pte = l1e_from_pfn(frame, grant_pte_flags); - if ( (flags & GNTMAP_application_map) ) - l1e_add_flags(pte,_PAGE_USER); - if ( !(flags & GNTMAP_readonly) ) - l1e_add_flags(pte,_PAGE_RW); - - l1e_add_flags(pte, - ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0) - & _PAGE_AVAIL); - - l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5)); - - if ( flags & GNTMAP_contains_pte ) - return create_grant_pte_mapping(addr, pte, current); - return create_grant_va_mapping(addr, pte, current); -} - int create_grant_host_mapping(uint64_t addr, unsigned long frame, unsigned int flags, unsigned int cache_flags) { if ( paging_mode_external(current->domain) ) return create_grant_p2m_mapping(addr, frame, flags, cache_flags); - return create_grant_pv_mapping(addr, frame, flags, cache_flags); -} - -static int replace_grant_p2m_mapping( - uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags) -{ - unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT); - p2m_type_t type; - mfn_t old_mfn; - struct domain *d = current->domain; - - if ( new_addr != 0 || (flags & GNTMAP_contains_pte) ) - return GNTST_general_error; - - old_mfn = get_gfn(d, gfn, &type); - if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame ) - { - put_gfn(d, gfn); - MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)", - type, mfn_x(old_mfn), frame); - return GNTST_general_error; - } - guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K); - - put_gfn(d, gfn); - return GNTST_okay; -} - -static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame, - uint64_t new_addr, unsigned int flags) -{ - struct vcpu *curr = current; - l1_pgentry_t *pl1e, ol1e; - unsigned long gl1mfn; - struct page_info *l1pg; - int rc; - - if ( flags & GNTMAP_contains_pte ) - { - if ( !new_addr ) - return destroy_grant_pte_mapping(addr, frame, curr->domain); - - MEM_LOG("Unsupported grant table operation"); - return GNTST_general_error; - } - - if ( !new_addr ) - return destroy_grant_va_mapping(addr, frame, curr); - - pl1e = guest_map_l1e(new_addr, &gl1mfn); - if ( !pl1e ) - { - MEM_LOG("Could not find L1 PTE for address %lx", - (unsigned long)new_addr); - return GNTST_general_error; - } - - if ( !get_page_from_pagenr(gl1mfn, current->domain) ) - { - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } + return create_grant_pv_mapping(addr, frame, flags, cache_flags); +} - l1pg = mfn_to_page(gl1mfn); - if ( !page_lock(l1pg) ) - { - put_page(l1pg); - guest_unmap_l1e(pl1e); - return 
GNTST_general_error; - } +static int replace_grant_p2m_mapping( + uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags) +{ + unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT); + p2m_type_t type; + mfn_t old_mfn; + struct domain *d = current->domain; - if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); + if ( new_addr != 0 || (flags & GNTMAP_contains_pte) ) return GNTST_general_error; - } - - ol1e = *pl1e; - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), - gl1mfn, curr, 0)) ) + old_mfn = get_gfn(d, gfn, &type); + if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame ) { - page_unlock(l1pg); - put_page(l1pg); - MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); - guest_unmap_l1e(pl1e); + put_gfn(d, gfn); + MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)", + type, mfn_x(old_mfn), frame); return GNTST_general_error; } + guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K); - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); - - rc = replace_grant_va_mapping(addr, frame, ol1e, curr); - if ( rc && !paging_mode_refcounts(curr->domain) ) - put_page_from_l1e(ol1e, curr->domain); - - return rc; + put_gfn(d, gfn); + return GNTST_okay; } int replace_grant_host_mapping(uint64_t addr, unsigned long frame, @@ -4405,125 +3120,6 @@ int steal_page( return -1; } -static int __do_update_va_mapping( - unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) -{ - l1_pgentry_t val = l1e_from_intpte(val64); - struct vcpu *v = current; - struct domain *d = v->domain; - struct page_info *gl1pg; - l1_pgentry_t *pl1e; - unsigned long bmap_ptr, gl1mfn; - cpumask_t *mask = NULL; - int rc; - - perfc_incr(calls_to_update_va); - - rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val); - if ( rc ) - return rc; - - rc = -EINVAL; - pl1e = guest_map_l1e(va, &gl1mfn); - if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) ) - goto out; - - gl1pg = mfn_to_page(gl1mfn); - if ( !page_lock(gl1pg) ) - { - put_page(gl1pg); - goto out; - } - - if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(gl1pg); - put_page(gl1pg); - goto out; - } - - rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner); - - page_unlock(gl1pg); - put_page(gl1pg); - - out: - if ( pl1e ) - guest_unmap_l1e(pl1e); - - switch ( flags & UVMF_FLUSHTYPE_MASK ) - { - case UVMF_TLB_FLUSH: - switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) - { - case UVMF_LOCAL: - flush_tlb_local(); - break; - case UVMF_ALL: - mask = d->domain_dirty_cpumask; - break; - default: - mask = this_cpu(scratch_cpumask); - rc = mm_vcpumask_to_pcpumask(d, - const_guest_handle_from_ptr(bmap_ptr, - void), - mask); - break; - } - if ( mask ) - flush_tlb_mask(mask); - break; - - case UVMF_INVLPG: - switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) - { - case UVMF_LOCAL: - paging_invlpg(v, va); - break; - case UVMF_ALL: - mask = d->domain_dirty_cpumask; - break; - default: - mask = this_cpu(scratch_cpumask); - rc = mm_vcpumask_to_pcpumask(d, - const_guest_handle_from_ptr(bmap_ptr, - void), - mask); - break; - } - if ( mask ) - flush_tlb_one_mask(mask, va); - break; - } - - return rc; -} - -long do_update_va_mapping(unsigned long va, u64 val64, - unsigned long flags) -{ - return __do_update_va_mapping(va, val64, flags, current->domain); -} - -long do_update_va_mapping_otherdomain(unsigned long va, u64 val64, - unsigned long flags, - 
domid_t domid) -{ - struct domain *pg_owner; - int rc; - - if ( (pg_owner = mm_get_pg_owner(domid)) == NULL ) - return -ESRCH; - - rc = __do_update_va_mapping(va, val64, flags, pg_owner); - - mm_put_pg_owner(pg_owner); - - return rc; -} - - - /************************* * Descriptor Tables */ @@ -5084,466 +3680,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return 0; } - -/************************* - * Writable Pagetables - */ - -struct ptwr_emulate_ctxt { - struct x86_emulate_ctxt ctxt; - unsigned long cr2; - l1_pgentry_t pte; -}; - -static int ptwr_emulated_read( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - unsigned int rc = bytes; - unsigned long addr = offset; - - if ( !__addr_ok(addr) || - (rc = __copy_from_user(p_data, (void *)addr, bytes)) ) - { - x86_emul_pagefault(0, addr + bytes - rc, ctxt); /* Read fault. */ - return X86EMUL_EXCEPTION; - } - - return X86EMUL_OKAY; -} - -static int ptwr_emulated_update( - unsigned long addr, - paddr_t old, - paddr_t val, - unsigned int bytes, - unsigned int do_cmpxchg, - struct ptwr_emulate_ctxt *ptwr_ctxt) -{ - unsigned long mfn; - unsigned long unaligned_addr = addr; - struct page_info *page; - l1_pgentry_t pte, ol1e, nl1e, *pl1e; - struct vcpu *v = current; - struct domain *d = v->domain; - int ret; - - /* Only allow naturally-aligned stores within the original %cr2 page. */ - if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) ) - { - MEM_LOG("ptwr_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)", - ptwr_ctxt->cr2, addr, bytes); - return X86EMUL_UNHANDLEABLE; - } - - /* Turn a sub-word access into a full-word access. */ - if ( bytes != sizeof(paddr_t) ) - { - paddr_t full; - unsigned int rc, offset = addr & (sizeof(paddr_t)-1); - - /* Align address; read full word. */ - addr &= ~(sizeof(paddr_t)-1); - if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 ) - { - x86_emul_pagefault(0, /* Read fault. */ - addr + sizeof(paddr_t) - rc, - &ptwr_ctxt->ctxt); - return X86EMUL_EXCEPTION; - } - /* Mask out bits provided by caller. */ - full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8)); - /* Shift the caller value and OR in the missing bits. */ - val &= (((paddr_t)1 << (bytes*8)) - 1); - val <<= (offset)*8; - val |= full; - /* Also fill in missing parts of the cmpxchg old value. */ - old &= (((paddr_t)1 << (bytes*8)) - 1); - old <<= (offset)*8; - old |= full; - } - - pte = ptwr_ctxt->pte; - mfn = l1e_get_pfn(pte); - page = mfn_to_page(mfn); - - /* We are looking only for read-only mappings of p.t. pages. */ - ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT); - ASSERT(mfn_valid(_mfn(mfn))); - ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table); - ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0); - ASSERT(page_get_owner(page) == d); - - /* Check the new PTE. */ - nl1e = l1e_from_intpte(val); - switch ( ret = get_page_from_l1e(nl1e, d, d) ) - { - default: - if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) && - !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) ) - { - /* - * If this is an upper-half write to a PAE PTE then we assume that - * the guest has simply got the two writes the wrong way round. We - * zap the PRESENT bit on the assumption that the bottom half will - * be written immediately after we return to the guest. 
- */ - gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %" - PRIpte"\n", l1e_get_intpte(nl1e)); - l1e_remove_flags(nl1e, _PAGE_PRESENT); - } - else - { - MEM_LOG("ptwr_emulate: could not get_page_from_l1e()"); - return X86EMUL_UNHANDLEABLE; - } - break; - case 0: - break; - case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: - ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); - l1e_flip_flags(nl1e, ret); - break; - } - - adjust_guest_l1e(nl1e, d); - - /* Checked successfully: do the update (write or cmpxchg). */ - pl1e = map_domain_page(_mfn(mfn)); - pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK)); - if ( do_cmpxchg ) - { - int okay; - intpte_t t = old; - ol1e = l1e_from_intpte(old); - - okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e), - &t, l1e_get_intpte(nl1e), _mfn(mfn)); - okay = (okay && t == old); - - if ( !okay ) - { - unmap_domain_page(pl1e); - put_page_from_l1e(nl1e, d); - return X86EMUL_RETRY; - } - } - else - { - ol1e = *pl1e; - if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) ) - BUG(); - } - - trace_ptwr_emulation(addr, nl1e); - - unmap_domain_page(pl1e); - - /* Finally, drop the old PTE. */ - put_page_from_l1e(ol1e, d); - - return X86EMUL_OKAY; -} - -static int ptwr_emulated_write( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - paddr_t val = 0; - - if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes ) - { - MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)", - offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - memcpy(&val, p_data, bytes); - - return ptwr_emulated_update( - offset, 0, val, bytes, 0, - container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); -} - -static int ptwr_emulated_cmpxchg( - enum x86_segment seg, - unsigned long offset, - void *p_old, - void *p_new, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - paddr_t old = 0, new = 0; - - if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) ) - { - MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)", - offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - memcpy(&old, p_old, bytes); - memcpy(&new, p_new, bytes); - - return ptwr_emulated_update( - offset, old, new, bytes, 1, - container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); -} - -static int pv_emul_is_mem_write(const struct x86_emulate_state *state, - struct x86_emulate_ctxt *ctxt) -{ - return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY - : X86EMUL_UNHANDLEABLE; -} - -static const struct x86_emulate_ops ptwr_emulate_ops = { - .read = ptwr_emulated_read, - .insn_fetch = ptwr_emulated_read, - .write = ptwr_emulated_write, - .cmpxchg = ptwr_emulated_cmpxchg, - .validate = pv_emul_is_mem_write, - .cpuid = pv_emul_cpuid, -}; - -/* Write page fault handler: check if guest is trying to modify a PTE. */ -int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, - struct cpu_user_regs *regs) -{ - struct domain *d = v->domain; - struct page_info *page; - l1_pgentry_t pte; - struct ptwr_emulate_ctxt ptwr_ctxt = { - .ctxt = { - .regs = regs, - .vendor = d->arch.cpuid->x86_vendor, - .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, - .sp_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, - .swint_emulate = x86_swint_emulate_none, - }, - }; - int rc; - - /* Attempt to read the PTE that maps the VA being accessed. */ - guest_get_eff_l1e(addr, &pte); - - /* We are looking only for read-only mappings of p.t. pages. 
*/ - if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) || - rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) || - !get_page_from_pagenr(l1e_get_pfn(pte), d) ) - goto bail; - - page = l1e_get_page(pte); - if ( !page_lock(page) ) - { - put_page(page); - goto bail; - } - - if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(page); - put_page(page); - goto bail; - } - - ptwr_ctxt.cr2 = addr; - ptwr_ctxt.pte = pte; - - rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops); - - page_unlock(page); - put_page(page); - - switch ( rc ) - { - case X86EMUL_EXCEPTION: - /* - * This emulation only covers writes to pagetables which are marked - * read-only by Xen. We tolerate #PF (in case a concurrent pagetable - * update has succeeded on a different vcpu). Anything else is an - * emulation bug, or a guest playing with the instruction stream under - * Xen's feet. - */ - if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && - ptwr_ctxt.ctxt.event.vector == TRAP_page_fault ) - pv_inject_event(&ptwr_ctxt.ctxt.event); - else - gdprintk(XENLOG_WARNING, - "Unexpected event (type %u, vector %#x) from emulation\n", - ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector); - - /* Fallthrough */ - case X86EMUL_OKAY: - - if ( ptwr_ctxt.ctxt.retire.singlestep ) - pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); - - /* Fallthrough */ - case X86EMUL_RETRY: - perfc_incr(ptwr_emulations); - return EXCRET_fault_fixed; - } - - bail: - return 0; -} - -/************************* - * fault handling for read-only MMIO pages - */ - -int mmio_ro_emulated_write( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data; - - /* Only allow naturally-aligned stores at the original %cr2 address. */ - if ( ((bytes | offset) & (bytes - 1)) || !bytes || - offset != mmio_ro_ctxt->cr2 ) - { - MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)", - mmio_ro_ctxt->cr2, offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - return X86EMUL_OKAY; -} - -static const struct x86_emulate_ops mmio_ro_emulate_ops = { - .read = x86emul_unhandleable_rw, - .insn_fetch = ptwr_emulated_read, - .write = mmio_ro_emulated_write, - .validate = pv_emul_is_mem_write, - .cpuid = pv_emul_cpuid, -}; - -int mmcfg_intercept_write( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data; - - /* - * Only allow naturally-aligned stores no wider than 4 bytes to the - * original %cr2 address. - */ - if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes || - offset != mmio_ctxt->cr2 ) - { - MEM_LOG("mmcfg_intercept: bad write (cr2=%lx, addr=%lx, bytes=%u)", - mmio_ctxt->cr2, offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - offset &= 0xfff; - if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf, - offset, bytes, p_data) >= 0 ) - pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf), - PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes, - *(uint32_t *)p_data); - - return X86EMUL_OKAY; -} - -static const struct x86_emulate_ops mmcfg_intercept_ops = { - .read = x86emul_unhandleable_rw, - .insn_fetch = ptwr_emulated_read, - .write = mmcfg_intercept_write, - .validate = pv_emul_is_mem_write, - .cpuid = pv_emul_cpuid, -}; - -/* Check if guest is trying to modify a r/o MMIO page. 
*/ -int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, - struct cpu_user_regs *regs) -{ - l1_pgentry_t pte; - unsigned long mfn; - unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG; - struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr }; - struct x86_emulate_ctxt ctxt = { - .regs = regs, - .vendor = v->domain->arch.cpuid->x86_vendor, - .addr_size = addr_size, - .sp_size = addr_size, - .swint_emulate = x86_swint_emulate_none, - .data = &mmio_ro_ctxt - }; - int rc; - - /* Attempt to read the PTE that maps the VA being accessed. */ - guest_get_eff_l1e(addr, &pte); - - /* We are looking only for read-only mappings of MMIO pages. */ - if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ) - return 0; - - mfn = l1e_get_pfn(pte); - if ( mfn_valid(_mfn(mfn)) ) - { - struct page_info *page = mfn_to_page(mfn); - struct domain *owner = page_get_owner_and_reference(page); - - if ( owner ) - put_page(page); - if ( owner != dom_io ) - return 0; - } - - if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) - return 0; - - if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) ) - rc = x86_emulate(&ctxt, &mmcfg_intercept_ops); - else - rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops); - - switch ( rc ) - { - case X86EMUL_EXCEPTION: - /* - * This emulation only covers writes to MMCFG space or read-only MFNs. - * We tolerate #PF (from hitting an adjacent page or a successful - * concurrent pagetable update). Anything else is an emulation bug, - * or a guest playing with the instruction stream under Xen's feet. - */ - if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && - ctxt.event.vector == TRAP_page_fault ) - pv_inject_event(&ctxt.event); - else - gdprintk(XENLOG_WARNING, - "Unexpected event (type %u, vector %#x) from emulation\n", - ctxt.event.type, ctxt.event.vector); - - /* Fallthrough */ - case X86EMUL_OKAY: - - if ( ctxt.retire.singlestep ) - pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); - - /* Fallthrough */ - case X86EMUL_RETRY: - perfc_incr(ptwr_emulations); - return EXCRET_fault_fixed; - } - - return 0; -} - void *alloc_xen_pagetable(void) { if ( system_state != SYS_STATE_early_boot ) diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile index ea94599438..665be5536c 100644 --- a/xen/arch/x86/pv/Makefile +++ b/xen/arch/x86/pv/Makefile @@ -1,2 +1,3 @@ obj-y += hypercall.o obj-bin-y += dom0_build.init.o +obj-y += mm.o diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c new file mode 100644 index 0000000000..fd157b9f58 --- /dev/null +++ b/xen/arch/x86/pv/mm.c @@ -0,0 +1,1902 @@ +/****************************************************************************** + * arch/x86/pv/mm.c + * + * Copyright (c) 2002-2005 K A Fraser + * Copyright (c) 2004 Christian Limpach + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +/* + * A description of the x86 page table API: + * + * Domains trap to do_mmu_update with a list of update requests. + * This is a list of (ptr, val) pairs, where the requested operation + * is *ptr = val. + * + * Reference counting of pages: + * ---------------------------- + * Each page has two refcounts: tot_count and type_count. + * + * TOT_COUNT is the obvious reference count. It counts all uses of a + * physical page frame by a domain, including uses as a page directory, + * a page table, or simple mappings via a PTE. This count prevents a + * domain from releasing a frame back to the free pool when it still holds + * a reference to it. + * + * TYPE_COUNT is more subtle. A frame can be put to one of three + * mutually-exclusive uses: it might be used as a page directory, or a + * page table, or it may be mapped writable by the domain [of course, a + * frame may not be used in any of these three ways!]. + * So, type_count is a count of the number of times a frame is being + * referred to in its current incarnation. Therefore, a page can only + * change its type when its type count is zero. + * + * Pinning the page type: + * ---------------------- + * The type of a page can be pinned/unpinned with the commands + * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is, + * pinning is not reference counted, so it can't be nested). + * This is useful to prevent a page's type count falling to zero, at which + * point safety checks would need to be carried out next time the count + * is increased again. + * + * A further note on writable page mappings: + * ----------------------------------------- + * For simplicity, the count of writable mappings for a page may not + * correspond to reality. The 'writable count' is incremented for every + * PTE which maps the page with the _PAGE_RW flag set. However, for + * write access to be possible the page directory entry must also have + * its _PAGE_RW bit set. We do not check this as it complicates the + * reference counting considerably [consider the case of multiple + * directory entries referencing a single page table, some with the RW + * bit set, others not -- it starts getting a bit messy]. + * In normal use, this simplification shouldn't be a problem. + * However, the logic can be added if required. + * + * One more note on read-only page mappings: + * ----------------------------------------- + * We want domains to be able to map pages for read-only access. The + * main reason is that page tables and directories should be readable + * by a domain, but it would not be safe for them to be writable. + * However, domains have free access to rings 1 & 2 of the Intel + * privilege model. In terms of page protection, these are considered + * to be part of 'supervisor mode'. The WP bit in CR0 controls whether + * read-only restrictions are respected in supervisor mode -- if the + * bit is clear then any mapped page is writable. + * + * We get round this by always setting the WP bit and disallowing + * updates to it. This is very unlikely to cause a problem for guest + * OS's, which will generally use the WP bit to simplify copy-on-write + * implementation (in that case, OS wants a fault when it writes to + * an application-supplied buffer). 
+ */ + +#include <xen/event.h> +#include <xen/guest_access.h> +#include <xen/hypercall.h> +#include <xen/mm.h> +#include <xen/sched.h> +#include <xen/trace.h> +#include <xsm/xsm.h> + +#include <asm/p2m.h> +#include <asm/paging.h> +#include <asm/x86_emulate.h> + +/* + * PTE updates can be done with ordinary writes except: + * 1. Debug builds get extra checking by using CMPXCHG[8B]. + */ +#if !defined(NDEBUG) +#define PTE_UPDATE_WITH_CMPXCHG +#endif + +/* How to write an entry to the guest pagetables. + * Returns 0 for failure (pointer not valid), 1 for success. */ +static inline int update_intpte(intpte_t *p, + intpte_t old, + intpte_t new, + unsigned long mfn, + struct vcpu *v, + int preserve_ad) +{ + int rv = 1; +#ifndef PTE_UPDATE_WITH_CMPXCHG + if ( !preserve_ad ) + { + rv = paging_write_guest_entry(v, p, new, _mfn(mfn)); + } + else +#endif + { + intpte_t t = old; + for ( ; ; ) + { + intpte_t _new = new; + if ( preserve_ad ) + _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY); + + rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn)); + if ( unlikely(rv == 0) ) + { + MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte + ": saw %" PRIpte, old, _new, t); + break; + } + + if ( t == old ) + break; + + /* Allowed to change in Accessed/Dirty flags only. */ + BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); + + old = t; + } + } + return rv; +} + +/* Macro that wraps the appropriate type-changes around update_intpte(). + * Arguments are: type, ptr, old, new, mfn, vcpu */ +#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad) \ + update_intpte(&_t ## e_get_intpte(*(_p)), \ + _t ## e_get_intpte(_o), _t ## e_get_intpte(_n), \ + (_m), (_v), (_ad)) + +/* + * PTE flags that a guest may change without re-validating the PTE. + * All other bits affect translation, caching, or Xen's safety. + */ +#define FASTPATH_FLAG_WHITELIST \ + (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \ + _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER) + +/* Update the L1 entry at pl1e to new value nl1e. */ +static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, + unsigned long gl1mfn, int preserve_ad, + struct vcpu *pt_vcpu, struct domain *pg_dom) +{ + l1_pgentry_t ol1e; + struct domain *pt_dom = pt_vcpu->domain; + int rc = 0; + + if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) ) + return -EFAULT; + + if ( unlikely(paging_mode_refcounts(pt_dom)) ) + { + if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) ) + return 0; + return -EBUSY; + } + + if ( l1e_get_flags(nl1e) & _PAGE_PRESENT ) + { + /* Translate foreign guest addresses. */ + struct page_info *page = NULL; + + if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) ) + { + MEM_LOG("Bad L1 flags %x", + l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)); + return -EINVAL; + } + + if ( paging_mode_translate(pg_dom) ) + { + page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC); + if ( !page ) + return -EINVAL; + nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e)); + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l1e(nl1e, pt_dom); + rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, + preserve_ad); + if ( page ) + put_page(page); + return rc ? 0 : -EBUSY; + } + + switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) ) + { + default: + if ( page ) + put_page(page); + return rc; + case 0: + break; + case _PAGE_RW ... 
_PAGE_RW | PAGE_CACHE_ATTRS: + ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); + l1e_flip_flags(nl1e, rc); + rc = 0; + break; + } + if ( page ) + put_page(page); + + adjust_guest_l1e(nl1e, pt_dom); + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, + preserve_ad)) ) + { + ol1e = nl1e; + rc = -EBUSY; + } + } + else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, + preserve_ad)) ) + { + return -EBUSY; + } + + put_page_from_l1e(ol1e, pt_dom); + return rc; +} + + +/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */ +static int mod_l2_entry(l2_pgentry_t *pl2e, + l2_pgentry_t nl2e, + unsigned long pfn, + int preserve_ad, + struct vcpu *vcpu) +{ + l2_pgentry_t ol2e; + struct domain *d = vcpu->domain; + struct page_info *l2pg = mfn_to_page(pfn); + unsigned long type = l2pg->u.inuse.type_info; + int rc = 0; + + if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) ) + { + MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e); + return -EPERM; + } + + if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) ) + return -EFAULT; + + if ( l2e_get_flags(nl2e) & _PAGE_PRESENT ) + { + if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) ) + { + MEM_LOG("Bad L2 flags %x", + l2e_get_flags(nl2e) & L2_DISALLOW_MASK); + return -EINVAL; + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l2e(nl2e, d); + if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) ) + return 0; + return -EBUSY; + } + + if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) ) + return rc; + + adjust_guest_l2e(nl2e, d); + if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, + preserve_ad)) ) + { + ol2e = nl2e; + rc = -EBUSY; + } + } + else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, + preserve_ad)) ) + { + return -EBUSY; + } + + put_page_from_l2e(ol2e, pfn); + return rc; +} + +/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */ +static int mod_l3_entry(l3_pgentry_t *pl3e, + l3_pgentry_t nl3e, + unsigned long pfn, + int preserve_ad, + struct vcpu *vcpu) +{ + l3_pgentry_t ol3e; + struct domain *d = vcpu->domain; + int rc = 0; + + if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) ) + { + MEM_LOG("Illegal L3 update attempt in Xen-private area %p", pl3e); + return -EINVAL; + } + + /* + * Disallow updates to final L3 slot. It contains Xen mappings, and it + * would be a pain to ensure they remain continuously valid throughout. + */ + if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) ) + return -EINVAL; + + if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) ) + return -EFAULT; + + if ( l3e_get_flags(nl3e) & _PAGE_PRESENT ) + { + if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) ) + { + MEM_LOG("Bad L3 flags %x", + l3e_get_flags(nl3e) & l3_disallow_mask(d)); + return -EINVAL; + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l3e(nl3e, d); + rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad); + return rc ? 
0 : -EFAULT; + } + + rc = get_page_from_l3e(nl3e, pfn, d, 0); + if ( unlikely(rc < 0) ) + return rc; + rc = 0; + + adjust_guest_l3e(nl3e, d); + if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, + preserve_ad)) ) + { + ol3e = nl3e; + rc = -EFAULT; + } + } + else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, + preserve_ad)) ) + { + return -EFAULT; + } + + if ( likely(rc == 0) ) + if ( !create_pae_xen_mappings(d, pl3e) ) + BUG(); + + put_page_from_l3e(ol3e, pfn, 0, 1); + return rc; +} + +/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */ +static int mod_l4_entry(l4_pgentry_t *pl4e, + l4_pgentry_t nl4e, + unsigned long pfn, + int preserve_ad, + struct vcpu *vcpu) +{ + struct domain *d = vcpu->domain; + l4_pgentry_t ol4e; + int rc = 0; + + if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) ) + { + MEM_LOG("Illegal L4 update attempt in Xen-private area %p", pl4e); + return -EINVAL; + } + + if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) ) + return -EFAULT; + + if ( l4e_get_flags(nl4e) & _PAGE_PRESENT ) + { + if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) ) + { + MEM_LOG("Bad L4 flags %x", + l4e_get_flags(nl4e) & L4_DISALLOW_MASK); + return -EINVAL; + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l4e(nl4e, d); + rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad); + return rc ? 0 : -EFAULT; + } + + rc = get_page_from_l4e(nl4e, pfn, d, 0); + if ( unlikely(rc < 0) ) + return rc; + rc = 0; + + adjust_guest_l4e(nl4e, d); + if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, + preserve_ad)) ) + { + ol4e = nl4e; + rc = -EFAULT; + } + } + else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, + preserve_ad)) ) + { + return -EFAULT; + } + + put_page_from_l4e(ol4e, pfn, 0, 1); + return rc; +} + +/* Get a mapping of a PV guest's l1e for this virtual address. */ +static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn) +{ + l2_pgentry_t l2e; + + ASSERT(!paging_mode_translate(current->domain)); + ASSERT(!paging_mode_external(current->domain)); + + if ( unlikely(!__addr_ok(addr)) ) + return NULL; + + /* Find this l1e and its enclosing l1mfn in the linear map. */ + if ( __copy_from_user(&l2e, + &__linear_l2_table[l2_linear_offset(addr)], + sizeof(l2_pgentry_t)) ) + return NULL; + + /* Check flags that it will be safe to read the l1e. */ + if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT ) + return NULL; + + *gl1mfn = l2e_get_pfn(l2e); + + return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) + + l1_table_offset(addr); +} + +/* Pull down the mapping we got from guest_map_l1e(). 
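The pointer may be offset into the page; unmap_domain_page() accepts any address within the mapping. 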
*/ +static inline void guest_unmap_l1e(void *p) +{ + unmap_domain_page(p); +} + +static int create_grant_va_mapping( + unsigned long va, l1_pgentry_t nl1e, struct vcpu *v) +{ + l1_pgentry_t *pl1e, ol1e; + struct domain *d = v->domain; + unsigned long gl1mfn; + struct page_info *l1pg; + int okay; + + adjust_guest_l1e(nl1e, d); + + pl1e = guest_map_l1e(va, &gl1mfn); + if ( !pl1e ) + { + MEM_LOG("Could not find L1 PTE for address %lx", va); + return GNTST_general_error; + } + + if ( !get_page_from_pagenr(gl1mfn, current->domain) ) + { + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + l1pg = mfn_to_page(gl1mfn); + if ( !page_lock(l1pg) ) + { + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + ol1e = *pl1e; + okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0); + + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + + if ( okay && !paging_mode_refcounts(d) ) + put_page_from_l1e(ol1e, d); + + return okay ? GNTST_okay : GNTST_general_error; +} + +static int replace_grant_va_mapping( + unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v) +{ + l1_pgentry_t *pl1e, ol1e; + unsigned long gl1mfn; + struct page_info *l1pg; + int rc = 0; + + pl1e = guest_map_l1e(addr, &gl1mfn); + if ( !pl1e ) + { + MEM_LOG("Could not find L1 PTE for address %lx", addr); + return GNTST_general_error; + } + + if ( !get_page_from_pagenr(gl1mfn, current->domain) ) + { + rc = GNTST_general_error; + goto out; + } + + l1pg = mfn_to_page(gl1mfn); + if ( !page_lock(l1pg) ) + { + rc = GNTST_general_error; + put_page(l1pg); + goto out; + } + + if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + rc = GNTST_general_error; + goto unlock_and_out; + } + + ol1e = *pl1e; + + /* Check that the virtual address supplied is actually mapped to frame. */ + if ( unlikely(l1e_get_pfn(ol1e) != frame) ) + { + MEM_LOG("PTE entry %lx for address %lx doesn't match frame %lx", + l1e_get_pfn(ol1e), addr, frame); + rc = GNTST_general_error; + goto unlock_and_out; + } + + /* Delete pagetable entry. 
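More precisely, replace it with nl1e, which is l1e_empty() when we come here via destroy_grant_va_mapping(). 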
*/ + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) ) + { + MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); + rc = GNTST_general_error; + goto unlock_and_out; + } + + unlock_and_out: + page_unlock(l1pg); + put_page(l1pg); + out: + guest_unmap_l1e(pl1e); + return rc; +} + +static int create_grant_pte_mapping( + uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v) +{ + int rc = GNTST_okay; + void *va; + unsigned long gmfn, mfn; + struct page_info *page; + l1_pgentry_t ol1e; + struct domain *d = v->domain; + + adjust_guest_l1e(nl1e, d); + + gmfn = pte_addr >> PAGE_SHIFT; + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + + if ( unlikely(!page) ) + { + MEM_LOG("Could not get page for normal update"); + return GNTST_general_error; + } + + mfn = page_to_mfn(page); + va = map_domain_page(_mfn(mfn)); + va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK)); + + if ( !page_lock(page) ) + { + rc = GNTST_general_error; + goto failed; + } + + if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(page); + rc = GNTST_general_error; + goto failed; + } + + ol1e = *(l1_pgentry_t *)va; + if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) ) + { + page_unlock(page); + rc = GNTST_general_error; + goto failed; + } + + page_unlock(page); + + if ( !paging_mode_refcounts(d) ) + put_page_from_l1e(ol1e, d); + + failed: + unmap_domain_page(va); + put_page(page); + + return rc; +} + +static int destroy_grant_pte_mapping( + uint64_t addr, unsigned long frame, struct domain *d) +{ + int rc = GNTST_okay; + void *va; + unsigned long gmfn, mfn; + struct page_info *page; + l1_pgentry_t ol1e; + + gmfn = addr >> PAGE_SHIFT; + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + + if ( unlikely(!page) ) + { + MEM_LOG("Could not get page for normal update"); + return GNTST_general_error; + } + + mfn = page_to_mfn(page); + va = map_domain_page(_mfn(mfn)); + va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK)); + + if ( !page_lock(page) ) + { + rc = GNTST_general_error; + goto failed; + } + + if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(page); + rc = GNTST_general_error; + goto failed; + } + + ol1e = *(l1_pgentry_t *)va; + + /* Check that the virtual address supplied is actually mapped to frame. */ + if ( unlikely(l1e_get_pfn(ol1e) != frame) ) + { + page_unlock(page); + MEM_LOG("PTE entry %lx for address %"PRIx64" doesn't match frame %lx", + (unsigned long)l1e_get_intpte(ol1e), addr, frame); + rc = GNTST_general_error; + goto failed; + } + + /* Delete pagetable entry. */ + if ( unlikely(!UPDATE_ENTRY + (l1, + (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, + d->vcpu[0] /* Change if we go to per-vcpu shadows. */, + 0)) ) + { + page_unlock(page); + MEM_LOG("Cannot delete PTE entry at %p", va); + rc = GNTST_general_error; + goto failed; + } + + page_unlock(page); + + failed: + unmap_domain_page(va); + put_page(page); + return rc; +} + +/* Read a PV guest's l1e that maps this virtual address. */ +static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e) +{ + ASSERT(!paging_mode_translate(current->domain)); + ASSERT(!paging_mode_external(current->domain)); + + if ( unlikely(!__addr_ok(addr)) || + __copy_from_user(eff_l1e, + &__linear_l1_table[l1_linear_offset(addr)], + sizeof(l1_pgentry_t)) ) + *eff_l1e = l1e_empty(); +} + +/* + * Read the guest's l1e that maps this address, from the kernel-mode + * page tables. 
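+ * If the vcpu is currently in user mode, it is toggled into kernel mode
+ * around the read so that the linear-map walk sees the kernel page tables.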
+ */ +static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr, + void *eff_l1e) +{ + bool_t user_mode = !(v->arch.flags & TF_kernel_mode); +#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v) + + TOGGLE_MODE(); + guest_get_eff_l1e(addr, eff_l1e); + TOGGLE_MODE(); +} + +static int destroy_grant_va_mapping( + unsigned long addr, unsigned long frame, struct vcpu *v) +{ + return replace_grant_va_mapping(addr, frame, l1e_empty(), v); +} + +int replace_grant_pv_mapping(uint64_t addr, unsigned long frame, + uint64_t new_addr, unsigned int flags) +{ + struct vcpu *curr = current; + l1_pgentry_t *pl1e, ol1e; + unsigned long gl1mfn; + struct page_info *l1pg; + int rc; + + if ( flags & GNTMAP_contains_pte ) + { + if ( !new_addr ) + return destroy_grant_pte_mapping(addr, frame, curr->domain); + + MEM_LOG("Unsupported grant table operation"); + return GNTST_general_error; + } + + if ( !new_addr ) + return destroy_grant_va_mapping(addr, frame, curr); + + pl1e = guest_map_l1e(new_addr, &gl1mfn); + if ( !pl1e ) + { + MEM_LOG("Could not find L1 PTE for address %lx", + (unsigned long)new_addr); + return GNTST_general_error; + } + + if ( !get_page_from_pagenr(gl1mfn, current->domain) ) + { + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + l1pg = mfn_to_page(gl1mfn); + if ( !page_lock(l1pg) ) + { + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + ol1e = *pl1e; + + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), + gl1mfn, curr, 0)) ) + { + page_unlock(l1pg); + put_page(l1pg); + MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + + rc = replace_grant_va_mapping(addr, frame, ol1e, curr); + if ( rc && !paging_mode_refcounts(curr->domain) ) + put_page_from_l1e(ol1e, curr->domain); + + return rc; +} + +int create_grant_pv_mapping(uint64_t addr, unsigned long frame, + unsigned int flags, unsigned int cache_flags) +{ + l1_pgentry_t pte; + uint32_t grant_pte_flags; + + grant_pte_flags = + _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB; + if ( cpu_has_nx ) + grant_pte_flags |= _PAGE_NX_BIT; + + pte = l1e_from_pfn(frame, grant_pte_flags); + if ( (flags & GNTMAP_application_map) ) + l1e_add_flags(pte,_PAGE_USER); + if ( !(flags & GNTMAP_readonly) ) + l1e_add_flags(pte,_PAGE_RW); + + l1e_add_flags(pte, + ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0) + & _PAGE_AVAIL); + + l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5)); + + if ( flags & GNTMAP_contains_pte ) + return create_grant_pte_mapping(addr, pte, current); + return create_grant_va_mapping(addr, pte, current); +} + +/* Map shadow page at offset @off. 
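Called when a vcpu faults on one of its shadow LDT pages; returns 1 on success and 0 on failure, in which case the fault is forwarded to the guest. 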
*/ +int map_ldt_shadow_page(unsigned int off) +{ + struct vcpu *v = current; + struct domain *d = v->domain; + unsigned long gmfn; + struct page_info *page; + l1_pgentry_t l1e, nl1e; + unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT); + int okay; + + BUG_ON(unlikely(in_irq())); + + if ( is_pv_32bit_domain(d) ) + gva = (u32)gva; + guest_get_eff_kern_l1e(v, gva, &l1e); + if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) ) + return 0; + + gmfn = l1e_get_pfn(l1e); + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) + return 0; + + okay = get_page_type(page, PGT_seg_desc_page); + if ( unlikely(!okay) ) + { + put_page(page); + return 0; + } + + nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW); + + spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); + l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e); + v->arch.pv_vcpu.shadow_ldt_mapcnt++; + spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); + + return 1; +} + +int new_guest_cr3(unsigned long mfn) +{ + struct vcpu *curr = current; + struct domain *d = curr->domain; + int rc; + unsigned long old_base_mfn; + + if ( is_pv_32bit_domain(d) ) + { + unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table); + l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn)); + + rc = paging_mode_refcounts(d) + ? -EINVAL /* Old code was broken, but what should it be? */ + : mod_l4_entry( + pl4e, + l4e_from_pfn( + mfn, + (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)), + gt_mfn, 0, curr); + unmap_domain_page(pl4e); + switch ( rc ) + { + case 0: + break; + case -EINTR: + case -ERESTART: + return -ERESTART; + default: + MEM_LOG("Error while installing new compat baseptr %lx", mfn); + return rc; + } + + invalidate_shadow_ldt(curr, 0); + write_ptbase(curr); + + return 0; + } + + rc = put_old_guest_table(curr); + if ( unlikely(rc) ) + return rc; + + old_base_mfn = pagetable_get_pfn(curr->arch.guest_table); + /* + * This is particularly important when getting restarted after the + * previous attempt got preempted in the put-old-MFN phase. + */ + if ( old_base_mfn == mfn ) + { + write_ptbase(curr); + return 0; + } + + rc = paging_mode_refcounts(d) + ? (get_page_from_pagenr(mfn, d) ? 
0 : -EINVAL) + : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1); + switch ( rc ) + { + case 0: + break; + case -EINTR: + case -ERESTART: + return -ERESTART; + default: + MEM_LOG("Error while installing new baseptr %lx", mfn); + return rc; + } + + invalidate_shadow_ldt(curr, 0); + + if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) + fill_ro_mpt(mfn); + curr->arch.guest_table = pagetable_from_pfn(mfn); + update_cr3(curr); + + write_ptbase(curr); + + if ( likely(old_base_mfn != 0) ) + { + struct page_info *page = mfn_to_page(old_base_mfn); + + if ( paging_mode_refcounts(d) ) + put_page(page); + else + switch ( rc = put_page_and_type_preemptible(page) ) + { + case -EINTR: + rc = -ERESTART; + /* fallthrough */ + case -ERESTART: + curr->arch.old_guest_table = page; + break; + default: + BUG_ON(rc); + break; + } + } + + return rc; +} + +/************************* + * Writable Pagetables + */ + +struct ptwr_emulate_ctxt { + struct x86_emulate_ctxt ctxt; + unsigned long cr2; + l1_pgentry_t pte; +}; + +static int ptwr_emulated_read( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + unsigned int rc = bytes; + unsigned long addr = offset; + + if ( !__addr_ok(addr) || + (rc = __copy_from_user(p_data, (void *)addr, bytes)) ) + { + x86_emul_pagefault(0, addr + bytes - rc, ctxt); /* Read fault. */ + return X86EMUL_EXCEPTION; + } + + return X86EMUL_OKAY; +} + +static int ptwr_emulated_update( + unsigned long addr, + paddr_t old, + paddr_t val, + unsigned int bytes, + unsigned int do_cmpxchg, + struct ptwr_emulate_ctxt *ptwr_ctxt) +{ + unsigned long mfn; + unsigned long unaligned_addr = addr; + struct page_info *page; + l1_pgentry_t pte, ol1e, nl1e, *pl1e; + struct vcpu *v = current; + struct domain *d = v->domain; + int ret; + + /* Only allow naturally-aligned stores within the original %cr2 page. */ + if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) ) + { + MEM_LOG("ptwr_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)", + ptwr_ctxt->cr2, addr, bytes); + return X86EMUL_UNHANDLEABLE; + } + + /* Turn a sub-word access into a full-word access. */ + if ( bytes != sizeof(paddr_t) ) + { + paddr_t full; + unsigned int rc, offset = addr & (sizeof(paddr_t)-1); + + /* Align address; read full word. */ + addr &= ~(sizeof(paddr_t)-1); + if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 ) + { + x86_emul_pagefault(0, /* Read fault. */ + addr + sizeof(paddr_t) - rc, + &ptwr_ctxt->ctxt); + return X86EMUL_EXCEPTION; + } + /* Mask out bits provided by caller. */ + full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8)); + /* Shift the caller value and OR in the missing bits. */ + val &= (((paddr_t)1 << (bytes*8)) - 1); + val <<= (offset)*8; + val |= full; + /* Also fill in missing parts of the cmpxchg old value. */ + old &= (((paddr_t)1 << (bytes*8)) - 1); + old <<= (offset)*8; + old |= full; + } + + pte = ptwr_ctxt->pte; + mfn = l1e_get_pfn(pte); + page = mfn_to_page(mfn); + + /* We are looking only for read-only mappings of p.t. pages. */ + ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT); + ASSERT(mfn_valid(_mfn(mfn))); + ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table); + ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0); + ASSERT(page_get_owner(page) == d); + + /* Check the new PTE. 
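get_page_from_l1e() takes whatever references the new mapping needs; a small positive return value does not indicate failure but asks for the RW/cachability flags named in it to be flipped. 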
*/ + nl1e = l1e_from_intpte(val); + switch ( ret = get_page_from_l1e(nl1e, d, d) ) + { + default: + if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) && + !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) ) + { + /* + * If this is an upper-half write to a PAE PTE then we assume that + * the guest has simply got the two writes the wrong way round. We + * zap the PRESENT bit on the assumption that the bottom half will + * be written immediately after we return to the guest. + */ + gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %" + PRIpte"\n", l1e_get_intpte(nl1e)); + l1e_remove_flags(nl1e, _PAGE_PRESENT); + } + else + { + MEM_LOG("ptwr_emulate: could not get_page_from_l1e()"); + return X86EMUL_UNHANDLEABLE; + } + break; + case 0: + break; + case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: + ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); + l1e_flip_flags(nl1e, ret); + break; + } + + adjust_guest_l1e(nl1e, d); + + /* Checked successfully: do the update (write or cmpxchg). */ + pl1e = map_domain_page(_mfn(mfn)); + pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK)); + if ( do_cmpxchg ) + { + int okay; + intpte_t t = old; + ol1e = l1e_from_intpte(old); + + okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e), + &t, l1e_get_intpte(nl1e), _mfn(mfn)); + okay = (okay && t == old); + + if ( !okay ) + { + unmap_domain_page(pl1e); + put_page_from_l1e(nl1e, d); + return X86EMUL_RETRY; + } + } + else + { + ol1e = *pl1e; + if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) ) + BUG(); + } + + trace_ptwr_emulation(addr, nl1e); + + unmap_domain_page(pl1e); + + /* Finally, drop the old PTE. */ + put_page_from_l1e(ol1e, d); + + return X86EMUL_OKAY; +} + +static int ptwr_emulated_write( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + paddr_t val = 0; + + if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes ) + { + MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)", + offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + memcpy(&val, p_data, bytes); + + return ptwr_emulated_update( + offset, 0, val, bytes, 0, + container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); +} + +static int ptwr_emulated_cmpxchg( + enum x86_segment seg, + unsigned long offset, + void *p_old, + void *p_new, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + paddr_t old = 0, new = 0; + + if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) ) + { + MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)", + offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + memcpy(&old, p_old, bytes); + memcpy(&new, p_new, bytes); + + return ptwr_emulated_update( + offset, old, new, bytes, 1, + container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); +} + +static int pv_emul_is_mem_write(const struct x86_emulate_state *state, + struct x86_emulate_ctxt *ctxt) +{ + return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY + : X86EMUL_UNHANDLEABLE; +} + +static const struct x86_emulate_ops ptwr_emulate_ops = { + .read = ptwr_emulated_read, + .insn_fetch = ptwr_emulated_read, + .write = ptwr_emulated_write, + .cmpxchg = ptwr_emulated_cmpxchg, + .validate = pv_emul_is_mem_write, + .cpuid = pv_emul_cpuid, +}; + +/* Write page fault handler: check if guest is trying to modify a PTE. 
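Returns EXCRET_fault_fixed if the write was handled by emulation, or 0 to send the fault down the normal #PF path. 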
*/ +int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, + struct cpu_user_regs *regs) +{ + struct domain *d = v->domain; + struct page_info *page; + l1_pgentry_t pte; + struct ptwr_emulate_ctxt ptwr_ctxt = { + .ctxt = { + .regs = regs, + .vendor = d->arch.cpuid->x86_vendor, + .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, + .sp_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, + .swint_emulate = x86_swint_emulate_none, + }, + }; + int rc; + + /* Attempt to read the PTE that maps the VA being accessed. */ + guest_get_eff_l1e(addr, &pte); + + /* We are looking only for read-only mappings of p.t. pages. */ + if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) || + rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) || + !get_page_from_pagenr(l1e_get_pfn(pte), d) ) + goto bail; + + page = l1e_get_page(pte); + if ( !page_lock(page) ) + { + put_page(page); + goto bail; + } + + if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(page); + put_page(page); + goto bail; + } + + ptwr_ctxt.cr2 = addr; + ptwr_ctxt.pte = pte; + + rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops); + + page_unlock(page); + put_page(page); + + switch ( rc ) + { + case X86EMUL_EXCEPTION: + /* + * This emulation only covers writes to pagetables which are marked + * read-only by Xen. We tolerate #PF (in case a concurrent pagetable + * update has succeeded on a different vcpu). Anything else is an + * emulation bug, or a guest playing with the instruction stream under + * Xen's feet. + */ + if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && + ptwr_ctxt.ctxt.event.vector == TRAP_page_fault ) + pv_inject_event(&ptwr_ctxt.ctxt.event); + else + gdprintk(XENLOG_WARNING, + "Unexpected event (type %u, vector %#x) from emulation\n", + ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector); + + /* Fallthrough */ + case X86EMUL_OKAY: + + if ( ptwr_ctxt.ctxt.retire.singlestep ) + pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); + + /* Fallthrough */ + case X86EMUL_RETRY: + perfc_incr(ptwr_emulations); + return EXCRET_fault_fixed; + } + + bail: + return 0; +} + + +/************************* + * fault handling for read-only MMIO pages + */ + +int mmio_ro_emulated_write( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data; + + /* Only allow naturally-aligned stores at the original %cr2 address. */ + if ( ((bytes | offset) & (bytes - 1)) || !bytes || + offset != mmio_ro_ctxt->cr2 ) + { + MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)", + mmio_ro_ctxt->cr2, offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + return X86EMUL_OKAY; +} + +static const struct x86_emulate_ops mmio_ro_emulate_ops = { + .read = x86emul_unhandleable_rw, + .insn_fetch = ptwr_emulated_read, + .write = mmio_ro_emulated_write, + .validate = pv_emul_is_mem_write, + .cpuid = pv_emul_cpuid, +}; + +int mmcfg_intercept_write( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data; + + /* + * Only allow naturally-aligned stores no wider than 4 bytes to the + * original %cr2 address. 
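+ * (32 bits is the widest access PCI configuration space supports.)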
+ */ + if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes || + offset != mmio_ctxt->cr2 ) + { + MEM_LOG("mmcfg_intercept: bad write (cr2=%lx, addr=%lx, bytes=%u)", + mmio_ctxt->cr2, offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + offset &= 0xfff; + if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf, + offset, bytes, p_data) >= 0 ) + pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf), + PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes, + *(uint32_t *)p_data); + + return X86EMUL_OKAY; +} + +static const struct x86_emulate_ops mmcfg_intercept_ops = { + .read = x86emul_unhandleable_rw, + .insn_fetch = ptwr_emulated_read, + .write = mmcfg_intercept_write, + .validate = pv_emul_is_mem_write, + .cpuid = pv_emul_cpuid, +}; + +/* Check if guest is trying to modify a r/o MMIO page. */ +int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, + struct cpu_user_regs *regs) +{ + l1_pgentry_t pte; + unsigned long mfn; + unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG; + struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr }; + struct x86_emulate_ctxt ctxt = { + .regs = regs, + .vendor = v->domain->arch.cpuid->x86_vendor, + .addr_size = addr_size, + .sp_size = addr_size, + .swint_emulate = x86_swint_emulate_none, + .data = &mmio_ro_ctxt + }; + int rc; + + /* Attempt to read the PTE that maps the VA being accessed. */ + guest_get_eff_l1e(addr, &pte); + + /* We are looking only for read-only mappings of MMIO pages. */ + if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ) + return 0; + + mfn = l1e_get_pfn(pte); + if ( mfn_valid(_mfn(mfn)) ) + { + struct page_info *page = mfn_to_page(mfn); + struct domain *owner = page_get_owner_and_reference(page); + + if ( owner ) + put_page(page); + if ( owner != dom_io ) + return 0; + } + + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) + return 0; + + if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) ) + rc = x86_emulate(&ctxt, &mmcfg_intercept_ops); + else + rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops); + + switch ( rc ) + { + case X86EMUL_EXCEPTION: + /* + * This emulation only covers writes to MMCFG space or read-only MFNs. + * We tolerate #PF (from hitting an adjacent page or a successful + * concurrent pagetable update). Anything else is an emulation bug, + * or a guest playing with the instruction stream under Xen's feet. 
+ */ + if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && + ctxt.event.vector == TRAP_page_fault ) + pv_inject_event(&ctxt.event); + else + gdprintk(XENLOG_WARNING, + "Unexpected event (type %u, vector %#x) from emulation\n", + ctxt.event.type, ctxt.event.vector); + + /* Fallthrough */ + case X86EMUL_OKAY: + + if ( ctxt.retire.singlestep ) + pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); + + /* Fallthrough */ + case X86EMUL_RETRY: + perfc_incr(ptwr_emulations); + return EXCRET_fault_fixed; + } + + return 0; +} + +/************************* + * PV MMU hypercalls + */ +long do_mmu_update( + XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs, + unsigned int count, + XEN_GUEST_HANDLE_PARAM(uint) pdone, + unsigned int foreigndom) +{ + struct mmu_update req; + void *va; + unsigned long gpfn, gmfn, mfn; + struct page_info *page; + unsigned int cmd, i = 0, done = 0, pt_dom; + struct vcpu *curr = current, *v = curr; + struct domain *d = v->domain, *pt_owner = d, *pg_owner; + struct domain_mmap_cache mapcache; + uint32_t xsm_needed = 0; + uint32_t xsm_checked = 0; + int rc = put_old_guest_table(curr); + + if ( unlikely(rc) ) + { + if ( likely(rc == -ERESTART) ) + rc = hypercall_create_continuation( + __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone, + foreigndom); + return rc; + } + + if ( unlikely(count == MMU_UPDATE_PREEMPTED) && + likely(guest_handle_is_null(ureqs)) ) + { + /* See the curr->arch.old_guest_table related + * hypercall_create_continuation() below. */ + return (int)foreigndom; + } + + if ( unlikely(count & MMU_UPDATE_PREEMPTED) ) + { + count &= ~MMU_UPDATE_PREEMPTED; + if ( unlikely(!guest_handle_is_null(pdone)) ) + (void)copy_from_guest(&done, pdone, 1); + } + else + perfc_incr(calls_to_mmu_update); + + if ( unlikely(!guest_handle_okay(ureqs, count)) ) + return -EFAULT; + + if ( (pt_dom = foreigndom >> 16) != 0 ) + { + /* Pagetables belong to a foreign domain (PFD). */ + if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL ) + return -ESRCH; + + if ( pt_owner == d ) + rcu_unlock_domain(pt_owner); + else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL ) + { + rc = -EINVAL; + goto out; + } + } + + if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL ) + { + rc = -ESRCH; + goto out; + } + + domain_mmap_cache_init(&mapcache); + + for ( i = 0; i < count; i++ ) + { + if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) ) + { + rc = -ERESTART; + break; + } + + if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) ) + { + MEM_LOG("Bad __copy_from_guest"); + rc = -EFAULT; + break; + } + + cmd = req.ptr & (sizeof(l1_pgentry_t)-1); + + switch ( cmd ) + { + /* + * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table. + * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR) + * current A/D bits. 
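+ * For both commands the operation is encoded in the low bits of req.ptr
+ * (extracted into cmd above) and the remainder of req.ptr is the guest
+ * address of the PTE being updated.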
+ */ + case MMU_NORMAL_PT_UPDATE: + case MMU_PT_UPDATE_PRESERVE_AD: + { + p2m_type_t p2mt; + + rc = -EOPNOTSUPP; + if ( unlikely(paging_mode_refcounts(pt_owner)) ) + break; + + xsm_needed |= XSM_MMU_NORMAL_UPDATE; + if ( get_pte_flags(req.val) & _PAGE_PRESENT ) + { + xsm_needed |= XSM_MMU_UPDATE_READ; + if ( get_pte_flags(req.val) & _PAGE_RW ) + xsm_needed |= XSM_MMU_UPDATE_WRITE; + } + if ( xsm_needed != xsm_checked ) + { + rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed); + if ( rc ) + break; + xsm_checked = xsm_needed; + } + rc = -EINVAL; + + req.ptr -= cmd; + gmfn = req.ptr >> PAGE_SHIFT; + page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC); + + if ( p2m_is_paged(p2mt) ) + { + ASSERT(!page); + p2m_mem_paging_populate(pg_owner, gmfn); + rc = -ENOENT; + break; + } + + if ( unlikely(!page) ) + { + MEM_LOG("Could not get page for normal update"); + break; + } + + mfn = page_to_mfn(page); + va = map_domain_page_with_cache(mfn, &mapcache); + va = (void *)((unsigned long)va + + (unsigned long)(req.ptr & ~PAGE_MASK)); + + if ( page_lock(page) ) + { + switch ( page->u.inuse.type_info & PGT_type_mask ) + { + case PGT_l1_page_table: + { + l1_pgentry_t l1e = l1e_from_intpte(req.val); + p2m_type_t l1e_p2mt = p2m_ram_rw; + struct page_info *target = NULL; + p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ? + P2M_UNSHARE : P2M_ALLOC; + + if ( paging_mode_translate(pg_owner) ) + target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e), + &l1e_p2mt, q); + + if ( p2m_is_paged(l1e_p2mt) ) + { + if ( target ) + put_page(target); + p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); + rc = -ENOENT; + break; + } + else if ( p2m_ram_paging_in == l1e_p2mt && !target ) + { + rc = -ENOENT; + break; + } + /* If we tried to unshare and failed */ + else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) ) + { + /* We could not have obtained a page ref. */ + ASSERT(target == NULL); + /* And mem_sharing_notify has already been called. 
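The update fails with -ENOMEM so the guest can retry it later. 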
*/ + rc = -ENOMEM; + break; + } + + rc = mod_l1_entry(va, l1e, mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v, + pg_owner); + if ( target ) + put_page(target); + } + break; + case PGT_l2_page_table: + rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + break; + case PGT_l3_page_table: + rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + break; + case PGT_l4_page_table: + rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + break; + case PGT_writable_page: + perfc_incr(writable_mmu_updates); + if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) + rc = 0; + break; + } + page_unlock(page); + if ( rc == -EINTR ) + rc = -ERESTART; + } + else if ( get_page_type(page, PGT_writable_page) ) + { + perfc_incr(writable_mmu_updates); + if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) + rc = 0; + put_page_type(page); + } + + unmap_domain_page_with_cache(va, &mapcache); + put_page(page); + } + break; + + case MMU_MACHPHYS_UPDATE: + if ( unlikely(d != pt_owner) ) + { + rc = -EPERM; + break; + } + + if ( unlikely(paging_mode_translate(pg_owner)) ) + { + rc = -EINVAL; + break; + } + + mfn = req.ptr >> PAGE_SHIFT; + gpfn = req.val; + + xsm_needed |= XSM_MMU_MACHPHYS_UPDATE; + if ( xsm_needed != xsm_checked ) + { + rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed); + if ( rc ) + break; + xsm_checked = xsm_needed; + } + + if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) ) + { + MEM_LOG("Could not get page for mach->phys update"); + rc = -EINVAL; + break; + } + + set_gpfn_from_mfn(mfn, gpfn); + + paging_mark_dirty(pg_owner, _mfn(mfn)); + + put_page(mfn_to_page(mfn)); + break; + + default: + MEM_LOG("Invalid page update command %x", cmd); + rc = -ENOSYS; + break; + } + + if ( unlikely(rc) ) + break; + + guest_handle_add_offset(ureqs, 1); + } + + if ( rc == -ERESTART ) + { + ASSERT(i < count); + rc = hypercall_create_continuation( + __HYPERVISOR_mmu_update, "hihi", + ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom); + } + else if ( curr->arch.old_guest_table ) + { + XEN_GUEST_HANDLE_PARAM(void) null; + + ASSERT(rc || i == count); + set_xen_guest_handle(null, NULL); + /* + * In order to have a way to communicate the final return value to + * our continuation, we pass this in place of "foreigndom", building + * on the fact that this argument isn't needed anymore. + */ + rc = hypercall_create_continuation( + __HYPERVISOR_mmu_update, "hihi", null, + MMU_UPDATE_PREEMPTED, null, rc); + } + + mm_put_pg_owner(pg_owner); + + domain_mmap_cache_destroy(&mapcache); + + perfc_add(num_page_updates, i); + + out: + if ( pt_owner != d ) + rcu_unlock_domain(pt_owner); + + /* Add incremental work we have done to the @done output parameter. 
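Note that @done accumulates across continuations: a preempted call reads it back in before adding to it. 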
*/ + if ( unlikely(!guest_handle_is_null(pdone)) ) + { + done += i; + copy_to_guest(pdone, &done, 1); + } + + return rc; +} + +static int __do_update_va_mapping( + unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) +{ + l1_pgentry_t val = l1e_from_intpte(val64); + struct vcpu *v = current; + struct domain *d = v->domain; + struct page_info *gl1pg; + l1_pgentry_t *pl1e; + unsigned long bmap_ptr, gl1mfn; + cpumask_t *mask = NULL; + int rc; + + perfc_incr(calls_to_update_va); + + rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val); + if ( rc ) + return rc; + + rc = -EINVAL; + pl1e = guest_map_l1e(va, &gl1mfn); + if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) ) + goto out; + + gl1pg = mfn_to_page(gl1mfn); + if ( !page_lock(gl1pg) ) + { + put_page(gl1pg); + goto out; + } + + if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(gl1pg); + put_page(gl1pg); + goto out; + } + + rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner); + + page_unlock(gl1pg); + put_page(gl1pg); + + out: + if ( pl1e ) + guest_unmap_l1e(pl1e); + + switch ( flags & UVMF_FLUSHTYPE_MASK ) + { + case UVMF_TLB_FLUSH: + switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) + { + case UVMF_LOCAL: + flush_tlb_local(); + break; + case UVMF_ALL: + mask = d->domain_dirty_cpumask; + break; + default: + mask = this_cpu(scratch_cpumask); + rc = mm_vcpumask_to_pcpumask(d, + const_guest_handle_from_ptr(bmap_ptr, + void), + mask); + break; + } + if ( mask ) + flush_tlb_mask(mask); + break; + + case UVMF_INVLPG: + switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) + { + case UVMF_LOCAL: + paging_invlpg(v, va); + break; + case UVMF_ALL: + mask = d->domain_dirty_cpumask; + break; + default: + mask = this_cpu(scratch_cpumask); + rc = mm_vcpumask_to_pcpumask(d, + const_guest_handle_from_ptr(bmap_ptr, + void), + mask); + break; + } + if ( mask ) + flush_tlb_one_mask(mask, va); + break; + } + + return rc; +} + +long do_update_va_mapping(unsigned long va, u64 val64, + unsigned long flags) +{ + return __do_update_va_mapping(va, val64, flags, current->domain); +} + +long do_update_va_mapping_otherdomain(unsigned long va, u64 val64, + unsigned long flags, + domid_t domid) +{ + struct domain *pg_owner; + int rc; + + if ( (pg_owner = mm_get_pg_owner(domid)) == NULL ) + return -ESRCH; + + rc = __do_update_va_mapping(va, val64, flags, pg_owner); + + mm_put_pg_owner(pg_owner); + + return rc; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 967b7fcda9..170908f7f2 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -649,6 +649,11 @@ int mm_vcpumask_to_pcpumask(struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask); +int create_grant_pv_mapping(uint64_t addr, unsigned long frame, + unsigned int flags, unsigned int cache_flags); +int replace_grant_pv_mapping(uint64_t addr, unsigned long frame, + uint64_t new_addr, unsigned int flags); + #define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a) #define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT) -- 2.11.0 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel