[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU



On 12/04/2013 05:40 PM, Konrad Rzeszutek Wilk wrote:
On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxxx> wrote:
On 12/04/2013 12:04 PM, Ian Campbell wrote:
On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
Correct. The check for mapping domain 0's 1:1 map is overly broad I
think, and erroneously prevents a domU from mapping a foreign PFN < 1M.

But that's the source of my not understanding: xen_make_pte()
derives addr from the passed in pte, and that pte can - for a
foreign domain's page - hardly hold a PFN. Otherwise how would
the translation to MFN be supposed to happen? Yet, if it's a
machine address that's coming in, it can't point into the low 1Mb.

Isn't it a foreign gpfn at this point, which for an HVM guest is
actually a PFN not an MFN?

You are making me think I might be talking out my a**e though, because
what is a foreign mapping even doing in xen_make_pte -- those need to be
instantiated in a special way.

I believe the callpath for this is

xen_remap_domain_range() (mmu.c)
   |
   v
remap_area_pfn_pte() (mmu.c)
   |
   v
pfn_pte() (somewhere, one of the pgtable.h hdrs)
   |
   v
__pte() (paravirt.h)
   |
   v
xen_make_pte (mmu.c) via pv_mmu_ops.make_pte

Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
but it seems the case.

But that's a fundamental thing to explain. As Ian says - foreign PFNs
shouldn't make it here, or else how do you know how to translate
them to MFNs (as you can't consult the local P2M table to do so)?

This is all done via the toolstack which does the /dev/xen ioctl to map
some of its user-space memory in the guest memory. It ends up getting
the MFNs via some hypercall (forgotten which) and inputs those in the
IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
_PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
the P2M and use that specific MFN value.

It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
out - but this is the last bastion where it is used.

The check that the xen_make_pte for the VM_IO for 1:1 pages is not
really needed anymore - as we have the 1:1 pages in the P2M (except for
the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
there - but that is different bug).

So the check there could actually be lessen - and we can piggyback on
the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
xen_pte_val - which we would only be set by xen_make_pte iff P2M says
the page is 1:1.


Not compile tested:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..98efb65 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
                        if (mfn & IDENTITY_FRAME_BIT) {
                                mfn &= ~IDENTITY_FRAME_BIT;
                                flags |= _PAGE_IOMAP;
-                       }
+                       } else
+                               flags &= _PAGE_IOMAP;
                }
                val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
        }
@@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
                pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
        }
  #endif
-       if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
+       if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
                return pteval;

        return pte_mfn_to_pfn(pteval);
@@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
  #endif
        /*
         * Unprivileged domains are allowed to do IOMAPpings for
-        * PCI passthrough, but not map ISA space.  The ISA
-        * mappings are just dummy local mappings to keep other
-        * parts of the kernel happy.
+        * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
+        * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
+        * is supplied to use by xen_set_fixmap.
         */
-       if (unlikely(pte & _PAGE_IOMAP) &&
-           (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+       if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
                pte = iomap_pte(pte);

I think this wont work because _PAGE_SPECIAL is not set at this point yet (inside xen_make_pte). It is only set after xen_make_pte. This is why my patch contained this extra, rather nasty, hunk, which made _PAGE_SPECIAL set a bit earlier:

+static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t 
pgprot)
+{
+       return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
+                    massage_pgprot(pgprot) | _PAGE_SPECIAL);
+}
+
+
 static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
                                 unsigned long addr, void *data)
 {
        struct remap_data *rmd = data;
-       pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+       pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot);

        rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
        rmd->mmu_update->val = pte_val_ma(pte);


I've basically made a new function foreign_special_pfn_pte which is unrolled pte_mkspecial with a small difference that it sets _PAGE_SPECIAL bit before calling __pte, not after (because __pte calls into xen_make_pte). Maybe cleanest way of fixing this would be just to have separate path for this which doesn't use xen_make_pte at all?




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.