[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] RFC: AMD support for paging



Am 15.02.2012 18:02, schrieb Tim Deegan:
At 08:06 -0800 on 15 Feb (1329293205), Andres Lagar-Cavilla wrote:
Where will p2m access bit be stored? Please note that bit 52 - bit 58
for pte must be zero for p2m_ram_rw pages. For iommu pte, only bit 63
and bit 1 - bit 8 are free to use if PR bit = 1.

Wei,

Why *must* bits 52-58 be zero for p2m_ram_rw pages? Or is it just the
current convention?

I propose limiting p2m type to bits 52-55, and storing p2m access on bits
56-58. So, p2m_ram_rw pages with a non-zero access would break the "must
be zero" requirement/convention.

Eh, let's step back a bit from this.  The problem is that the IOMMU
can't handle these bits being set, but the IOMMU can't handle paged-out
or CoW memory either (because there's no way to cause the peripheral to
retry a DMA that faulted).

So can we just say that any VM that's going to use paging, sharing
or access can't have a unified p2m and IOMMU table?  That's a bit ugly
since the admin will have to make the choice, but it seems like the
hardware is forcing us into it.

Seems true. Although iommuv2 support demand paging, it only works for guest OS process context. And we also need support from device side to do that.


Maybe the sensible choice is to default to _not_ sharing p2m and IOMMU
tables, and have thet be something an expert user can turn on (and which
disables the other features)?

That looks good to me.

Should we have a hypervisor-level interlock between PCI passthrough and
paging/sharing/access anyway?  Doing both is unlikely to lead to happiness.

I think we should. DMA might result miserable errors when paging happens.

Thanks
Wei

Tim.

Right now, without any considerations for paging, we're storing the p2m
type in bits 52-58 when PR=1. That p2m type can be non zero. p2m_ram_ro,
p2m_mmio_dm, p2m_logdirty, p2m_populate_on_demand are all par for the
course. Given your statement above, and table 14 in the IOMMU manual, how
is this even working? Or is it not working?

Would a violation of these rules cause a VMEXIT_SHUTDOWN?

Thanks a lot for the info!
Andres

Thanks,
Wei

An alternative to enabling mem_access on AMD processors will be to limit
the access types to 3 bits. This would force disabling support for two
modes. I'm inclined to disable two out of X, W and WX. I don't think
they
make sense without R permissions.
Thanks!
Andres


Thanks,
Wei


Finally, the triple fault. Maybe I'm missing something obvious.
Comments
welcome.

Patch inline below, thanks!
Andres

Enable AMD support for paging.

Signed-off-by: Andres Lagar-Cavilla<andres@xxxxxxxxxxxxxxxx>
Signed-off-by: Adin Scannell<adin@xxxxxxxxxxx>

diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/mem_event.c
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -537,10 +537,6 @@ int mem_event_domctl(struct domain *d, x
                if ( !hap_enabled(d) )
                    break;

-            /* Currently only EPT is supported */
-            if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-                break;
-
                rc = -EXDEV;
                /* Disallow paging in a PoD guest */
                if ( p2m->pod.entry_count )
diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/p2m-pt.c
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -53,6 +53,20 @@
    #define P2M_BASE_FLAGS \
            (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED)

+#ifdef __x86_64__
+/* l1e_from_pfn is not designed to have INVALID_MFN stored. The
0xff..ff
+ * value tramples over the higher-order bits used for flags (NX,
p2mt,
+ * etc.) This happens for paging entries. Thus we do this clip/unclip
+ * juggle for l1 entries only (no paging superpages!) */
+#define EFF_MFN_WIDTH       (PADDR_BITS-PAGE_SHIFT) /* 40 bits */
+#define clipped_mfn(mfn)    ((mfn)&    ((1UL<<    EFF_MFN_WIDTH) - 1))
+#define unclip_mfn(mfn)     ((clipped_mfn((mfn)) == INVALID_MFN) ? \
+                                INVALID_MFN : (mfn))
+#else
+#define clipped_mfn(mfn)    (mfn)
+#define unclip_mfn(mfn)     (mfn)
+#endif /* __x86_64__ */
+
    static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
    {
        unsigned long flags;
@@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p
        case p2m_invalid:
        case p2m_mmio_dm:
        case p2m_populate_on_demand:
+    case p2m_ram_paging_out:
+    case p2m_ram_paged:
+    case p2m_ram_paging_in:
        default:
            return flags;
        case p2m_ram_ro:
@@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m
                                          shift, max)) )
            return 0;

-    /* PoD: Not present doesn't imply empty. */
+    /* PoD/paging: Not present doesn't imply empty. */
        if ( !l1e_get_flags(*p2m_entry) )
        {
            struct page_info *pg;
@@ -384,8 +401,9 @@ p2m_set_entry(struct p2m_domain *p2m, un
                                       0, L1_PAGETABLE_ENTRIES);
            ASSERT(p2m_entry);

-        if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
-            entry_content = l1e_from_pfn(mfn_x(mfn),
+        if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ||
+             (p2mt == p2m_ram_paged) || (p2mt == p2m_ram_paging_in) )
+            entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)),
                                             p2m_type_to_flags(p2mt,
mfn));
            else
                entry_content = l1e_empty();
@@ -393,7 +411,7 @@ p2m_set_entry(struct p2m_domain *p2m, un
            if ( entry_content.l1 != 0 )
            {
                p2m_add_iommu_flags(&entry_content, 0,
iommu_pte_flags);
-            old_mfn = l1e_get_pfn(*p2m_entry);
+            old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry));
            }
            /* level 1 entry */
            p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn,
entry_content, 1);
@@ -615,11 +633,12 @@ pod_retry_l1:
                               sizeof(l1e));

        if ( ret == 0 ) {
+        unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e));
            p2mt = p2m_flags_to_type(l1e_get_flags(l1e));
-        ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || !p2m_is_ram(p2mt));
+        ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) ||
+                (l1e_mfn == INVALID_MFN&&    p2m_is_paging(p2mt)) );

-        if ( p2m_flags_to_type(l1e_get_flags(l1e))
-             == p2m_populate_on_demand )
+        if ( p2mt == p2m_populate_on_demand )
            {
                /* The read has succeeded, so we know that the mapping
                 * exits at this point.  */
@@ -641,7 +660,7 @@ pod_retry_l1:
            }

            if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) )
-            mfn = _mfn(l1e_get_pfn(l1e));
+            mfn = _mfn(l1e_mfn);
            else
                /* XXX see above */
                p2mt = p2m_mmio_dm;
@@ -783,18 +802,26 @@ pod_retry_l2:
    pod_retry_l1:
        if ( (l1e_get_flags(*l1e)&    _PAGE_PRESENT) == 0 )
        {
+        p2m_type_t l1t = p2m_flags_to_type(l1e_get_flags(*l1e));
            /* PoD: Try to populate */
-        if ( p2m_flags_to_type(l1e_get_flags(*l1e)) ==
p2m_populate_on_demand )
+        if ( l1t == p2m_populate_on_demand )
            {
                if ( q != p2m_query ) {
                    if ( !p2m_pod_demand_populate(p2m, gfn,
PAGE_ORDER_4K,
q) )
                        goto pod_retry_l1;
                } else
                    *t = p2m_populate_on_demand;
+        } else {
+            if ( p2m_is_paging(l1t) )
+            {
+                *t = l1t;
+                /* No need to unclip due to check below */
+                mfn = _mfn(l1e_get_pfn(*l1e));
+            }
            }

            unmap_domain_page(l1e);
-        return _mfn(INVALID_MFN);
+        return (l1t == p2m_ram_paging_out) ? mfn : _mfn(INVALID_MFN);
        }
        mfn = _mfn(l1e_get_pfn(*l1e));
        *t = p2m_flags_to_type(l1e_get_flags(*l1e));
@@ -914,7 +941,7 @@ static void p2m_change_type_global(struc
                        flags = l1e_get_flags(l1e[i1]);
                        if ( p2m_flags_to_type(flags) != ot )
                            continue;
-                    mfn = l1e_get_pfn(l1e[i1]);
+                    mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
                        gfn = i1 + (i2 + (i3
    #if CONFIG_PAGING_LEVELS>= 4
                                        + (i4 * L3_PAGETABLE_ENTRIES)
@@ -923,7 +950,7 @@ static void p2m_change_type_global(struc
                               * L2_PAGETABLE_ENTRIES) *
L1_PAGETABLE_ENTRIES;
                        /* create a new 1le entry with the new type */
                        flags = p2m_type_to_flags(nt, _mfn(mfn));
-                    l1e_content = l1e_from_pfn(mfn, flags);
+                    l1e_content = l1e_from_pfn(clipped_mfn(mfn),
flags);
                        p2m->write_p2m_entry(p2m, gfn,&l1e[i1],
                                             l1mfn, l1e_content, 1);
                    }
@@ -1073,7 +1100,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
                                    entry_count++;
                                continue;
                            }
-                        mfn = l1e_get_pfn(l1e[i1]);
+                        mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
                            ASSERT(mfn_valid(_mfn(mfn)));
                            m2pfn = get_gpfn_from_mfn(mfn);
                            if ( m2pfn != gfn&&


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
















_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.