
[Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop



The comment about the "force-invalidate" loop gives two reasons for
its existence:

1. Breaking circular "linear pagetable" references

2. Cleaning up partially-validated pages.

The first reason has been invalid since XSA-240: since then it hasn't
been possible to generate circular linear pagetable references.

The second reason has been invalid since long before that: before
calling relinquish_memory(), domain_relinquish_resources() calls
vcpu_destroy_pagetables() on each vcpu; this will in turn call
put_old_guest_table() on each vcpu.  The result should be that it's
not possible to have top-level partially-devalidated pagetables by the
time we get to relinquish_memory().
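
For illustration, a heavily simplified sketch of that teardown
ordering (not the in-tree code: preemption, error handling, and the
other page lists are omitted):

    static int domain_relinquish_resources(struct domain *d)
    {
        struct vcpu *v;

        /*
         * Drop each vcpu's pagetable references first; internally
         * this finishes any in-progress devalidation via
         * put_old_guest_table().
         */
        for_each_vcpu ( d, v )
            vcpu_destroy_pagetables(v);

        /* Only then walk the domain's pages. */
        return relinquish_memory(d, &d->page_list, PGT_l4_page_table);
    }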

The loop, it turns out, was effectively there only to pick up
interrupted UNPIN operations from relinquish_memory() itself.  Now that
these are being handled by put_old_guest_table(), we can remove this
loop entirely.
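
Roughly, the pattern being relied on (a sketch, assuming the
continuation scheme introduced earlier in this series: a preempted
UNPIN leaves its state in current->arch.old_guest_table, which the
call at the top of relinquish_memory() then resumes on re-entry):

    static int relinquish_memory(
        struct domain *d, struct page_list_head *list, unsigned long type)
    {
        /* Resume any devalidation interrupted on a previous pass. */
        int ret = put_old_guest_table(current);

        if ( ret )
            return ret;

        /*
         * ... walk 'list'; a preempted put_page_and_type_preemptible()
         * stashes its page in old_guest_table and returns -ERESTART,
         * so the hypercall as a whole is simply restarted.
         */
        return 0;
    }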

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx>
---
This patch has the advantage that it no longer hides mis-accounted
partial pages, but the disadvantage that if there *is* such a
mis-accounting, it will become a resource leak (and thus an
XSA).  It might be a good idea to add another "clean up partial pages"
pass after all other pages have been cleaned up, with a suitable
ASSERT().  That way we have a higher chance of catching mis-accounting
during development, without risking opening up a memory / domain leak
in production.
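
Something like the following, say (an untested sketch; the exact
placement and the set of flags to check would need care):

    /*
     * Hypothetical final pass: by this point no page should still be
     * partially validated, so shout if one is; but clean it up
     * anyway, so that release builds don't leak the domain.
     */
    page_list_for_each_safe ( page, tmp, list )
    {
        unsigned long x = page->u.inuse.type_info;

        if ( (x & PGT_type_mask) == type &&
             (x & (PGT_validated|PGT_partial)) )
        {
            ASSERT_UNREACHABLE();
            devalidate_page(page, x, 1);
        }
    }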

CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Jan Beulich <jbeulich@xxxxxxxx>
---
 xen/arch/x86/domain.c | 71 -------------------------------------------
 1 file changed, 71 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b7968463cb..847a70302c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1950,7 +1950,6 @@ static int relinquish_memory(
     struct domain *d, struct page_list_head *list, unsigned long type)
 {
     struct page_info  *page;
-    unsigned long     x, y;
     int               ret = 0;
 
     ret = put_old_guest_table(current);
@@ -2004,76 +2003,6 @@ static int relinquish_memory(
 
         put_page_alloc_ref(page);
 
-        /*
-         * Forcibly invalidate top-most, still valid page tables at this point
-         * to break circular 'linear page table' references as well as clean up
-         * partially validated pages. This is okay because MMU structures are
-         * not shared across domains and this domain is now dead. Thus top-most
-         * valid tables are not in use so a non-zero count means circular
-         * reference or partially validated.
-         */
-        y = page->u.inuse.type_info;
-        for ( ; ; )
-        {
-            x = y;
-            if ( likely((x & PGT_type_mask) != type) ||
-                 likely(!(x & (PGT_validated|PGT_partial))) )
-                break;
-
-            y = cmpxchg(&page->u.inuse.type_info, x,
-                        x & ~(PGT_validated|PGT_partial));
-            if ( likely(y == x) )
-            {
-                /* No need for atomic update of type_info here: noone else updates it. */
-                switch ( ret = devalidate_page(page, x, 1) )
-                {
-                case 0:
-                    break;
-                case -EINTR:
-                    page_list_add(page, list);
-                    page->u.inuse.type_info |= PGT_validated;
-                    if ( x & PGT_partial )
-                        put_page(page);
-                    put_page(page);
-                    ret = -ERESTART;
-                    goto out;
-                case -ERESTART:
-                    page_list_add(page, list);
-                    /*
-                     * PGT_partial holds a type ref and a general ref.
-                     * If we came in with PGT_partial set, then we 1)
-                     * don't need to grab an extra type count, and 2)
-                     * do need to drop the extra page ref we grabbed
-                     * at the top of the loop.  If we didn't come in
-                     * with PGT_partial set, we 1) do need to grab an
-                     * extra type count, but 2) can transfer the page
-                     * ref we grabbed above to it.
-                     *
-                     * Note that we must increment type_info before
-                     * setting PGT_partial.  Theoretically it should
-                     * be safe to drop the page ref before setting
-                     * PGT_partial, but do it afterwards just to be
-                     * extra safe.
-                     */
-                    if ( !(x & PGT_partial) )
-                        page->u.inuse.type_info++;
-                    smp_wmb();
-                    page->u.inuse.type_info |= PGT_partial;
-                    if ( x & PGT_partial )
-                        put_page(page);
-                    goto out;
-                default:
-                    BUG();
-                }
-                if ( x & PGT_partial )
-                {
-                    page->u.inuse.type_info--;
-                    put_page(page);
-                }
-                break;
-            }
-        }
-
         /* Put the page on the list and /then/ potentially free it. */
         page_list_add_tail(page, &d->arch.relmem_list);
         put_page(page);
-- 
2.24.0

