[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH v2] x86/PoD: defer nested P2M flushes


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 19 Oct 2021 14:52:27 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=J6uEHd9m49ZImhQ66YKOzPW7dyQfpUbMytSt7KlyfCo=; b=gSOQyhrdUMgyPMMmEnOyy7M+JwrdKRXYpzUi56lmWYiwRSwljFa3K7TMWmNJu6aSJWJJKAGszDptiUbPH/NF3o144CQxrvXW3V/DwhHkt2M5lhKfFSM/ww1kehysB+LioAjpxovgG8k9w+vk4zTko1WX5GupwfRevlwRK7dz7pHvrrJTy6S8Y+6Mjkqy8QA03yDetd+K6rn0sZuwLzCSQt7iN8MF1tjgxI63wKySLEPYbs8KIKCiJ1KYiBHaotqe8k5NNmIqXsyRnXmXupMYDD60bQFaqGpUk/uwpzIUyZ5o0TiPTQHTu+qmrd4+PLcllgXKz1ayfU4/xqWPZfsegw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mxl6j1Yatiqn8hiWMX/ORRIb9mgJKourLEWRGijuh+YEjSh9056MhvB2Xlea/j+ESucvaU42UvQAdPn4gVIMRi4lmD3D6jFxXPs6mPyRvUFv6ObfK+IDRRAD6gjDqhU5pEUoCaBm6Zsj65X7bRhbtYMqrV6GCMDMzhWWwgg6VKcmXywQzvCRFwaxp//1scoNcN3BjV4nnHrYj3BP6LZD5uBk2HW2NPcG2jsrkM6niHt6ICF+OXRYxLWN6Ou53w7jWaRj7c6XWsH50m8X3dd7kM1ENm9Q730AXRw5Q/pI5HsLScHWOJ4eSUyIZiOB74DYEkiIR4SteoR5vKQRMdiXGQ==
  • Authentication-results: intel.com; dkim=none (message not signed) header.d=none;intel.com; dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Tue, 19 Oct 2021 12:52:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
order violation when the PoD lock is held around it. Hence such flushing
needs to be deferred. Steal the approach from p2m_change_type_range().
(Note that strictly speaking the change at the out_of_memory label is
not needed, as the domain gets crashed there anyway. The change is being
made nevertheless to avoid setting up a trap from someone meaning to
deal with that case better than by domain_crash().)

Similarly for EPT I think ept_set_entry() -> ept_sync_domain() ->
ept_sync_domain_prepare() -> p2m_flush_nestedp2m() is affected. Make its
p2m_flush_nestedp2m() invocation conditional. Note that this then also
alters behavior of p2m_change_type_range() on EPT, deferring the nested
flushes there as well. I think this should have been that way from the
introduction of the flag.

Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
v2: Also adjust ept_sync_domain_prepare(). Also convert the flush at the
    out_of_memory label. Extend description to cover these.

--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(stru
     {
         if ( p2m_is_nestedp2m(p2m) )
             ept = &p2m_get_hostp2m(d)->ept;
-        else
+        else if ( !p2m->defer_nested_flush )
             p2m_flush_nestedp2m(d);
     }
 
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,6 +24,7 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/trace.h>
+#include <asm/hvm/nestedhvm.h>
 #include <asm/page.h>
 #include <asm/paging.h>
 #include <asm/p2m.h>
@@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct
 static int
 p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
 
+static void pod_unlock_and_flush(struct p2m_domain *p2m)
+{
+    pod_unlock(p2m);
+    p2m->defer_nested_flush = false;
+    if ( nestedhvm_enabled(p2m->domain) )
+        p2m_flush_nestedp2m(p2m->domain);
+}
 
 /*
  * This function is needed for two reasons:
@@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma
 
     gfn_lock(p2m, gfn, order);
     pod_lock(p2m);
+    p2m->defer_nested_flush = true;
 
     /*
      * If we don't have any outstanding PoD entries, let things take their
@@ -665,7 +674,7 @@ out_entry_check:
     }
 
 out_unlock:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     gfn_unlock(p2m, gfn, order);
     return ret;
 }
@@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai
      * won't start until we're done.
      */
     if ( unlikely(d->is_dying) )
-        goto out_fail;
-
+    {
+        pod_unlock(p2m);
+        return false;
+    }
 
     /*
      * Because PoD does not have cache list for 1GB pages, it has to remap
@@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai
                               p2m_populate_on_demand, p2m->default_access);
     }
 
+    p2m->defer_nested_flush = true;
+
     /* Only reclaim if we're in actual need of more cache. */
     if ( p2m->pod.entry_count > p2m->pod.count )
         pod_eager_reclaim(p2m);
@@ -1229,22 +1242,25 @@ p2m_pod_demand_populate(struct p2m_domai
         __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
     }
 
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return true;
+
 out_of_memory:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     printk("%s: Dom%d out of PoD memory! (tot=%"PRIu32" ents=%ld dom%d)\n",
            __func__, d->domain_id, domain_tot_pages(d),
            p2m->pod.entry_count, current->domain->domain_id);
     domain_crash(d);
     return false;
+
 out_fail:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return false;
+
 remap_and_retry:
     BUG_ON(order != PAGE_ORDER_2M);
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     /*
      * Remap this 2-meg region in singleton chunks. See the comment on the




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.