
[RFC PATCH] x86/p2m-pt: do type recalculations with p2m read lock


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Date: Mon, 3 Apr 2023 12:14:49 +0200
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 03 Apr 2023 10:15:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Global p2m type recalculations (as triggered by logdirty) can create
so much contention on the p2m lock that simple guest operations like
VCPUOP_set_singleshot_timer on guests with a high number of vCPUs (32)
no longer complete in a timely manner, up to the point that Linux
kernel versions that still use the VCPU_SSHOTTMR_future flag with the
singleshot timer will cease to work:

[   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
[   82.793075] CE: Reprogramming failure. Giving up
[   82.779470] CE: Reprogramming failure. Giving up
[   82.821864] CE: xen increased min_delta_ns to 506250 nsec
[   82.821864] CE: xen increased min_delta_ns to 759375 nsec
[   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
[   82.821864] CE: Reprogramming failure. Giving up
[   82.856256] CE: Reprogramming failure. Giving up
[   84.566279] CE: Reprogramming failure. Giving up
[   84.649493] Freezing user space processes ...
[  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
[  130.604032] Task dump for CPU 14:
[  130.604032] swapper/14      R  running task        0     0      1 0x00000000
[  130.604032] Call Trace:
[  130.604032]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
[  549.655463] Task dump for CPU 26:
[  549.655463] swapper/26      R  running task        0     0      1 0x00000000
[  549.655463] Call Trace:
[  549.655463]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
[  821.888596] Task dump for CPU 26:
[  821.888622] swapper/26      R  running task        0     0      1 0x00000000
[  821.888677] Call Trace:
[  821.888712]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14

This is obviously undesirable.  One way to bodge the issue would be to
ignore VCPU_SSHOTTMR_future, but that's a deliberate breakage of the
hypercall ABI.

Instead lower the contention on the lock by doing the recalculation
with the lock in read mode.  This is safe because only the flags/type
are changed; there's no PTE mfn change in the AMD recalculation logic.
The Intel (EPT) case is likely more complicated, as superpage
splitting for diverging EMT values must be done with the p2m lock
taken in write mode.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
I'm unsure whether such modification is fully safe.  I think changing
the flags/type should be fine: the PTE write is performed using
safwrite_p2m_entry() which must be atomic (as the guest is still
running and accessing the page tables).  I'm slightly worried about
PTE readers that don't use atomic accesses (i.e. the pointer
returned by p2m_find_entry() should be read atomically), and about code
assuming that a gfn type cannot change while holding the p2m lock in
read mode.

Wanted to post early in case someone knows any showstoppers with this
approach that make it a no-go, before I try to further evaluate
users.
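
To make the reader-side concern concrete, here is a minimal standalone
C11 sketch.  It is not Xen code: the demo_* names, the 12-bit flags
layout and the use of a compare-and-swap loop are all assumptions made
for illustration.  It shows a flags-only update that leaves the frame
bits untouched, paired with a reader that uses a single atomic load:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low 12 bits hold flags/type, the rest the frame. */
#define DEMO_FLAGS_MASK UINT64_C(0xfff)

typedef _Atomic uint64_t demo_pte_t;

/* Reader side: one atomic load, so an update is never observed torn. */
static inline uint64_t demo_pte_read(demo_pte_t *pte)
{
    return atomic_load_explicit(pte, memory_order_acquire);
}

/*
 * Writer side, as it might run with only a read lock held: replace the
 * flags and keep the frame bits.  The compare-and-swap loop retries if
 * another CPU modified the entry in between.
 */
static inline uint64_t demo_pte_set_flags(demo_pte_t *pte, uint64_t new_flags)
{
    uint64_t old = atomic_load_explicit(pte, memory_order_relaxed);
    uint64_t new;

    assert((new_flags & ~DEMO_FLAGS_MASK) == 0);
    do {
        new = (old & ~DEMO_FLAGS_MASK) | new_flags;
    } while ( !atomic_compare_exchange_weak_explicit(pte, &old, new,
                                                     memory_order_release,
                                                     memory_order_relaxed) );
    return new;
}
```

A reader that dereferences the entry with a plain access, or that reads
it twice and expects the same type both times, gets no such guarantee
once writers are allowed under the read lock.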
---
 xen/arch/x86/mm/p2m-pt.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index cd1af33b67..f145647f01 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -486,9 +486,6 @@ static int cf_check do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         p2m_type_t ot, nt;
         unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
 
-        if ( !valid_recalc(l1, e) )
-            P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
-                      p2m->domain->domain_id, gfn, level);
         ot = p2m_flags_to_type(l1e_get_flags(e));
         nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
         if ( nt != ot )
@@ -538,9 +535,9 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa)
      */
     ASSERT(!altp2m_active(current->domain));
 
-    p2m_lock(p2m);
+    p2m_read_lock(p2m);
     rc = do_recalc(p2m, PFN_DOWN(gpa));
-    p2m_unlock(p2m);
+    p2m_read_unlock(p2m);
 
     return rc;
 }
-- 
2.40.0