
Re: [RFC PATCH] x86/p2m-pt: do type recalculations with p2m read lock


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 3 Apr 2023 17:32:39 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 03 Apr 2023 15:32:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.04.2023 12:14, Roger Pau Monne wrote:
> Global p2m type recalculations (as triggered by logdirty) can create
> so much contention on the p2m lock that simple guest operations like
> VCPUOP_set_singleshot_timer on guests with a high number of vCPUs (32)
> will cease to work in a timely manner, up to the point that Linux
> kernel versions that still use the VCPU_SSHOTTMR_future flag with the
> singleshot timer will cease to work:
> 
> [   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
> [   82.793075] CE: Reprogramming failure. Giving up
> [   82.779470] CE: Reprogramming failure. Giving up
> [   82.821864] CE: xen increased min_delta_ns to 506250 nsec
> [   82.821864] CE: xen increased min_delta_ns to 759375 nsec
> [   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
> [   82.821864] CE: Reprogramming failure. Giving up
> [   82.856256] CE: Reprogramming failure. Giving up
> [   84.566279] CE: Reprogramming failure. Giving up
> [   84.649493] Freezing user space processes ...
> [  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> [  130.604032] Task dump for CPU 14:
> [  130.604032] swapper/14      R  running task        0     0      1 0x00000000
> [  130.604032] Call Trace:
> [  130.604032]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> [  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> [  549.655463] Task dump for CPU 26:
> [  549.655463] swapper/26      R  running task        0     0      1 0x00000000
> [  549.655463] Call Trace:
> [  549.655463]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> [  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> [  821.888596] Task dump for CPU 26:
> [  821.888622] swapper/26      R  running task        0     0      1 0x00000000
> [  821.888677] Call Trace:
> [  821.888712]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> [  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> [  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> [  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> [  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> [  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> 
> This is obviously undesirable.  One way to bodge the issue would be to
> ignore VCPU_SSHOTTMR_future, but that's a deliberate breakage of the
> hypercall ABI.
> 
> Instead lower the contention on the lock by doing the recalculation
> with the lock in read mode.  This is safe because only the flags/type
> are changed; there's no PTE mfn change in the AMD recalculation logic.
> The Intel (EPT) case is likely more complicated, as superpage
> splitting for diverging EMT values must be done with the p2m lock
> taken in write mode.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> I'm unsure whether such modification is fully safe:  I think changing
> the flags/type should be fine: the PTE write is performed using
> safwrite_p2m_entry() which must be atomic (as the guest is still
> running and accessing the page tables).  I'm slightly worried about
> all PTE readers not using atomic accesses to do so (i.e. the pointer
> returned by p2m_find_entry() should be read atomically), and code
> assuming that a gfn type cannot change while holding the p2m lock in
> read mode.

Coming back to this: Yes, I think reads (at least the ones in do_recalc()
which can now be done in parallel) will need to be tightened if this is a
road we want to follow.

Jan
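
For readers following the thread, the concern about "tightened" reads can
be illustrated with a minimal, self-contained C11 sketch. It is not Xen
code: the entry layout, the PTE_* masks and the pte_read()/pte_recalc()
helpers are all hypothetical names invented for this example. It assumes
an entry carries a "needs recalculation" marker plus a type field in its
low bits, and shows two things: each entry is read exactly once through
an atomic load, and the flags/type update uses compare-and-exchange so
that parallel recalculations holding the lock in read mode cannot clobber
each other or touch the frame-number bits.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry layout, for illustration only (not Xen's real one). */
#define PTE_TYPE_MASK  UINT64_C(0x00f)  /* low bits: entry type */
#define PTE_RECALC     UINT64_C(0x800)  /* marker: type needs recalculation */

typedef _Atomic uint64_t pte_t;

/* Read the entry exactly once, so the caller works on a stable snapshot. */
static inline uint64_t pte_read(pte_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/*
 * Recalculate the type of one entry.  Intended to be callable by several
 * CPUs at once: only the marker and type bits change, all other bits
 * (e.g. the frame number) are carried over from the snapshot.
 * Returns true if this caller performed the update, false if the entry
 * did not (or no longer did) need recalculation.
 */
static bool pte_recalc(pte_t *p, uint64_t new_type)
{
    uint64_t old = pte_read(p);

    while ( old & PTE_RECALC )
    {
        uint64_t new = (old & ~(PTE_RECALC | PTE_TYPE_MASK)) |
                       (new_type & PTE_TYPE_MASK);

        if ( atomic_compare_exchange_weak_explicit(p, &old, new,
                                                   memory_order_acq_rel,
                                                   memory_order_acquire) )
            return true;
        /* A failed exchange refreshed 'old'; the loop re-checks the marker. */
    }

    return false;
}
```

Whether the real do_recalc() paths can be expressed this way is exactly
the open question in the thread; the sketch only shows the general shape
of an atomic read followed by an atomic flags-only update.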



 

