[RFC PATCH] x86/p2m-pt: do type recalculations with p2m read lock
Global p2m type recalculations (as triggered by logdirty) can create so
much contention on the p2m lock that simple guest operations like
VCPUOP_set_singleshot_timer on guests with a large number of vCPUs (32)
will cease to work in a timely manner, up to the point that Linux kernel
versions that still use the VCPU_SSHOTTMR_future flag with the singleshot
timer will cease to work:

[ 82.779470] CE: xen increased min_delta_ns to 1000000 nsec
[ 82.793075] CE: Reprogramming failure. Giving up
[ 82.779470] CE: Reprogramming failure. Giving up
[ 82.821864] CE: xen increased min_delta_ns to 506250 nsec
[ 82.821864] CE: xen increased min_delta_ns to 759375 nsec
[ 82.821864] CE: xen increased min_delta_ns to 1000000 nsec
[ 82.821864] CE: Reprogramming failure. Giving up
[ 82.856256] CE: Reprogramming failure. Giving up
[ 84.566279] CE: Reprogramming failure. Giving up
[ 84.649493] Freezing user space processes ...
[ 130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
[ 130.604032] Task dump for CPU 14:
[ 130.604032] swapper/14 R running task 0 0 1 0x00000000
[ 130.604032] Call Trace:
[ 130.604032] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[ 130.604032] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[ 130.604032] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[ 130.604032] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[ 130.604032] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[ 130.604032] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[ 549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
[ 549.655463] Task dump for CPU 26:
[ 549.655463] swapper/26 R running task 0 0 1 0x00000000
[ 549.655463] Call Trace:
[ 549.655463] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[ 549.655463] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[ 549.655463] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[ 549.655463] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[ 549.655463] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[ 549.655463] [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[ 821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
[ 821.888596] Task dump for CPU 26:
[ 821.888622] swapper/26 R running task 0 0 1 0x00000000
[ 821.888677] Call Trace:
[ 821.888712] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[ 821.888771] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[ 821.888818] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[ 821.888865] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[ 821.888917] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[ 821.888966] [<ffffffff900000d5>] ? start_cpu+0x5/0x14

This is obviously undesirable. One way to bodge the issue would be to
ignore VCPU_SSHOTTMR_future, but that's a deliberate breakage of the
hypercall ABI.

Instead lower the contention in the lock by doing the recalculation with
the lock in read mode. This is safe because only the flags/type are
changed; there's no PTE mfn change in the AMD recalculation logic. The
Intel (EPT) case is likely more complicated, as superpage splitting for
diverging EMT values must be done with the p2m lock taken in write mode.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
I'm unsure whether such modification is fully safe: I think changing
the flags/type should be fine: the PTE write is performed using
safwrite_p2m_entry() which must be atomic (as the guest is still running
and accessing the page tables). I'm slightly worried about all PTE
readers not using atomic accesses to do so (i.e. the pointer returned by
p2m_find_entry() should be read atomically), and about code assuming
that a gfn type cannot change while holding the p2m lock in read mode.

Wanted to post early in case someone knows any showstoppers with this
approach that make it a no-go, before I try to further evaluate users.
---
 xen/arch/x86/mm/p2m-pt.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index cd1af33b67..f145647f01 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -486,9 +486,6 @@ static int cf_check do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         p2m_type_t ot, nt;
         unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
 
-        if ( !valid_recalc(l1, e) )
-            P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
-                      p2m->domain->domain_id, gfn, level);
         ot = p2m_flags_to_type(l1e_get_flags(e));
         nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
         if ( nt != ot )
@@ -538,9 +535,9 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa)
      */
     ASSERT(!altp2m_active(current->domain));
 
-    p2m_lock(p2m);
+    p2m_read_lock(p2m);
     rc = do_recalc(p2m, PFN_DOWN(gpa));
-    p2m_unlock(p2m);
+    p2m_read_unlock(p2m);
 
     return rc;
 }
-- 
2.40.0
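
For readers following the atomicity concern in the notes above, here is a minimal, standalone C11 sketch of the pattern under discussion. It is not Xen code: the PTE layout (low 12 bits for flags/type, the rest for the frame number) and all names (pte_t, FLAGS_MASK, pte_read(), pte_set_flags()) are assumptions made up for illustration. It only shows the idea that an updater changes the flags with a single atomic write that leaves the mfn untouched, and that a reader takes one atomic snapshot of the entry before decoding it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: flags/type in the low
 * 12 bits, frame number (mfn) in the remaining bits. */
#define FLAGS_MASK  UINT64_C(0xfff)
#define MFN_SHIFT   12

typedef _Atomic uint64_t pte_t;

/* Reader side: take ONE atomic snapshot, then decode that snapshot.
 * Decoding from two separate reads could mix old and new state. */
static inline uint64_t pte_read(pte_t *pte)
{
    return atomic_load_explicit(pte, memory_order_acquire);
}

static inline uint64_t pte_flags(uint64_t e) { return e & FLAGS_MASK; }
static inline uint64_t pte_mfn(uint64_t e)   { return e >> MFN_SHIFT; }

/* Updater side: replace only the flags, preserving the mfn bits.
 * The compare-and-swap loop retries if another updater raced with us,
 * so two concurrent updaters cannot lose each other's mfn bits. */
static inline void pte_set_flags(pte_t *pte, uint64_t flags)
{
    uint64_t old, upd;

    assert(!(flags & ~FLAGS_MASK));

    old = atomic_load_explicit(pte, memory_order_relaxed);
    do {
        upd = (old & ~FLAGS_MASK) | flags;
    } while ( !atomic_compare_exchange_weak_explicit(
                  pte, &old, upd,
                  memory_order_release, memory_order_relaxed) );
}
```

A caller would initialise an entry with atomic_init(), call pte_set_flags() from the updating path, and decode the value returned by pte_read() on the reading path. Whether the existing p2m readers behave like pte_read() is exactly the open question in the notes; this sketch does not answer it.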