[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH] x86/p2m-pt: do type recalculations with p2m read lock
On 03.04.2023 12:14, Roger Pau Monne wrote: > Global p2m type recalculations (as triggered by logdirty) can create > so much contention on the p2m lock that simple guest operations like > VCPUOP_set_singleshot_timer on guests with a high amount of vCPUs (32) > will cease to work in a timely manner, up to the point that Linux > kernel versions that sill use the VCPU_SSHOTTMR_future flag with the > singleshot timer will cease to work: > > [ 82.779470] CE: xen increased min_delta_ns to 1000000 nsec > [ 82.793075] CE: Reprogramming failure. Giving up > [ 82.779470] CE: Reprogramming failure. Giving up > [ 82.821864] CE: xen increased min_delta_ns to 506250 nsec > [ 82.821864] CE: xen increased min_delta_ns to 759375 nsec > [ 82.821864] CE: xen increased min_delta_ns to 1000000 nsec > [ 82.821864] CE: Reprogramming failure. Giving up > [ 82.856256] CE: Reprogramming failure. Giving up > [ 84.566279] CE: Reprogramming failure. Giving up > [ 84.649493] Freezing user space processes ... > [ 130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected > by 10, t=60002 jiffies, g=4006, c=4005, q=14130) > [ 130.604032] Task dump for CPU 14: > [ 130.604032] swapper/14 R running task 0 0 1 > 0x00000000 > [ 130.604032] Call Trace: > [ 130.604032] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0 > [ 130.604032] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0 > [ 130.604032] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0 > [ 130.604032] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0 > [ 130.604032] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270 > [ 130.604032] [<ffffffff900000d5>] ? start_cpu+0x5/0x14 > [ 549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected > by 24, t=60002 jiffies, g=6922, c=6921, q=7013) > [ 549.655463] Task dump for CPU 26: > [ 549.655463] swapper/26 R running task 0 0 1 > 0x00000000 > [ 549.655463] Call Trace: > [ 549.655463] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0 > [ 549.655463] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0 > [ 549.655463] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0 > [ 549.655463] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0 > [ 549.655463] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270 > [ 549.655463] [<ffffffff900000d5>] ? start_cpu+0x5/0x14 > [ 821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected > by 24, t=60002 jiffies, g=8499, c=8498, q=7664) > [ 821.888596] Task dump for CPU 26: > [ 821.888622] swapper/26 R running task 0 0 1 > 0x00000000 > [ 821.888677] Call Trace: > [ 821.888712] [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0 > [ 821.888771] [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0 > [ 821.888818] [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0 > [ 821.888865] [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0 > [ 821.888917] [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270 > [ 821.888966] [<ffffffff900000d5>] ? start_cpu+0x5/0x14 > > This is obviously undesirable. One way to bodge the issue would be to > ignore VCPU_SSHOTTMR_future, but that's a deliberate breakage of the > hypercall ABI. > > Instead lower the contention in the lock by doing the recalculation > with the lock in read mode. This is safe because only the flags/type > are changed, there's no PTE mfn change in the AMD recalculation logic. > The Intel (EPT) case is likely more complicated, as superpage > splitting for diverging EMT values must be done with the p2m lock in > taken in write mode. > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > --- > I'm unsure whether such modification is fully safe: I think changing > the flags/type should be fine: the PTE write is performed using > safwrite_p2m_entry() which must be atomic (as the guest is still > running and accessing the page tables). I'm slightly worried about > all PTE readers not using atomic accesses to do so (ie: pointer > returned by p2m_find_entry() should be read atomicallly), and code > assuming that a gfn type cannot change while holding the p2m lock in > read mode. Coming back to this: Yes, I think reads (at least the ones in do_recalc() which can now be done in parallel) will need to be tightened if this is a road we want to follow. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |