Re: [Xen-devel] [PATCH] xen: Send spinlock IPI to all waiters
On Fri, Feb 15, 2013 at 11:52:35AM +0100, Stefan Bader wrote:
> Hopefully not mis-parsing Jan's last comments on the other thread,
> this would be the fix covering things until a better implementation
> is done.
> This also prevents the hang on older kernels, where it could be
> reproduced reliably.
>
> -Stefan
>
> >From 7e042a253b06da96409a0e059744c217f396a17f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Date: Fri, 15 Feb 2013 09:48:52 +0100
> Subject: [PATCH] xen: Send spinlock IPI to all waiters
>
> There is a loophole between Xen's current implementation of
> pv-spinlocks and the scheduler. This was triggerable through
> a testcase until v3.6 changed the TLB flushing code. The
> problem potentially is still there, just not observable in the
> same way.
>
> What could happen was (is):
>
> 1. CPU n tries to schedule task x away and goes into a slow
>    wait for the runq lock of CPU n-# (must be one with a lower
>    number).
> 2. CPU n-#, while processing softirqs, tries to balance domains
>    and goes into a slow wait for its own runq lock (for updating
>    some records). Since this is a spin_lock_irqsave in softirq
>    context, interrupts will be re-enabled for the duration of
>    the poll_irq hypercall used by Xen.
> 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
>    an interrupt (e.g. endio) and when processing the interrupt,
>    tries to wake up task x. But that is in schedule and still
>    on_cpu, so try_to_wake_up goes into a tight loop.
> 4. The runq lock of CPU n-# gets unlocked, but the message only
>    gets sent to the first waiter, which is CPU n-# and that is
>    busily stuck.

Just for completeness:

5. The CPU from 3) (so CPU n-1) sits in its tight loop and never
   exits, as nothing ever interrupted it.

> To avoid this, and since the unlocking code has no real sense of
> which waiter is best suited to grab the lock, just send the IPI
> to all of them. This causes the waiters to return from the
> hypercall (those not interrupted, at least) and do active
> spinlocking.
>
> BugLink: http://bugs.launchpad.net/bugs/1011792
>
> Signed-off-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  arch/x86/xen/spinlock.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 83e866d..f7a080e 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>          if (per_cpu(lock_spinners, cpu) == xl) {
>              ADD_STATS(released_slow_kicked, 1);
>              xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -            break;
>          }
>      }
>  }
> --
> 1.7.9.5
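For reference, a minimal sketch of what the unlock slow path looks like once
the break is gone. Only the lines inside the hunk come from the patch itself;
the enclosing for_each_online_cpu() loop and the released_slow counter are
assumptions about the parts of xen_spin_unlock_slow() the diff does not show.

    /*
     * Sketch of the unlock slow path with the break removed: walk every
     * online CPU and kick each one that registered itself in
     * lock_spinners as polling on this lock.  Lines outside the hunk
     * above are assumed, not taken from the patch.
     */
    static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
    {
            int cpu;

            ADD_STATS(released_slow, 1);    /* assumed stats hook */

            for_each_online_cpu(cpu) {
                    if (per_cpu(lock_spinners, cpu) == xl) {
                            ADD_STATS(released_slow_kicked, 1);
                            xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                            /* no break: every waiter gets the IPI */
                    }
            }
    }

With all waiters kicked out of the poll_irq hypercall, each of them retries
the lock actively; one wins and the rest drop back into the slow path, so no
waiter is left unkicked the way CPU n was in step 4 above.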