
Re: [Xen-devel] Strange PVM spinlock case revisited



On Tue, 2013-02-12 at 13:42 +0000, Stefan Bader wrote:
> On 11.02.2013 18:29, Ian Campbell wrote:
> > On Fri, 2013-02-08 at 15:33 +0000, Stefan Bader wrote:
> >> A while ago I had been reporting an issue which causes Xen PVM guests
> >> to hang (apparently related to spinlocks). Since then I was able to
> >> gather a few more facts which I try to provide below. For the real
> >> reasons, I am still puzzled and would be thankful for any hints or
> >> direction.
> >>
> >> - Happens with Xen 4.1.2 and Xen 4.2 host-side.
> >> - PVM guest with 8 VCPUs is running 800 threads doing DB updates.
> > 
> > This is on a host with >= 8 PCPUs, correct?
> 
> Maybe >=4 but I cannot remember for sure anymore.

OK, so the important point I was getting at was whether or not the guest
was overcommitting the host. It seems it is (2 VCPUs for each PCPU).


> >> From the static information I can extract from a dump it is hard to
> >> tell what exactly is going on. VCPU2 seems to be doing something but
> >> looks to be rather deep in event handling. The irq_count is at 2, but
> >> irq_count seems to be used in a few more places than just xen_hypervisor
> >> callback. Though that at least seems to be the entry on the stack I can
> >> see for VCPU2. The contents of the per-cpu variables for irq_regs and
> >> irq_stack_ptr at least seem consistent as if a callback came in while
> >> being on the sched_op hypercall. But then the bt command seems to be
> >> operating on a completely different stack.
> >> VCPU2 seems to be active, not polling (event processing could be initiated
> >> from enable_irqs via force callback or before finally exiting the sched_op
> >> hypercall); there seems to be a pending upcall, but upcalls are masked.
> >> The next thing I would try is to instrument the sites incrementing and
> >> decrementing irq_count...
> >> ---
> >>
> >> Below some of the info seen from dom0 debugging keys and looking
> >> at a dump with crash:
> >>
> >> Dom0 Info:
> >>
> >> Polling vCPUs: {1,5-7}
> >> VCPU0: CPU0 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
> >> VCPU1: CPU1 [has=F] poll=10 upcall_pend = 00, upcall_mask = 00
> >> VCPU2: CPU6 [has=T] poll=0  upcall_pend = 01, upcall_mask = 01
> >> VCPU3: CPU2 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
> >> VCPU4: CPU4 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
> >> VCPU5: CPU3 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
> >> VCPU6: CPU7 [has=F] poll=40 upcall_pend = 01, upcall_mask = 01
> >> VCPU7: CPU4 [has=F] poll=46 upcall_pend = 01, upcall_mask = 01
> >>
> >> 10 [0/1]: s=6 n=1 x=0
> >> 40 [0/1]: s=6 n=6 x=0
> >> 46 [0/1]: s=6 n=7 x=0
> >>
> >> Guest Info:
> >>
> >> lock_spinners:
> >> [0] (struct xen_spinlock *) 0x
> >> [1] (struct xen_spinlock *) 0xffff8803bffc8ae8 = { lock=1, spinners=2 }
> >> [2] (struct xen_spinlock *) 0xffff8803bfc93700 = { lock=0, spinners=2 }
> > 
> > If I understand correctly this means that 2 threads (VCPU2 & 6, I
> > suppose) are spinning on this lock, but no one is holding it?
> 
> To my reading both went into xen_spin_lock_slow(). VCPU2 may be either in the
> raw_local_irq_enable() before doing poll_irq or maybe in handling some event
> after receiving the wakeup irq but before returning from the hv call. So the
> VCPU that had the lock has returned it and likely the wakeup has been sent.

Makes sense, so we are in the window between one holder unlocking and the
next waiter acquiring the lock, which explains why no one is currently
holding it (which I previously thought odd).
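
For reference, and to make the lock_spinners dump above easier to read, the
old byte-lock state in arch/x86/xen/spinlock.c looks roughly like this (a
sketch from memory, not a verbatim copy):

#include <linux/percpu.h>

/* Pre-ticketlock PV spinlock: 'lock' is the byte that is actually
 * held/free, 'spinners' counts VCPUs that entered the slow path. */
struct xen_spinlock {
	unsigned char lock;		/* 0 -> free, 1 -> held */
	unsigned short spinners;	/* VCPUs in xen_spin_lock_slow() */
};

/* Each VCPU advertises the lock it is currently poll-waiting on;
 * this is what the per-cpu lock_spinners dump above shows. */
static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

So lock=0, spinners=2 on 0xffff8803bfc93700 means the previous holder has
already dropped the byte lock while two VCPUs (2 and 6 here) are still
registered as waiting on it.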

> > An interesting hack^Wexperiment might be to make xen_poll_irq use a
> > timeout and see if that unwedges things -- this would help confirm that
> > the issue is on nested wakeup.
> 
> I could do that. This would at least re-activate the other waiters (in the
> case shown, VCPU6), and if something in the path of execution on VCPU2
> deadlocks there, this would move things ahead.

Yes, it would be interesting, e.g. if only one of 6 or 2 made progress
and the other remained wedged for all time.
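
The experiment I had in mind is roughly the following (a sketch, not tested
code): swap the blocking xen_poll_irq() in the xen_spin_lock_slow() poll loop
for xen_poll_irq_timeout(), which I believe is declared in
include/xen/events.h. Note that the SCHEDOP_poll timeout is an absolute Xen
system time in ns, hence the xen_clocksource_read() (I think declared in
arch/x86/xen/xen-ops.h); SPIN_POLL_SLACK_NS is a made-up name and the 1ms
value is an arbitrary debug choice.

#define SPIN_POLL_SLACK_NS	(1ULL * NSEC_PER_MSEC)

	do {
		unsigned long flags;

		/* clear any stale pending state on the kicker irq */
		xen_clear_irq_pending(irq);

		/* recheck in case the lock became free meanwhile */
		ret = xen_spin_trylock(lock);
		if (ret)
			goto out;

		flags = arch_local_save_flags();
		if (irq_enable)
			raw_local_irq_enable();

		/* was: xen_poll_irq(irq) -- now also wake up after ~1ms
		 * even if the kick never arrives */
		xen_poll_irq_timeout(irq, xen_clocksource_read() +
					  SPIN_POLL_SLACK_NS);

		raw_local_irq_restore(flags);
	} while (!xen_test_irq_pending(irq));

With the timeout the waiters re-run the trylock/poll cycle every millisecond
instead of sleeping forever on a kick that never arrives.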

> > I suppose xen_spin_unlock could also be instrumented to indicate who it
> > last kicked and for which lock and that might show us something?
> 
> IIRC I had done this but it did not really show much. What I have done in the
> meantime was to instrument the IRQ service functions in
> arch/x86/kernel/entry_64.S to give me a history of callbacks. This shows (in
> another experiment where it is 1,2,5,6 in the same lock and 2,5,6 still
> polling) that the last events on the vcpu not polling are:
> 
> xen_do_hypervisor_callback+127
> call_softirq+29
> call_softirq+125
> call_softirq+29
> call_softirq+125
> call_softirq+29
> call_softirq+125
> xen_do_hypervisor_callback+28
> call_softirq+29
> xen_do_hypervisor_callback+28
> 
> The lower offsets are when irq_count gets incremented and the higher offsets
> are when irq_count gets decremented before ending the callback. This shows
> that at least in this case there was an upcall, a softirq and another upcall
> being triggered without finishing the previous one. There was another
> experiment where I saw softirq, upcall, upcall. But at least it seems that
> there is always a three level stack.
> I believe that softirq, upcall would be ok, as softirq processing enables
> interrupts, but is it expected to have an upcall interrupted by another one?

I'm not sure. evtchns don't have a priority mechanism, so I expect not in
general, but individual interrupt handlers could well re-enable interrupts,
I think (it might be one of the flags you pass when calling request_irq?).
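
For what it's worth, the spinlock kicker irq itself is set up along these
lines in the old spinlock.c (again a sketch from memory, not verbatim): it is
bound as a per-VCPU IPI and immediately disabled, so it is only ever consumed
via SCHEDOP_poll and its handler should never actually run.

static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;

static irqreturn_t dummy_handler(int irq, void *dev_id)
{
	BUG();			/* the kicker irq is never delivered normally */
	return IRQ_HANDLED;
}

void xen_init_lock_cpu(int cpu)
{
	int irq;
	const char *name;

	name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
	irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR, cpu,
				     dummy_handler,
				     IRQF_DISABLED | IRQF_PERCPU | IRQF_NOBALANCING,
				     name, NULL);
	if (irq >= 0) {
		disable_irq(irq);	/* only polled, never delivered */
		per_cpu(lock_kicker_irq, cpu) = irq;
	}
}

So the nesting seen in your traces presumably comes from the other event
channels (or from the slow path's own raw_local_irq_enable()), not from this
handler.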

> > Can someone explain why the non-locked update of lock_spinners in
> > spinning_lock() is safe against xen_spin_unlock reading the field from
> > another VCPU? I suspect it isn't, so what happens if the other VCPU
> > kicks the lock which is just about to become prev? maybe this is handled
> > in xen_spin_lock_slow somehow?
> 
> Isn't that safe because lock_spinners is percpu?

xen_spin_unlock_slow() accesses the lock_spinners of remote CPUs, which
is what I think might be unsafe.
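
i.e. the kick side looks roughly like this (sketch from memory, not verbatim;
whether it kicks one waiter or all of them depends on the kernel version):

static void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
	int cpu;

	/* Scan every CPU's advertised lock and kick those still
	 * waiting on this one.  This read of another CPU's
	 * lock_spinners is the remote access in question: nothing
	 * orders it against the plain update in spinning_lock(). */
	for_each_online_cpu(cpu) {
		if (per_cpu(lock_spinners, cpu) == xl)
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
	}
}

which is exactly the window where a kick aimed at the lock that is just about
to become prev could be missed.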

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

