
Re: [Xen-devel] Xen PVM: Strange lockups when running PostgreSQL load



On 17.10.2012 15:28, Andrew Cooper wrote:
> On 17/10/12 14:10, Stefan Bader wrote:
>> I am currently looking at a bug report[1] which is happening when
>> a Xen PVM guest with multiple VCPUs is running a high IO database
>> load (a test script is available in the bug report).
>>
>> In experimenting it seems that this happens (or becomes more
>> likely) when the number of VCPUs is 8 or higher (though I have
>> not tried 6, only 2 and 4). Having autogroup enabled seems to
>> make it more likely, too (at some point we thought it would
>> actually prevent it, but we were wrong), as does having pv
>> spinlocks enabled.
>> It has happened with older (3.4.3) and newer (4.1.2) versions of
>> Xen as a host and with 3.2 and 3.5 kernels as guests.
>>
>> I will try to get the pv spinlock assumption re-verified by asking
>> for a reproduction under real load with a kernel that simply
>> disables them. However, the dumps I am looking at really do look
>> odd there.
>>
>> The first dump I looked at had the spinlock of runqueue[0] being
>> placed into the per-cpu lock_spinners variable for cpu#0 and
>> cpu#7 (doing my tests with 8 VCPUs). So apparently both cpus were
>> waiting in the slow path for it to become free. Though actually
>> it was free! Now, here is one issue I have in understanding the
>> dump: the back traces produced by crash in the normal form do
>> not show any cpu in the poll_irq HV call. Only when using the
>> form that starts from the last known stack location and displays
>> all text symbols found does it get close for cpu#7. cpu#0 still
>> does not seem to be anywhere close. This could be a problem with
>> crash, or with the way PVM works, I am not sure.
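For reference, this is roughly the slow path both cpus would be sitting
in (paraphrased and heavily simplified from arch/x86/xen/spinlock.c as
in the 3.x kernels, so take it as a sketch, not the exact code): the
cpu advertises the lock it is waiting for in the per-cpu lock_spinners
pointer and then blocks in a poll on its per-cpu lock kicker event
channel until the unlocker kicks it:

static DEFINE_PER_CPU(int, lock_kicker_irq);
static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

static int xen_spin_lock_slow(struct arch_spinlock *lock)
{
	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
	int irq = __this_cpu_read(lock_kicker_irq);
	struct xen_spinlock *prev;
	int ret = 0;

	/* announce which lock we are waiting for (one level of nesting
	   is allowed, hence remembering the previous value) */
	prev = __this_cpu_read(lock_spinners);
	__this_cpu_write(lock_spinners, xl);
	/* ...and xl->spinners is incremented at this point as well */

	do {
		xen_clear_irq_pending(irq);

		/* re-check, the lock may have been freed meanwhile */
		ret = xen_spin_trylock(lock);
		if (ret)
			break;

		/* block in the poll_irq HV call (SCHEDOP_poll) until the
		   unlocker signals our lock kicker event channel */
		xen_poll_irq(irq);
	} while (!xen_test_irq_pending(irq));

	/* stop announcing the lock, xl->spinners gets decremented */
	__this_cpu_write(lock_spinners, prev);

	return ret;
}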
>>
>> Anyway, from what I could take from that situation, it appeared
>> that cpu#1 (the one with soft lockup warnings in the log) was
>> trying to do a wake_up on a task that had just done an io
>> schedule on cpu#7 (but the waitqueue_head spinlock is currently
>> locked). cpu#7 tries to get the runqueue[0] spinlock to do an
>> idle migration (though the lock currently is free). So I guessed
>> that maybe cpu#0 had just been woken for the free lock but had
>> not grabbed it yet.
>>
>> From there I wondered whether something that usually holds a
>> spinlock for longer than the quick path timeout (which causes
>> new lockers to go into the slow path) could cause a stall on a
>> high-numbered cpu when the lock is contended (as the search and
>> signalling is only done for the first waiter, starting from 0).
>> That led to the patch below. The good thing about it: it does not
>> break things immediately. The bad thing: it does not help with
>> the problem. :/
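The "first waiter starting from 0" behaviour comes from the unlock slow
path, which (again roughly and simplified from the same file) scans the
cpus in order and only kicks the first one it finds spinning on this
lock:

static void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
	int cpu;

	for_each_online_cpu(cpu) {
		/* only the first cpu found waiting on this lock is kicked;
		   higher-numbered cpus stay blocked in their poll */
		if (per_cpu(lock_spinners, cpu) == xl) {
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			break;
		}
	}
}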
>>
>> The interesting thing when looking at a second dump, taken with
>> a test kernel using the patch below, was that again 2 cpus seemed
>> to be spinning in the slow path on a lock that was free in the
>> dump. And again at least one did not seem to be doing anything
>> spinlock related (anymore?).
>> One other detail (though that could be just coincidence as well)
>> was that with the unmodified kernel I would usually end up with
>> soft lockup messages, while with the patched kernel it was a stall
>> detected by rcu_bh (0 and 1 stalled, detected by 3).
>>
>> Now I am a bit confused and wonder about some basic things:
>> 1. When a cpu goes into the slow lock path and into the poll_irq,
>>    shouldn't I expect this one to be in hypercall_page in the
>>    dump?
>> 2. When does the whole handling for interrupted spinlock wait
>>    apply? I assume only for a spinlock taken without irqs
>>    disabled and then trying to acquire another one in an
>>    interrupt handler. Is that correct?
>> 3. Not really related but I just stumbled over it:
>>    In xen_spin_trylock: "asm("xchgb %b0,%1"...)
>>    What does the "b" part of %b0 do? I thought xchgb already
>>    would say it is a byte value...
>>
>> But putting aside those questions, the fact that both times there
>> were two cpus waiting on the same lock which, going by the lock
>> count (=0), was free seems a bit too much of a coincidence. Oh,
>> and the spinners count in the lock was 2, as one would expect.
>>
>> struct rq {
>>   lock = {
>>     raw_lock = {
>>       {
>>         head_tail = 512, 
>>         tickets = {
>>           head = 0 '\000', 
>>           tail = 2 '\002'
>>         }
>>       }
>>     }
>>   }, 
>>   nr_running = 1,
>>   ...
>> }
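Just to spell out how I read those numbers (a sketch, assuming the
usual Xen PV byte-lock overlay of that era and paraphrasing its struct;
I have not re-checked the exact layout against this kernel config):

/* crash prints the generic ticket-lock view of the lock word above;
 * the Xen PV spinlock code instead overlays its own structure on the
 * same bytes, roughly: */
struct xen_spinlock {
	unsigned char lock;		/* 0 -> free, 1 -> locked */
	unsigned short spinners;	/* count of waiting cpus */
};

/* head_tail == 512 == 0x0200: the first byte of the word, which is the
 * Xen lock byte, is 0x00, i.e. the lock itself is free, while the 2
 * matches the spinner count of the two cpus stuck in the slow path. */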
>>
>> I really don't know how this happens. Especially as cpu#0 seems to
>> be at least past the wakeup and should have removed itself from
>> the spinners list...
>>  
>> -Stefan
> 
> We (Citrix) have an outstanding bug, on 2.6.32 classic, which we have
> never managed to get to the bottom of (because of an inability to
> reproduce it reliably), which might be related.
> 
> In our case, certain processes were locking up, and it turned out that
> the kernel was issuing SCHEDOP_poll hypercalls (the same as the stack
> trace on your launchpad ticket) on its own spinlock IPI event channel
> (understandable, as it's a spinlock), but with the event channel masked,
> so it would never wake up from the poll.
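For reference, the poll in question is issued from xen_poll_irq() in the
slow path above; roughly (simplified from drivers/xen/events.c, so again
only a sketch) it boils down to a SCHEDOP_poll on the per-cpu lock kicker
event channel:

void xen_poll_irq(int irq)
{
	int evtchn = evtchn_from_irq(irq);

	if (VALID_EVTCHN(evtchn)) {
		struct sched_poll poll;

		poll.nr_ports = 1;
		poll.timeout = 0;	/* no timeout, wait until signalled */
		set_xen_guest_handle(poll.ports, &evtchn);

		/* blocks this vcpu until the polled port is signalled */
		if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
			BUG();
	}
}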

Unfortunately either crash is misreading things or something else is wrong
for the PV guest, as HYPERVISOR_shared_info appears completely blank when
looking at it.

> 
> Can you reliably reproduce the issue? And if so, would it be possible to
> capture hypervisor debug keys q and e when the badness happens?  This
> would clearly show whether our bugs are related or not.

Yes, as mentioned, the python script in the bug report triggers it when the
other conditions are met. I certainly can capture the debug keys when it
happens.

> 
> Our repro case took about a month to reproduce, and unfortunately
> turning lockdep on has appeared to fix the problem.

Yeah, having lockdep enabled was another thing that made it less likely,
though some reporters had lockups with it, too. Just much less often.

-Stefan

> 
> ~Andrew
> 
>>
>> [1] http://bugs.launchpad.net/bugs/1011792
>>




 

