
Re: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]



On Wed, 2011-08-24 at 21:24 +0100, Konrad Rzeszutek Wilk wrote:
> On Mon, Aug 22, 2011 at 10:00:11AM +0100, Ian Campbell wrote:
> > @xen-devel:
> > 
> > Does this look familiar to anyone? This is (I expect, hopefully Giuseppe
> > will confirm) from Debian Squeeze, which has Xen 4.0.x with a PVops
> > dom0 kernel based on xen.git from last summer (e73f4955a821) with more
> > recent upstream longterm kernels (up to and including 2.6.32.41) merged
> > in. While it does seem to have the switch from level- to edge-triggered
> > interrupts, the Debian kernel doesn't appear to have the switch to
> > fasteoi for pirqs (0672fb44a111 plus a few follow-ups). Could that be
> > related to this? (I'm not sure if that was a cleanup or a fix.)
> 
> It was a fix. We had some interrupts getting wedged - but I don't recall
> the stack exactly.

OK, sounds very much like those fixes are worth a try then. Thanks.

>  But there are some follow-ups, like:
> e5ac0bda96c495321dbad9b57a4b1a93a5a72e7f
> 7e186bdd0098b34c69fb8067c67340ae610ea499

The list I came up with of changesets against drivers/xen/events.c that
are not in the Debian kernel is below [0]. A small number are false
positives (Debian already got them via the longterm branches), but most
are not.
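One plausible source of such false positives is that a fix which reached Debian through a longterm branch is a separate commit with a new hash, so a purely hash-based comparison still reports it as missing. A minimal sketch of a subject-based cross-check follows. It assumes one "HASH SUBJECT" line per upstream commit (for example from `git log --format='%H %s'`) and a list of subjects already present in the downstream tree; the function name `missing_by_subject` is hypothetical and not part of git or any existing tool.

```python
from typing import Iterable, List, Tuple


def missing_by_subject(
    upstream: Iterable[str], downstream_subjects: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (hash, subject) pairs from `upstream` whose subject does not
    appear in `downstream_subjects`.

    Hypothetical helper. `upstream` holds one "HASH SUBJECT" string per
    commit. Matching is by subject rather than hash, on the assumption that a
    fix backported through a longterm/stable branch keeps its subject but
    gets a different hash.
    """
    have = {s.strip() for s in downstream_subjects}
    missing = []
    for line in upstream:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log output
        commit, _, subject = line.partition(" ")
        subject = subject.strip()
        if subject not in have:
            missing.append((commit, subject))
    return missing


if __name__ == "__main__":
    upstream_log = [
        "652c98bac315a2253628885f05cfd5f30b553ae5 xen: Use IRQF_FORCE_RESUME",
        "2b1c9503c615f68262ae2e96ee26ee128b486287 xen/events: only unmask irq if enabled",
    ]
    # Illustrative only: pretend the downstream tree already carries the
    # first fix under a different (backported) hash.
    downstream = ["xen: Use IRQF_FORCE_RESUME"]
    for commit, subject in missing_by_subject(upstream_log, downstream):
        print(commit[:12], subject)
```

Subject matching is only a heuristic: a backport whose subject was edited will still be reported, and two distinct commits sharing a subject (compare f9f09329 and c7ff70d2 in the list at [0]) are treated as one.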

The majority look like real fixes to me, either for this particular issue
or for other problems. I would consider them all candidates for inclusion
in a future update of the Debian kernel.

Giuseppe, are you able to reproduce the issue you are seeing at will? If
I build a test kernel, would you be able to try it? You are using a -686
kernel, right (as opposed to amd64)? OOI (out of interest), which
hypervisor flavour do you use?

> The interesting thing about the stack trace is that it looks similar to:
> 
> http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979
> 
> which has some fixes https://patchwork.kernel.org/patch/1091772/
> but they may not help.

Looks like it is an issue on native too. If it is an issue as far back
as 2.6.32 as well, I expect we'll see the fix via the longterm channels
at some point.

Ian.

[0]

652c98bac315a2253628885f05cfd5f30b553ae5 xen: Use IRQF_FORCE_RESUME
f9f09329407e3a11140827ba71d8f9d9ede42823 xen: events: do not unmask event channels on resume
ea2020837ca7dc2c9bcfc477fb4d261cf067db4f xen: do not try to allocate the callback vector again at restore time
acad13511ebe1db666aab5807117d3ac647ea58d xen: events: Remove redundant clear of l2i at end of round-robin loop
0e2ec1fb16f9ca84f91de3d9427a0964d679738a xen: events: Make round-robin scan fairer by snapshotting each l2 word
188449f889c6c30709c7e9e8710b9eff14fd963f xen: events: Clean up round-robin evtchn scan.
1acdebd2d67f71d230f5857c28843e636b7dd92e xen: events: Make last processed event channel a per-cpu variable.
2d9c33e1b47b800e43a1444a65353fcb96e27165 xen: events: Process event channels notifications in round-robin order.
2b1c9503c615f68262ae2e96ee26ee128b486287 xen/events: only unmask irq if enabled
c756a6e7f711308ce85afc7d4c79213cce58a033 xen: allocate irq descriptors on any numa node
b1a003a2aa9ee0d3d69237725c91839f4b6a8559 xen/events: use locked set|clear_bit() for cpu_evtchn_mask
cca68cf2d344eb3c4ff996e99f36cf8f8382bc2b xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore
c7ff70d2824191af119091d3af8db3bb57b06f77 xen: events: do not unmask event channels on resume
d4283609c7504309b8b93d7582857ff4623105f3 xen: improvements to VIRQ_DEBUG output
7c42097171f2e0beafa16e007a06e464b3014bea xen: correct parameter type for pirq_eoi
97708051c14157e95e25d112c26902f1c6fbb462 xen: ensure that all event channels start off bound to VCPU 0
e05885b24a55db82fbdb5cbc3f31426b976d7fc1 xen: set up IRQ before binding virq to evtchn
f0d4a0552f03b52027fb2c7958a1cbbe210cf418 xen/apic: fix pirq_eoi_gmfn resume
d2ea486300ca6e207ba178a425fbd023b8621bb1 xen/pirq: use fasteoi for MSI too
158d6550716687486000a828c601706b55322ad0 xen/pirq: use eoi as enable
2390c371ecd32d9f06e22871636185382bf70ab7 xen/events: use PHYSDEVOP_pirq_eoi_gmfn to get pirq need-EOI info
cb23e8d58ca35b6f9e10e1ea5682bd61f2442ebd xen/evtchn: correction, pirq hypercall does not unmask
43d8a5030a502074f3c4aafed4d6095ebd76067c xen/evtchn: pirq_eoi does unmask
f4526f9a78ffb3d3fc9f81636c5b0357fc1beccd xen/evtchn: make pirq enable/disable unmask/mask
c6a16a778f86699b339585ba5b9197035d77c40f xen/evtchn: rename retrigger_dynirq -> irq
d0936845a856816af2af48ddf019366be68e96ba xen/evtchn: rename enable/disable_dynirq -> unmask/mask_irq
2789ef00cbe2cdb38deb30ee4085b88befadb1b0 xen: make pirq interrupts use fasteoi
0672fb44a111dfb6386022071725c5b15c9de584 xen/events: change to using fasteoi
9fa90aa72d6af5cc2c2eddf56f9a586035e13ae7 xen: use dynamic_irq_init_keep_chip_data
f55ce8740101c54016544a0d633dc1b6b21244ae Introduce CONFIG_XEN_PVHVM compile option
f61692642a2a2b83a52dd7e64619ba3bb29998af xen/pirq: do EOI properly for pirq events
47cd3eb068a8a0cea124495e525ac16876fa08f6 xen/pci: fix compile error when CONFIG_PCI_XEN disabled
29a2e2a7bd19233c62461b104c69233f15ce99ec xen/apic: use handle_edge_irq for pirq events
6dc7b8080195ed43ee6de5b1d60c65aa719208ad xen/irq: replace boot boot allocator
66fd3052fec7e7c21a9d88ba1a03bc062f5fb53d xen: handle events as edge-triggered
8401e9b96f80f9c0128e7c8fc5a01abfabbfa021 xen: use percpu interrupts for IPIs and VIRQs

-- 
Ian Campbell


A Fortran compiler is the hobgoblin of little minis.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
