Re: [Xen-devel] HPET stack overflow, and general problems with do_IRQ()
On 16/08/2013 08:53, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:

>>>> On 15.08.13 at 22:21, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> Hello,
>>
>> I have finally managed to get a full stack dump from affected hardware.
>>
>> The logs can be found here (including hypervisor with debugging symbols):
>>
>> http://xenbits.xen.org/people/andrewcoop/hpet-overflow-full-stackdump.tar.gz
>>
>> The interesting log file is xen.pcpu0.stack.log
>>
>> By my count (grepping for e008 as CS), there are 8 exception frames
>> on the Xen stack (all stack page 6)
>>
>> However, because of the early ack() at the LAPIC, and disabling of
>> interrupts, the vectors (in order of interrupts arriving) are
>>
>> c1, 99, b1, b9, a9, a1, 91, 89
>
> So these are all HPET interrupts as it seems to me. You said the
> box just has 8 of them, so the fundamental problem is not the
> general handling of interrupts that you talk about below, but the
> fact that _all_ these channels are bound to CPU0: That's an
> insane side effect of the way channel management works when
> there are (potentially) more CPUs than channels. So _I_ think
> this is what needs fixing.
>
> That's even more so that the above sequence would be impossible
> for guest interrupts (which don't get EOI-ed immediately, and
> interrupts don't get re-enabled on that path either). Hence in the
> discussion here we need to only be concerned of interrupts that
> Xen uses for itself: timer, console, iommu, and HPET. Out of these,
> timer and console - going through the IO-APIC - are safe from this
> because of how io_apic.c implements the ->ack()/->end() pairs.
> Both IOMMU implementations ack their IRQs in the LAPIC only in
> ->end(). And that's what I suggested to switch HPET to too. And
> other than I said about this earlier, disabling interrupts in the
> ->end() handler isn't even necessary, as it already gets called with
> them disabled.
>
> So we have two possible fixes to the HPET, either of which is
> very likely to deal with the problem on its own.

Additionally, with per-vcpu stacks we could have a larger per-cpu irq stack.
It would be easier to grow that without 'wasting' memory. Although I think
Jan's arguments above do make sense.

 -- Keir

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel