
Re: [Xen-devel] domU and dom0 hung with Xen console interrupt binding showing in-flight=1, (---M)



>>> On 17.08.10 at 20:01, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 17/08/2010 18:28, "Bruce Edge" <bruce.edge@xxxxxxxxx> wrote:
> 
>> On Tue, Jun 29, 2010 at 1:42 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
>>>>>> On 28.06.10 at 20:22, Dante Cinco <dantecinco@xxxxxxxxx> wrote:
>>>> I have an HP Proliant DL380-G6 (dual Xeon E5540 @ 2.53GHz) with Xen 4.0.0
>>>> and dom0 Linux 2.6.32.12 x86_64 pvops and domU Linux kernel 2.6.30.1 
>>>> x86_64.
>>>> I'm using PCI passthrough (pci-stub) to pass my 4-port 8Gb PMC-Sierra Fibre
>>>> Channel HBA to domU. After running I/Os for several hours, both dom0 and
>>>> domU hang and the Xen console shows the interrupt binding below where IRQ
>>>> 66 shows in-flight=1 and mask set (---M). What's the best way to debug this
>>>> problem?
>>> 
>>> There are potentially two problems here: One is that the guest may
>>> fail to send the EOI notification. You would want to check whether
>>> pirq_guest_eoi() got run after that last occurrence of the interrupt.
>>> 
>>> The more worrying part is that Xen should time out on a guest failing
>>> to send the EOI notification, and ack the interrupt nevertheless.
>>> Looking at the code I fail to see how the ack_APIC_irq() would get
>>> sent in this case: non-maskable MSIs get this issued from
>>> end_msi_irq(), but ->end doesn't get invoked from
>>> irq_guest_eoi_timer_fn() (only ->enable does). Keir, am I missing
>>> something?
> 
> I don't think that timer logic is designed to handle non-maskable MSIs, only
> maskable ones. It ought to be not too hard to fix it up for non-maskable
> ones too by issuing the ->end() call from the timer handler?

Yes, that was what I was trying to hint at, but I wasn't sure whether
calling ->end() from the timer handler has any unintended side effects
and/or requires any extra care (like preventing a subsequent
guest-initiated EOI from calling ->end() a second time).
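
To make the double-->end() concern concrete, here is a minimal stand-alone
sketch in C. It is a model only, not Xen code: struct model_irq, the
eoi_pending flag and all model_* functions are hypothetical names invented
for illustration, and the model is single-threaded, whereas real code would
presumably have to do this under the lock protecting the IRQ descriptor.
The idea is that whichever path gets there first (timer or guest EOI)
consumes a per-delivery flag, so the ->end() stand-in runs at most once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one guest-bound IRQ; this is not Xen's irq_desc. */
struct model_irq {
    bool eoi_pending;  /* set on delivery, cleared by whichever path ends it */
    int  end_calls;    /* how many times the ->end() stand-in has run */
};

/* Stand-in for the ->end() hook, which for a non-maskable MSI would be
 * the place where the local APIC ack gets issued. */
static void model_end(struct model_irq *irq)
{
    irq->end_calls++;
}

/* Mark a new delivery of the interrupt to the guest. */
static void model_deliver(struct model_irq *irq)
{
    assert(!irq->eoi_pending);  /* previous delivery must have been ended */
    irq->eoi_pending = true;
}

/* End the interrupt at most once per delivery.
 * Returns true if this call was the one that invoked model_end(). */
static bool model_end_once(struct model_irq *irq)
{
    if (!irq->eoi_pending)
        return false;           /* the other path already ended it */
    irq->eoi_pending = false;
    model_end(irq);
    return true;
}

/* Timeout path: the guest did not EOI in time. */
static bool model_eoi_timer_fn(struct model_irq *irq)
{
    return model_end_once(irq);
}

/* Guest path: the guest (possibly late) sends its EOI notification. */
static bool model_guest_eoi(struct model_irq *irq)
{
    return model_end_once(irq);
}
```

With this shape, a timeout followed by a late guest EOI for the same
delivery leaves end_calls at 1. Whether the real ->end() implementations
have further side effects that such a guard does not cover is exactly the
part I'm unsure about.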

While looking at this I came across another thing I don't understand:
__pirq_guest_eoi(), for the ACKTYPE_EOI case, calls __set_eoi_ready()
inside a cpu_test_and_clear() conditional, but __set_eoi_ready() itself
bails out if its own cpu_test_and_clear() on the same bitmap fails. Since
the caller has already cleared that bit, what's the point of calling
__set_eoi_ready() here (or what am I missing)?
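
Reduced to a stand-alone C model, the pattern as I read it looks like the
sketch below. This is an assumption-laden illustration, not the Xen source:
model_test_and_clear() is a non-atomic stand-in for cpu_test_and_clear(),
a plain 64-bit word stands in for the CPU mask, and model_outer() and
model_inner() are hypothetical stand-ins for the two functions. Unless
something else sets the bit again between the two tests, the inner test
can never succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic stand-in for a test-and-clear on one bit of a CPU mask.
 * Returns the previous value of the bit and leaves it cleared. */
static bool model_test_and_clear(unsigned int bit, uint64_t *mask)
{
    assert(bit < 64);
    bool was_set = (*mask >> bit) & 1u;
    *mask &= ~(UINT64_C(1) << bit);
    return was_set;
}

/* Stand-in for the callee: bails out unless it clears the bit itself.
 * Returns true only if it got past the check and would do real work. */
static bool model_inner(unsigned int cpu, uint64_t *mask)
{
    if (!model_test_and_clear(cpu, mask))
        return false;
    return true;
}

/* Stand-in for the caller: invokes the callee only after it has already
 * cleared the very same bit. */
static bool model_outer(unsigned int cpu, uint64_t *mask)
{
    if (model_test_and_clear(cpu, mask))
        return model_inner(cpu, mask);  /* bit is 0 here, so this bails */
    return false;
}
```

In this model model_outer() returns false whether or not the bit was set
on entry, while calling model_inner() directly with the bit set does get
past the check.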

Jan

