[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: NetBSD dom0 PVH: hardware interrupts stalls
On Tue, Nov 24, 2020 at 11:05:12AM +0100, Jan Beulich wrote: > On 23.11.2020 18:39, Manuel Bouyer wrote: > > On Mon, Nov 23, 2020 at 06:06:10PM +0100, Roger Pau Monné wrote: > >> OK, I'm afraid this is likely too verbose and messes with the timings. > >> > >> I've been looking (again) into the code, and I found something weird > >> that I think could be related to the issue you are seeing, but haven't > >> managed to try to boot the NetBSD kernel provided in order to assert > >> whether it solves the issue or not (or even whether I'm able to > >> repro it). Would you mind giving the patch below a try? > > > > With this, I get the same hang but XEN outputs don't wake up the interrupt > > any more. The NetBSD counter shows only one interrupt for ioapic2 pin 2, > > while I would have about 8 at the time of the hang. > > > > So, now it looks like interrupts are blocked forever. > > Which may be a good thing for debugging purposes, because now we have > a way to investigate what is actually blocking the interrupt's > delivery without having to worry about more output screwing the > overall picture. > > > At > > http://www-soc.lip6.fr/~bouyer/xen-log5.txt > > you'll find the output of the 'i' key. > > (XEN) IRQ: 34 vec:59 IO-APIC-level status=010 aff:{0}/{0-7} in-flight=1 > d0: 34(-MM) > > (XEN) IRQ 34 Vec 89: > (XEN) Apic 0x02, Pin 2: vec=59 delivery=LoPri dest=L status=1 > polarity=1 irr=1 trig=L mask=0 dest_id:00000001 > > (XEN) ioapic 2 pin 2 gsi 34 vector 0x67 > (XEN) delivery mode 0 dest mode 0 delivery status 0 > (XEN) polarity 1 IRR 0 trig mode 1 mask 0 dest id 0 > > IOW from guest pov the interrupt is entirely idle (mask and irr clear), > while Xen sees it as both in-flight and irr also already having become > set again. I continue to suspect the EOI timer not doing its job. Yet > as said before, for it to have to do anything in the first place the > "guest" (really Dom0 here) would need to fail to EOI the IRQ within > the timeout period. Which in turn, given your description of how you > handle interrupts, cannot be excluded (i.e. the handling may simply > take "slightly" too long). I've tried to force some of those scenarios myself by modifying the code, and didn't seem to be able to trigger the same scenario. I guess the NetBSD case is slightly difficult to recreate. > What we're missing is LAPIC information, since the masked status logged > is unclear: (-MM) isn't fully matching up with "mask=0". But of course > the former is just a software representation, while the latter is what > the RTE holds. IOW for the interrupt to not get delivered, there needs > to be this or a higher ISR bit set (considering we don't use the TPR), > or (I think we can pretty much exclude this) we'd need to be running > with IRQs off for extended periods of time. Let's dump the physical lapic(s) IRR and ISR together with the IO-APIC state. Can you please apply the following patch and use the 'i' key again? (please keep the previous patch applied) Thanks, Roger. ---8<--- diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index 60627fd6e6..c33d682b69 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -1547,3 +1547,24 @@ void check_for_unexpected_msi(unsigned int vector) { BUG_ON(apic_isr_read(vector)); } + +static DEFINE_SPINLOCK(dump_lock); +void dump_lapic(void *unused) +{ + unsigned int i; + unsigned long flags; + + spin_lock_irqsave(&dump_lock, flags); + printk("CPU %u APIC ID %u\n", smp_processor_id(), apic_read(APIC_ID)); + + printk("IRR "); + for ( i = APIC_ISR_NR - 1; i-- > 0; ) + printk("%08x", apic_read(APIC_ISR + i*0x10)); + + printk("\nISR "); + for ( i = APIC_ISR_NR - 1; i-- > 0; ) + printk("%08x", apic_read(APIC_IRR + i*0x10)); + printk("\n"); + + spin_unlock_irqrestore(&dump_lock, flags); +} diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c index e66fa99ec7..92edb3000a 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -2470,6 +2470,7 @@ static const char * delivery_mode_2_str( } } +void dump_lapic(void *unused); void dump_ioapic_irq_info(void) { struct irq_pin_list *entry; @@ -2516,6 +2517,9 @@ void dump_ioapic_irq_info(void) entry = &irq_2_pin[entry->next]; } } + + dump_lapic(NULL); + smp_call_function(dump_lapic, NULL, true); } static unsigned int __initdata max_gsi_irqs;
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |