Re: NetBSD dom0 PVH: hardware interrupts stalls
On Tue, Nov 24, 2020 at 03:42:28PM +0100, Jan Beulich wrote:
> On 24.11.2020 11:05, Jan Beulich wrote:
> > On 23.11.2020 18:39, Manuel Bouyer wrote:
> >> On Mon, Nov 23, 2020 at 06:06:10PM +0100, Roger Pau Monné wrote:
> >>> OK, I'm afraid this is likely too verbose and messes with the timings.
> >>>
> >>> I've been looking (again) into the code, and I found something weird
> >>> that I think could be related to the issue you are seeing, but haven't
> >>> managed to try to boot the NetBSD kernel provided in order to assert
> >>> whether it solves the issue or not (or even whether I'm able to
> >>> repro it). Would you mind giving the patch below a try?
> >>
> >> With this, I get the same hang but XEN outputs don't wake up the interrupt
> >> any more. The NetBSD counter shows only one interrupt for ioapic2 pin 2,
> >> while I would have about 8 at the time of the hang.
> >>
> >> So, now it looks like interrupts are blocked forever.
> >
> > Which may be a good thing for debugging purposes, because now we have
> > a way to investigate what is actually blocking the interrupt's
> > delivery without having to worry about more output screwing the
> > overall picture.
> >
> >> At
> >> http://www-soc.lip6.fr/~bouyer/xen-log5.txt
> >> you'll find the output of the 'i' key.
> >
> > (XEN) IRQ: 34 vec:59 IO-APIC-level status=010 aff:{0}/{0-7}
> >         in-flight=1 d0: 34(-MM)
> >
> > (XEN) IRQ 34 Vec 89:
> > (XEN) Apic 0x02, Pin 2: vec=59 delivery=LoPri dest=L status=1
> >         polarity=1 irr=1 trig=L mask=0 dest_id:00000001
>
> Since it repeats in Manuel's latest dump, perhaps the odd combination
> of status=1 and irr=1 is to tell us something? It is my understanding
> that irr ought to become set only when delivery-status clears. Yet I
> don't know what to take from this...
My reading of this is that one interrupt was accepted by the lapic
(irr=1) and that a further interrupt is pending that hasn't yet been
accepted by the lapic (status=1) because it's still serving the
previous one. But that's all weird, because there's no matching vector
in ISR, so the remote IRR bit on the IO-APIC has somehow become stale
or out of sync with the lapic state?

I'm also unsure how Xen has managed to reach this state; it shouldn't
be possible in the first place.

I don't think I can instrument the paths further with printfs, because
that is likely to change the behavior itself and spam the console. I
could, however, create a static buffer to trace relevant actions and
then dump them all together with the 'i' debug key output.

Sorry Manuel, you seem to have hit some kind of weird bug in interrupt
management. If you want to progress further with NetBSD PVH dom0, it
will likely work on a different box, but I would ask you to keep the
current box so that we can continue debugging.

Roger.