Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
On 22/10/2018 08:33, Chao Gao wrote:
> On Mon, Oct 15, 2018 at 01:06:12PM +0100, Andrew Cooper wrote:
>> On 15/10/18 11:30, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> Wei recently discovered an issue when running a Linux PVH Dom0 on a
>>> box with an Intel Family 6 (0x6), Model 158 (0x9e), Stepping 9 (raw
>>> 000906e9) CPU. We are not sure whether the issue is limited to a PVH
>>> Dom0, or it just happens to be easier to trigger in this scenario.
>> This issue has been seen very occasionally for years.  My debugging
>> patch dates back to 2013, and it has been observed on Haswell systems as
>> well.  There have also been a handful of reports on xen-devel over the
>> years.
>>
>> Wei is the first person to get a reliable enough repro to debug.  It is
>> not exclusive to PVH Dom0, but that appears to be the easiest way to
>> tickle the problem.
>>
>>> The issue is caused by what seems to be an interrupt injection while
>>> Xen is still servicing a previous interrupt (ie: the interrupt hasn't
>>> been EOI'ed and ISR for the vector is set) with the same or lower
>>> priority than the interrupt currently being serviced. This injection
>>> always happens when returning from idle from a state ACPI_STATE_C3 or
>>> lower.
>> As a bit of background, for some guest irqs, we need to inject the
>> interrupt into the guest and wait for an explicit ack.
>>
>> If the irq source doesn't have a mask bit which Xen can use, the only
>> option we have to avoid repeated interruption is to leave the irq in
>> service at the LAPIC.  The purpose of the Pending EOI stack is to manage
>> these as acks arrive back from guest context.
>>
>> For reasons which aren't clear, guest-bound MSI vectors which don't have
>> a mask bit also use this PEOI stack mechanism.  I think this is probably
>> a Xen bug, but it is also relevant to the issue.
>>
>> In Wei's case, the interrupt in question is an MSI non-maskable
>> interrupt from the USB controller.
>>
>>> Note that I haven't been able to reproduce this issue when using
>>> mwait-idle=0 or max_cstate=2 on the Xen command line, but again
>>> without knowing the underlying issue it's impossible to tell whether
>>> it's relevant.
>>>
>>> Andrew provided a debug patch which I've expanded to also log power
>>> state transitions, and is attached to this email.
>>>
>>> Here is a trace of a crash, together with the debug info.
>>>
>>> (XEN) *** Pending EOI error ***
>>> (XEN) cpu #1, irq 30, vector 0x21, sp 1
>>> (XEN) Peoi stack: sp 1
>>> (XEN) [ 0] irq 30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0
>>> (XEN) Peoi stack trace records:
>>> (XEN) [22619] POP {sp 1, irq 30, vec 0x21}
>>> (XEN) [22620] POWER TYPE 4
>>> (XEN) [22621] IDLE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22622] WAKE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22623] ACK_PRE PPR 0x000000f0
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN) [22624] ACK_POST PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22625] POWER TYPE 5
>>> (XEN) [22626] IDLE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22627] WAKE PPR 0x00000010
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22628] PUSH {sp 0, irq 30, vec 0x21}
>>> (XEN) [22629] POWER TYPE 5
>>> (XEN) [22630] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22631] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22632] POWER TYPE 5
>>> (XEN) [22633] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22634] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22635] ACK_PRE PPR 0x000000f0
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN) [22636] ACK_POST PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22637] READY {sp 1, irq 30, vec 0x21}
>>> (XEN) [22638] ACK_PRE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22639] ACK_POST PPR 0x00000010
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) [22640] POP {sp 1, irq 30, vec 0x21}
>>> (XEN) [22641] PUSH {sp 0, irq 30, vec 0x21}
>>> (XEN) [22642] POWER TYPE 4
>>> (XEN) [22643] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22644] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22645] POWER TYPE 3
>>> (XEN) [22646] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22647] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22648] POWER TYPE 3
>>> (XEN) [22649] IDLE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) [22650] WAKE PPR 0x00000020
>>> (XEN) IRR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN) ISR
>>> 0000000002000000000000000000000000000000000000000000000000000000
>> What has happened here is that, despite vector 0x21 being in service
>> (starting at the PUSH), we see it injected a second time.  The ASSERT()
>> fires because we find this vector still on the pending EOI stack.
>>
>> After that, we go idle a few times, but still haven't acked the
>> vector (i.e. whatever we're waiting for the guest to acknowledge hasn't
>> happened yet, and Xen has nothing else to do on this CPU).
>>
>> From the debugging, we see that PPR/IRR/ISR appear to retain their state
>> across the mwait, and there is nothing in the manual which I can see
>> discussing the interaction of LAPIC state and C states.
>>
>> However, from the behaviour seen here, we occasionally get woken from
>> mwait by an interrupt which is already pending.
>> I can only conclude that
>> there is some issue with priority calculations for edge triggered
>> interrupts when idle, which allows another one to slip in.  The fact
> Hi, Roger, Andrew and Wei,
>
> Jan's patch
> (https://lists.xen.org/archives/html/xen-devel/2018-10/msg01031.html)
> fixes an issue in handling SVI. Currently, when handling an EOI from the
> guest, SVI is simply cleared. The correct behaviour is to clear the
> corresponding bit in VISR and then set SVI to the index of the highest
> bit still set in VISR (please refer to SDM 29.1.4). If SVI is set to a
> value lower than the vector of the highest priority interrupt that is in
> service, the PPR virtualization (29.1.3) might set the VPPR to a lower
> value on VMEntry too. Thus an interrupt with the same or lower priority,
> which should be blocked by VPPR, slips in.
>
> Could you apply Jan's patch and try to reproduce it again?

Hello,

I'm aware of Jan's patch, but it pertains to Xen's emulation of the
virtual Local APIC for a guest.

This bug is with the real hardware APIC, as it pertains to waking from
MWAIT.  At the point that things go wrong, there is no VT-x involved at all.

~Andrew
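
The priority rule that both messages rely on can be sketched in a few lines of C. This is only an illustration, not Xen code: the type apic_bitmap and the helpers set_vec, clear_vec, highest_vec, compute_ppr, deliverable and eoi_update_svi are hypothetical names. It assumes the architectural rule that an interrupt's priority class is bits 7:4 of its vector and that a vector is delivered only when its class is strictly higher than the class in PPR, plus the EOI step Chao describes (clear VISR[SVI], then set SVI to the highest bit still set in VISR, or 0 if none).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 256-bit APIC bitmap (ISR, IRR or VISR), one bit per vector.
 * Hypothetical type for illustration; word i covers vectors 32*i..32*i+31. */
struct apic_bitmap { uint32_t w[8]; };

static void set_vec(struct apic_bitmap *b, unsigned int v)
{
    assert(v < 256);
    b->w[v / 32] |= UINT32_C(1) << (v % 32);
}

static void clear_vec(struct apic_bitmap *b, unsigned int v)
{
    assert(v < 256);
    b->w[v / 32] &= ~(UINT32_C(1) << (v % 32));
}

/* Highest vector set in the bitmap, or -1 if the bitmap is empty. */
static int highest_vec(const struct apic_bitmap *b)
{
    for (int i = 7; i >= 0; i--)
        for (int bit = 31; bit >= 0; bit--)
            if (b->w[i] & (UINT32_C(1) << bit))
                return i * 32 + bit;
    return -1;
}

/* PPR takes the higher priority class of TPR and the highest in-service
 * vector. When the classes are equal the low nibble is model-specific;
 * this sketch simply returns TPR in that case. */
static uint8_t compute_ppr(uint8_t tpr, const struct apic_bitmap *isr)
{
    int isrv = highest_vec(isr);
    uint8_t isr_class = isrv < 0 ? 0 : (uint8_t)(isrv & 0xf0);

    return ((tpr & 0xf0) >= isr_class) ? tpr : isr_class;
}

/* A pending vector is delivered only if its class is above PPR's class. */
static bool deliverable(uint8_t vector, uint8_t ppr)
{
    return (vector & 0xf0) > (ppr & 0xf0);
}

/* EOI handling as described for the virtual APIC: clear VISR[SVI], then
 * return the new SVI (highest bit still set in VISR, or 0 if none). */
static uint8_t eoi_update_svi(struct apic_bitmap *visr, uint8_t svi)
{
    clear_vec(visr, svi);
    int h = highest_vec(visr);

    return h < 0 ? 0 : (uint8_t)h;
}
```

Assuming TPR is 0x10, as the idle PPR of 0x00000010 in the trace suggests, compute_ppr() gives 0x20 while vector 0x21 is in service, which matches the PPR logged after the PUSH. Under this rule deliverable(0x21, 0x20) is false, so a second 0x21 arriving before the EOI is not something the priority check should allow.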