Re: [Xen-devel] Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622: "x86 don't change affinity with interrupt unmasked", APCI errors and assorted pci trouble
On 28/03/15 15:34, Sander Eikelenboom wrote:
> Hi Jan,
>
> Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622:
> "x86 don't change affinity with interrupt unmasked",
> gives trouble on my AMD box, symptoms:
> - APIC errors in xl dmesg that weren't previously there:
> (XEN) [2015-03-26 20:35:37.085] IOAPIC[0]: Set PCI routing entry (6-13 ->
> 0x88 -> IRQ 13 Mode:0 Active:0)
> (XEN) [2015-03-26 20:35:37.101] PCI: Using MCFG for segment 0000 bus 00-ff
> (XEN) [2015-03-26 20:35:37.097] IOAPIC[0]: Set PCI routing entry (6-8 ->
> 0x58 -> IRQ 8 Mode:0 Active:0)
> (XEN) [2015-03-26 20:35:37.112] IOAPIC[0]: Set PCI routing entry (6-18 ->
> 0xb8 -> IRQ 18 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.189] IOAPIC[0]: Set PCI routing entry (6-17 ->
> 0xc0 -> IRQ 17 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-29 ->
> 0xc8 -> IRQ 53 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-24 ->
> 0xd0 -> IRQ 48 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-30 ->
> 0xd8 -> IRQ 54 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-12 ->
> 0x21 -> IRQ 36 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-13 ->
> 0x29 -> IRQ 37 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.421] IOAPIC[1]: Set PCI routing entry (7-16 ->
> 0x31 -> IRQ 40 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.495] IOAPIC[1]: Set PCI routing entry (7-28 ->
> 0x39 -> IRQ 52 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.498] IOAPIC[0]: Set PCI routing entry (6-16 ->
> 0x89 -> IRQ 16 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.498] IOAPIC[1]: Set PCI routing entry (7-14 ->
> 0xa9 -> IRQ 38 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:37.548] IOAPIC[0]: Set PCI routing entry (6-22 ->
> 0xb9 -> IRQ 22 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:39.620] IOAPIC[1]: Set PCI routing entry (7-9 ->
> 0xc1 -> IRQ 33 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:39.646] IOAPIC[1]: Set PCI routing entry (7-8 ->
> 0xc9 -> IRQ 32 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:39.647] IOAPIC[1]: Set PCI routing entry (7-23 ->
> 0xd1 -> IRQ 47 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:41.732] IOAPIC[1]: Set PCI routing entry (7-5 ->
> 0xd9 -> IRQ 29 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:41.779] IOAPIC[1]: Set PCI routing entry (7-4 ->
> 0x22 -> IRQ 28 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:41.803] mm.c:803: d0: Forcing read-only access to
> MFN fed00
> (XEN) [2015-03-26 20:35:41.894] IOAPIC[0]: Set PCI routing entry (6-19 ->
> 0x2a -> IRQ 19 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:42.057] IOAPIC[1]: Set PCI routing entry (7-22 ->
> 0x72 -> IRQ 46 Mode:1 Active:1)
> (XEN) [2015-03-26 20:35:42.093] IOAPIC[1]: Set PCI routing entry (7-27 ->
> 0x8a -> IRQ 51 Mode:1 Active:1)
>
> these:
> (XEN) [2015-03-26 20:35:42.205] APIC error on CPU0: 00(40)
> (XEN) [2015-03-26 20:35:42.372] APIC error on CPU0: 40(40)
>
> (XEN) [2015-03-26 20:35:42.691] d0 attempted to change d0v1's CR4 flags
> 00000660 -> 00000760
> (XEN) [2015-03-26 20:35:42.691] IOAPIC[1]: Set PCI routing entry (7-1 ->
> 0x9a -> IRQ 25 Mode:1 Active:1)
>
> and this one:
> (XEN) [2015-03-26 20:35:42.707] APIC error on CPU0: 40(40)
> (XEN) [2015-03-26 20:35:43.958] d0 attempted to change d0v0's CR4 flags
> 00000660 -> 00000760
> (XEN) [2015-03-26 20:35:43.970] d0 attempted to change d0v2's CR4 flags
> 00000660 -> 00000760
> (XEN) [2015-03-26 20:35:43.988] d0 attempted to change d0v3's CR4 flags
> 00000660 -> 00000760
> (XEN) [2015-03-26 20:35:43.992] d0 attempted to change d0v4's CR4 flags
> 00000660 -> 00000760
> (XEN) [2015-03-26 20:35:43.996] d0 attempted to change d0v5's CR4 flags
> 00000660 -> 00000760
> (d1) [2015-03-26 20:40:42.220] mapping kernel into physical memory
> (d1) [2015-03-26 20:40:42.220] about to get started...
>
>
> - random failures on dom0 SATA devices, the SATA controller is using multiple
> MSI interrupts.
>
> - failures on XHCI controllers passed through to a HVM guest which uses MSI-X
> interrupts. Leading to these in the guest dmesg:
> [ 350.246548] xhci_hcd 0000:00:05.0: Looking for event-dma
> 000000003cdf7140 trb-start 000000003cdf7240 trb-end 000000003cdf7240
> seg-start 000000003cdf7000 seg-end 000000003cdf73f0
> [ 350.246548] xhci_hcd 0000:00:05.0: ERROR Transfer event TRB DMA ptr not
> part of current TD ep_index 1 comp_code 1
> [ 350.246548] xhci_hcd 0000:00:05.0: Looking for event-dma
> 000000003cdf7150 trb-start 000000003cdf7240 trb-end 000000003cdf7240
> seg-start 000000003cdf7000 seg-end 000000003cdf73f0
> [ 350.246548] xhci_hcd 0000:00:05.0: ERROR Transfer event TRB DMA ptr not
> part of current TD ep_index 1 comp_code 1
>
>
> Reverting this specific commit makes all the troubles go away ..

That is unfortunate, as conceptually the identified patch definitely
fixes a bug.

The "APIC error" messages have bit 6 set, which is "Receive Illegal
Vector", i.e. a device has attempted to deliver an interrupt with a
vector field less than 16. I presume that this means that the device is
ending up with a malformed data field programmed into it.

Can you identify the PCI SBDFs of the problematic devices, and collect
debug-keys Q, M and i on a working system, so I can identify precisely
which of the MSI interrupt drivers is in use (Xen has several, depending
on exact hardware circumstance)?

If you can, the same debug-keys with the problematic changeset present
might also be interesting.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
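
For anyone following along, the reading of the "APIC error on CPU0: 40(40)" lines above can be checked with a few lines of C. This is only an illustrative sketch, not code from Xen: the helper names (esr_bit_name, esr_receive_illegal_vector, msi_data_vector, vector_is_legal) are made up for this example. It assumes the architectural layout of the local APIC Error Status Register, where bit 6 is "Receive Illegal Vector" (some of the other bits are reserved on some CPUs), and the standard x86 MSI data format, where the vector sits in the low 8 bits of the data word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Names of the low 8 bits of the local APIC Error Status Register (ESR).
 * Assumption: architectural layout; not every bit is implemented on
 * every CPU, but bit 6 has this meaning wherever it is implemented. */
static const char *const esr_bit_names[8] = {
    "Send Checksum Error",      /* bit 0 */
    "Receive Checksum Error",   /* bit 1 */
    "Send Accept Error",        /* bit 2 */
    "Receive Accept Error",     /* bit 3 */
    "Redirectable IPI",         /* bit 4 */
    "Send Illegal Vector",      /* bit 5 */
    "Receive Illegal Vector",   /* bit 6 */
    "Illegal Register Address", /* bit 7 */
};

/* Hypothetical helper: name of an ESR bit, or NULL if out of range. */
const char *esr_bit_name(unsigned int bit)
{
    return bit < 8 ? esr_bit_names[bit] : NULL;
}

/* True if the ESR value reports a received interrupt with a bad vector. */
bool esr_receive_illegal_vector(uint32_t esr)
{
    return ((esr >> 6) & 1u) != 0;
}

/* The vector occupies bits 7:0 of the MSI data word. */
uint8_t msi_data_vector(uint32_t msi_data)
{
    return (uint8_t)(msi_data & 0xffu);
}

/* Vectors 0 to 15 are illegal for interrupts delivered via the APIC. */
bool vector_is_legal(uint8_t vector)
{
    return vector >= 16;
}
```

Feeding it the 0x40 from the log reports bit 6, and any MSI data word whose low byte is below 0x10 would fail vector_is_legal(), which is the kind of malformed data field suspected above. The vectors in the IOAPIC routing lines (0x88, 0x58, and so on) all pass.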