Re: [Xen-devel] [PATCH] amd iommu: Dump flags of IO page faults (off topic - pci devices)
On 07/09/12 08:32, Sander Eikelenboom wrote:
> Thursday, September 6, 2012, 5:03:05 PM, you wrote:
>
>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
>>>
>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
>>>>>
>>>>>> Hi Jan,
>>>>>> Attached patch dumps io page fault flags. The flags show the reason for
>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA
>>>>>> fault.
>>>>>> Thanks,
>>>>>> Wei
>>>>>> Signed-off-by: Wei Wang <wei.wang2@xxxxxxx>
>>>>>
>>>>> I have applied the patch and the flags seem to differ between the faults:
>>>>>
>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address =
>>>>> 0xc2c2c2c0, flags = 0x000
>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>>>> = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id
>>>>> = 0x0700, fault address = 0xa8d339e0, flags = 0x020
>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id
>>>>> = 0x0700, fault address = 0xa8d33a40, flags = 0x020
>>>> OK, so they are not interrupt requests. I guess further information from
>>>> your system would be helpful to debug this issue:
>>>> 1) xl info
>>>> 2) xl list
>>>> 3) lspci -vvv (NOTE: not in dom0 but in your guest)
>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
>>> dom14 is not an HVM guest, it's a PV guest.
>> Ah, I see. A PV guest is quite different from HVM; it does not use p2m
>> tables as io page tables. So the no-sharept option does not work in this
>> case. PV guests always use separate io page tables. There might be some
>> incorrect mappings on the page tables. I will check this on my side.
> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept
> everything else the same.
> I haven't seen any IO PAGE FAULTS after that.
>
> I did spot some differences in the output from lspci between xen 4.1 and 4.2,
> related to whether MSI is enabled for the IOMMU device.
> I have attached the xl/xm dmesg and lspci from booting with both versions.
>
> lspci:
>
> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O
> Memory Management Unit (IOMMU) [1002:5a23]
> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit
> (IOMMU) [1002:5a23]
> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 10
> Capabilities: [40] Secure device <?>
> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee0100c Data: 4128
> Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>
> Although it seems enabled, shouldn't the IRQ number used be much higher than
> 10 for MSI interrupts?

For compatibility reasons, all real PCI devices have to be able to fall back to legacy line-level interrupts. This is the IRQ10 which you see, which is #INTA in perhaps more recognizable notation. The line interrupt(s) will only be used if MSI and MSI-X interrupts are disabled.

You should find that all devices in lspci show between 1 and 4 #INTs (A through D), with the exception of SR-IOV virtual functions, which are specified to support only MSI/MSI-X.

~Andrew

>
> There is another difference in the bridge device that's in front of the
> 0a:00.6 device that faults before the kernel is even booted.
>
> 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI
> express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx+
> 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort+ <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
> I/O behind bridge: 0000f000-00000fff
> Memory behind bridge: f9f00000-f9ffffff
> Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
> 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- <SERR- <PERR-
> 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+
> <TAbort- <MAbort- <SERR- <PERR-
> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns,
> L1 <1us
> ExtTag+ RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency
> L0 <1us, L1 <8us
> ClockPM- Surprise- LLActRep+ BwNot+
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
> CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive+ BWMgmt+ ABWMgmt-
> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug-
> Surprise-
> Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> HPIrq- LinkChg-
> Control: AttnInd Unknown, PwrInd Unknown, Power-
> Interlock-
> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+
> Interlock-
> Changed: MRL- PresDet+ LinkState+
>
> serveerstertje:~# lspci -t
> -[0000:00]-+-00.0
>            +-00.2
>            +-02.0-[0b]----00.0
>            +-03.0-[0a]--+-00.0
>            |            +-00.1
>            |            +-00.2
>            |            +-00.3
>            |            +-00.4
>            |            +-00.5
>            |            +-00.6
>            |            \-00.7
>            +-05.0-[09]----00.0
>            +-06.0-[08]----00.0
>            +-0a.0-[07]----00.0
>            +-0b.0-[06]--+-00.0
>            |            \-00.1
>            +-0c.0-[05]----00.0
>            +-0d.0-[04]--+-00.0
>            |            +-00.1
>            |            +-00.2
>            |            +-00.3
>            |            +-00.4
>            |            +-00.5
>            |            +-00.6
>            |            \-00.7
>            +-11.0
>            +-12.0
>            +-12.2
>            +-13.0
>            +-13.2
>            +-14.0
>            +-14.3
>            +-14.4-[03]----06.0
>            +-14.5
>            +-15.0-[02]--
>            +-16.0
>            +-16.2
>            +-18.0
>            +-18.1
>            +-18.2
>            +-18.3
>            \-18.4
>
>
>> Thanks,
>> Wei
>>> I will try to make a complete package, and try with one pv domain only
>>> where the devices are being passed through just to simplify the setup.
>>>
>>>
>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>>> happened. Did it stop working?
>>> Yes it stops working, the video capture just freezes, but the driver
>>> doesn't bail out.
>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in
>>> the guest.
>>>
>>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>>> my RD890 system
>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>>> apic=debug iommu=on,verbose,debug,no-sharept
>>>> * so, what OEM board do you have?)
>>> MSI 890FXA-GD70
>>>
>>>> Also from your log, these lines look very strange:
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xd5, mfn=0xa4a11
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xdd, mfn=0xa4a09
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xdf, mfn=0xa4a07
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xe1, mfn=0xa4a05
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xe3, mfn=0xa4a03
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xe5, mfn=0xa4a01
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xe7, mfn=0xa463f
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xe9, mfn=0xa463d
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xeb, mfn=0xa463b
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xed, mfn=0xa4639
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page. gfn=0xef, mfn=0xa4637
>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>>> = 0x0a06, fault address = 0xc2c2c2c0
>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>> id = 0x0700, fault address = 0xa90f8300
>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>> id = 0x0700, fault address = 0xa90f8340
>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>> id = 0x0700, fault address = 0xa90f8380
>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>> id = 0x0700, fault address = 0xa90f83c0
>>>> * they are just followed by the IO PAGE fault. Do you know where they
>>>> are from? Your video card driver maybe?
>>> From an HVM domain with an old (3.0.3) kernel, but the faults also occur
>>> without this domain being started.
>>>
>>>
>>>> Thanks,
>>>> Wei
>>>
>>>>> Complete xl dmesg and lspci -vvvknn attached.
>>>>>
>>>>> Thx
>>>>>
>>>>> --
>>>>> Sander
>>>
>

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel