
Re: [Xen-devel] [PATCH] amd iommu: Dump flags of IO page faults



Thursday, September 6, 2012, 5:03:05 PM, you wrote:

> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
>>
>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
>>
>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
>>>>
>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
>>>>
>>>>> Hi Jan,
>>>>> The attached patch dumps IO page fault flags. The flags show the reason
>>>>> for the fault and tell us whether it is an unmapped interrupt fault or a
>>>>> DMA fault.
>>>>
>>>>> Thanks,
>>>>> Wei
>>>>
>>>>> signed-off-by: Wei Wang<wei.wang2@xxxxxxx>
>>>>
>>>>
>>>> I have applied the patch and the flags seem to differ between the faults:
>>>>
>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020
>>
>>> OK, so they are not interrupt requests. I guess further information from
>>> your system would be helpful to debug this issue:
>>> 1) xl info
>>> 2) xl list
>>> 3) lspci -vvv (NOTE: not in dom0 but in your guest)
>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
>>
>> dom14 is not an HVM guest, it's a PV guest.

> Ah, I see. A PV guest is quite different from an HVM guest: it does not
> use the p2m tables as IO page tables, so the no-sharept option has no
> effect in this case. PV guests always use separate IO page tables. There
> might be some incorrect mappings in those page tables. I will check this
> on my side.

I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept 
everything else the same.
I haven't seen any IO PAGE FAULTS after that.
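
One way to make the comparison between the two boots repeatable is to count
the IO_PAGE_FAULT entries per domain and device in each saved dmesg. The
following is only a minimal sketch: the regular expression is modelled on the
fault lines quoted in this thread, the function names and the SAMPLE text are
made up for illustration, and the flags field is treated as optional because
logs taken without the patch do not print it.

```python
import re
from collections import Counter

# Pattern modelled on the fault lines quoted in this thread. The trailing
# "flags" field is optional: unpatched builds do not print it.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT: domain = (\d+), device id = (0x[0-9a-fA-F]+), "
    r"fault address = (0x[0-9a-fA-F]+)(?:, flags = (0x[0-9a-fA-F]+))?"
)


def parse_faults(text):
    """Return a list of (domain, device_id, address, flags) tuples."""
    # Log lines may have been re-wrapped in transit, so collapse every run
    # of whitespace (including newlines) into a single space first.
    flat = " ".join(text.split())
    faults = []
    for dom, dev, addr, flags in FAULT_RE.findall(flat):
        faults.append(
            (int(dom), int(dev, 16), int(addr, 16),
             int(flags, 16) if flags else None)
        )
    return faults


def count_by_device(faults):
    """Count faults per (domain, device id) pair."""
    return Counter((dom, dev) for dom, dev, _addr, _flags in faults)


# Illustrative input only: three fault lines copied from this thread,
# wrapped the way a mail client might wrap them.
SAMPLE = """\
(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id =
0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id
= 0x0700, fault address = 0xa8d339e0, flags = 0x020
(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id
= 0x0700, fault address = 0xa8d33a40, flags = 0x020
"""

for (dom, dev), n in sorted(count_by_device(parse_faults(SAMPLE)).items()):
    print(f"domain {dom} device {dev:#06x}: {n} fault(s)")
```

Running the same function over the text of each attached dmesg would give
one table per Xen version to compare.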

I did spot some differences in the lspci output between Xen 4.1 and 4.2,
relating to whether MSI is enabled for the IOMMU device.
I have attached the xl/xm dmesg and lspci output from booting both versions.

lspci:

00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
        Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 10
        Capabilities: [40] Secure device <?>
4.1:    Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
4.2:    Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0100c  Data: 4128
        Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+

Although MSI appears to be enabled on 4.2, shouldn't the IRQ number be much
higher than 10 for MSI interrupts?
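
For reference, the Address/Data pair shown above can be split into fields.
This is a sketch that assumes the standard x86 MSI address/data layout and
that lspci prints the Data value in hexadecimal; decode_msi is a made-up
helper name. The vector it yields is a CPU interrupt vector, which is a
different thing from the OS IRQ number on the "routed to IRQ" line.

```python
def decode_msi(address, data):
    """Split an x86 MSI address/data pair into its fields (assumed layout)."""
    return {
        # Address bits 31:20 are 0xFEE for the local APIC message window.
        "is_lapic_window": ((address >> 20) & 0xFFF) == 0xFEE,
        "destination_id": (address >> 12) & 0xFF,
        "redirection_hint": (address >> 3) & 1,
        "destination_mode": (address >> 2) & 1,   # 0 = physical, 1 = logical
        "vector": data & 0xFF,
        "delivery_mode": (data >> 8) & 0x7,       # 0 = fixed, 1 = lowest priority
        "level_assert": (data >> 14) & 1,
        "trigger_mode": (data >> 15) & 1,         # 0 = edge, 1 = level
    }


# Values copied from the lspci output above (Data read as hex 0x4128).
for name, value in decode_msi(0x00000000FEE0100C, 0x4128).items():
    print(f"{name}: {value}")
```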

There is another difference in the bridge device in front of the 0a:00.6
device, which faults before the kernel has even booted.

00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
4.1:    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
4.2:    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: f9f00000-f9ffffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
4.1:    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
4.2:    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <8us
                        ClockPM- Surprise- LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+

serveerstertje:~# lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-02.0-[0b]----00.0
           +-03.0-[0a]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            +-00.5
           |            +-00.6
           |            \-00.7
           +-05.0-[09]----00.0
           +-06.0-[08]----00.0
           +-0a.0-[07]----00.0
           +-0b.0-[06]--+-00.0
           |            \-00.1
           +-0c.0-[05]----00.0
           +-0d.0-[04]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            +-00.5
           |            +-00.6
           |            \-00.7
           +-11.0
           +-12.0
           +-12.2
           +-13.0
           +-13.2
           +-14.0
           +-14.3
           +-14.4-[03]----06.0
           +-14.5
           +-15.0-[02]--
           +-16.0
           +-16.2
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           \-18.4
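
To relate the device ids in the fault messages to the tree above, the id can
be split into bus, device and function. The sketch below assumes the id is
the usual 16-bit PCI requester id (bus in the upper 8 bits, then 5 bits of
device and 3 bits of function); that reading is an assumption, though it
matches the 0a:00.6 device mentioned earlier for id 0x0a06. The helper name
and the UPSTREAM_BRIDGE table are made up for illustration, with the two
bridge entries transcribed from the lspci -t output.

```python
def decode_device_id(device_id):
    """Format a 16-bit PCI requester id as bus:device.function."""
    bus = (device_id >> 8) & 0xFF
    dev = (device_id >> 3) & 0x1F
    fn = device_id & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


# Secondary bus number -> root port, transcribed from the lspci -t tree
# above (only the two buses that show up in the fault messages).
UPSTREAM_BRIDGE = {0x0A: "00:03.0", 0x07: "00:0a.0"}

for device_id in (0x0A06, 0x0700):
    bridge = UPSTREAM_BRIDGE.get(device_id >> 8, "unknown")
    print(f"{device_id:#06x} -> {decode_device_id(device_id)} behind {bridge}")
```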





> Thanks,
> Wei

>> I will try to make a complete package, and try with one pv domain only where 
>> the devices are being passed through just to simplify the setup.
>>
>>
>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>> happened. Did it stop working?
>>
>> Yes it stops working, the video capture just freezes, but the driver doesn't 
>> bail out.
>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in 
>> the guest.
>>
>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>> my RD890 system
>>
>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>> apic=debug iommu=on,verbose,debug,no-sharept
>>
>>> * so, what OEM board do you have?)
>>
>> MSI 890FXA-GD70
>>
>>> Also, from your log, these lines look very strange:
>>
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xd5, mfn=0xa4a11
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xdd, mfn=0xa4a09
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xdf, mfn=0xa4a07
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xe1, mfn=0xa4a05
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xe3, mfn=0xa4a03
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xe5, mfn=0xa4a01
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xe7, mfn=0xa463f
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xe9, mfn=0xa463d
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xeb, mfn=0xa463b
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xed, mfn=0xa4639
>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>> read-only memory page. gfn=0xef, mfn=0xa4637
>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>> = 0x0a06, fault address = 0xc2c2c2c0
>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>> id = 0x0700, fault address = 0xa90f8300
>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>> id = 0x0700, fault address = 0xa90f8340
>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>> id = 0x0700, fault address = 0xa90f8380
>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>> id = 0x0700, fault address = 0xa90f83c0
>>
>>> * they are immediately followed by the IO PAGE faults. Do you know where
>>> they come from? Your video card driver maybe?
>>
>> From an HVM domain with an old (3.0.3) kernel, but the faults also occur
>> without this domain being started.
>>
>>
>>> Thanks,
>>> Wei
>>
>>
>>>> Complete xl dmesg and lspci -vvvknn attached.
>>>>
>>>> Thx
>>>>
>>>> --
>>>> Sander
>>
>>
>>
>>
>>


Attachment: lspci-4.1.txt
Description: Text document

Attachment: lspci-4.2.txt
Description: Text document

Attachment: xl-dmesg-4.2.txt
Description: Text document

Attachment: xm-dmesg-4.1.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
