[Xen-devel] Xen-unstable + dom0 linux-4.14-rc6: bisected pci-passthrough problem to HVM

Hi Juergen and Boris,

While testing Linux 4.14-rc6, I ran into trouble with one of the devices I use
for PCI passthrough: an HVM guest fails to start when it is configured to pass
through this particular device (see below for the lspci output).
Passthrough of other PCI devices still works; only this particular device
seems to be affected on my system.

libxl: error: libxl_qmp.c:457:qmp_next: Domain 3:Socket read error: Connection reset by peer
libxl: error: libxl_pci.c:1295:libxl__add_pcidevs: Domain 3:libxl_device_pci_add failed: -3
libxl: error: libxl_create.c:1495:domcreate_attach_devices: Domain 3:unable to add pci devices
libxl: error: libxl_domain.c:1000:libxl__destroy_domid: Domain 3:Non-existant 
libxl: error: libxl_domain.c:959:domain_destroy_callback: Domain 3:Unable to destroy guest
libxl: error: libxl_domain.c:886:domain_destroy_cb: Domain 3:Destruction of domain failed

I bisected the dom0 kernel and found:
    ce56a86e2ade45d052b3228cdfebe913a1ae7381 is the first bad commit
    commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
    Author: Craig Bergstrom <craigb@xxxxxxxxxx>
    Date:   Thu Oct 19 13:28:56 2017 -0600

    x86/mm: Limit mmap() of /dev/mem to valid physical addresses

    Currently, it is possible to mmap() any offset from /dev/mem.  If a
    program mmaps() /dev/mem offsets outside of the addressable limits
    of a system, the page table can be corrupted by setting reserved bits.
    For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
    x86_64 system with a 48-bit bus, the page fault handler will be called
    with error_code set to RSVD.  The kernel then crashes with a page table
    corruption error.

    This change prevents this page table corruption on x86 by refusing
    to mmap offsets higher than the highest valid address in the system.

    Signed-off-by: Craig Bergstrom <craigb@xxxxxxxxxx>
    Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@xxxxxxxxxx
    Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

    :040000 040000 4b430d0a1913539ab5e6652cb0d6ec5fdb2853ea 788d61870d881543972178dd8fa61e180c1690a5 M  arch

xl dmesg and dmesg are attached.

Any thoughts on this one?


Output of lspci -vvvknn for the device:

08:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) (prog-if 30 [XHCI])
        Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:8413]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 37
        NUMA node: 0
        Region 0: Memory at fe1fe000 (64-bit, non-prefetchable) [disabled] 
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable- Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- 
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, 
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Kernel driver in use: pciback

