
[Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Ward Vandewege <ward@xxxxxxx>
  • Date: Fri, 28 Jan 2011 13:58:09 -0500
  • Delivery-date: Fri, 28 Jan 2011 10:59:50 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi list,

I'm having some problems trying to pass through a Mellanox ConnectX HCA
to a domU.

This is on Xen 4.0.1, with the latest Debian Testing packages:

  ii  xen-hypervisor-4.0-amd64                4.0.1-2  
  ii  linux-image-2.6.32-5-xen-amd64          2.6.32-30

The hardware is Supermicro H8DGT-HIBQF, BIOS revision 1.0c (date 10/29/10).
It has two AMD Opteron 6128 CPUs, for a total of 16 cores. The machine has
32GiB of RAM. The Mellanox adapter looks like this in the dom0:

  02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    Subsystem: Super Micro Computer Inc Device 0048
    Flags: fast devsel, IRQ 19
    Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
    Memory at fc800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: pciback

I've attached the output of xm dmesg (xm.dmesg.txt).

I have the following in the domU config files:

  pci = ['0000:02:00.0'] 

I've attached the boot log from booting the same kernel as an HVM guest
(testsqueezehvm.bootlog.txt). Doing so generates these four lines of output
in xm dmesg:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c0c0
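For reference when reading these messages: assuming the "device id" field is the usual 16-bit PCI requester ID (bus in bits 15:8, device in bits 7:3, function in bits 2:0), 0x200 decodes to 02:00.0, i.e. the Mellanox HCA itself. The helper below is a hypothetical sketch of that decoding, not something shipped with Xen.

```python
def decode_device_id(device_id: int) -> str:
    """Split a 16-bit PCI requester ID into bus:device.function (lspci style)."""
    bus = (device_id >> 8) & 0xFF   # bits 15:8
    dev = (device_id >> 3) & 0x1F   # bits 7:3
    fn = device_id & 0x7            # bits 2:0
    return f"{bus:02x}:{dev:02x}.{fn}"


print(decode_device_id(0x200))  # 02:00.0
```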

The mlx4_core driver in the domU is not happy:

[    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[    0.411879] mlx4_core: Initializing 0000:00:00.0
[    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.417527] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

When trying to boot a PV domU with kernel options iommu=soft and
swiotlb=force, the output is slightly different. The full bootlog is attached
(testsqueeze.bootlog.txt). Here's the relevant excerpt:

[    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2 (August 4, 2010)
[    0.441696] mlx4_core: Initializing 0000:00:00.0
[    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X interrupt IRQ 54).
[    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
[    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate interrupt (IRQ 54), aborting.
[    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing problem?
[    2.916920] mlx4_core: probe of 0000:00:00.0 failed with error -16

And xm dmesg quickly fills up with many, many lines like this:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43020
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43060
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430a0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430c0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430e0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43100
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43120
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43140
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43160
...
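The fault addresses in this PV run climb in regular steps. A small, hypothetical script (not part of Xen's tooling) can pull the device id and fault address out of each message and check the step size; the sample log is just the first six lines of the excerpt with the mail wrapping undone.

```python
import re

# First six fault lines from the PV domU run, one message per line.
LOG = """\
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43020
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43060
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430a0
"""

# Captures the "device id" and "fault address" fields of each message.
FAULT_RE = re.compile(r"device id:(0x[0-9a-f]+), fault address:(0x[0-9a-f]+)")


def parse_faults(text):
    """Return (device_id, fault_address) integer pairs found in the log text."""
    return [(int(dev, 16), int(addr, 16)) for dev, addr in FAULT_RE.findall(text)]


faults = parse_faults(LOG)
addrs = [addr for _, addr in faults]
# Set of differences between consecutive fault addresses.
strides = {b - a for a, b in zip(addrs, addrs[1:])}
print({hex(dev) for dev, _ in faults}, [hex(s) for s in sorted(strides)])
# {'0x200'} ['0x20']
```

On these lines every fault comes from device 0x200 and consecutive addresses are exactly 0x20 bytes apart.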

Booting a PV domU with only the swiotlb=force option makes the output much
more like the HVM output.

Any thoughts on what could be going on here?

Thanks,
Ward.


Attachment: xm.dmesg.txt
Description: Text document

Attachment: testsqueeze.bootlog.txt
Description: Text document

Attachment: testsqueezehvm.bootlog.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
