[Xen-devel] Is: pci=assign-busses blows up Xen 4.4 Was:Re: [PATCH] x86/msi: Validate the guest-identified PCI devices in pci_prepare_msix()
On Fri, Jan 24, 2014 at 12:43:49PM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 24, 2014 at 04:19:15PM +0000, Jan Beulich wrote:
> > >>> On 24.01.14 at 16:01, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > >>> wrote:
> > > I built the kernel without the igb driver just to eliminate it being
> > > the culprit. Now I can boot without issues and this is what lspci
> > > reports:
> > >
> > > -bash-4.1# lspci -s 02:00.0 -v
> > > 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
> > > Connection (rev 01)
> > >         Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
> > >         Flags: bus master, fast devsel, latency 0, IRQ 10
> > >         Memory at f1420000 (32-bit, non-prefetchable) [size=128K]
> > >         Memory at f1000000 (32-bit, non-prefetchable) [size=4M]
> > >         I/O ports at e020 [size=32]
> > >         Memory at f1444000 (32-bit, non-prefetchable) [size=16K]
> > >         Expansion ROM at f0c00000 [disabled] [size=4M]
> > >         Capabilities: [40] Power Management version 3
> > >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > >         Capabilities: [70] MSI-X: Enable- Count=10 Masked-
> >
> > So here's a patch to figure out why we don't find this.
>
> Thank you!
>
> See attached log. The corresponding xen-syms is compressed and
> updated at : http://darnok.org/xen/xen-syms.gz
>
> The interesting bit is:
>
> (XEN) 02:00.0: status=0010 (alloc_pdev+0xb4/0x2e9 wants 11)
> (XEN) 02:00.0: pos=40
> (XEN) 02:00.0: id=01
> (XEN) 02:00.0: pos=50
> (XEN) 02:00.0: id=05
> (XEN) 02:00.0: pos=70
> (XEN) 02:00.0: id=11
> (XEN) 02:00.1: status=0010 (alloc_pdev+0xb4/0x2e9 wants 11)
> (XEN) 02:00.1: pos=40
> (XEN) 02:00.1: id=01
> (XEN) 02:00.1: pos=50
> (XEN) 02:00.1: id=05
> (XEN) 02:00.1: pos=70
> (XEN) 02:00.1: id=11

You were right on the idea that it might be the device not having the
right capabilities, but it was the wrong BDF.

I instrumented the faulting operation to make sure I knew which BDF it was:

(XEN) 02:00.0: alloced (179)
(XEN) 02:00.0: alloced (189) ffff830239467f70,pdev ffff8302394660d0
(XEN) 02:00.1: alloced (179)
(XEN) 02:00.1: alloced (189) ffff830239466250,pdev ffff830239466190
(XEN) 04:00.0: alloced (179)
(XEN) 04:00.0: alloced (189) ffff830239466520,pdev ffff830239466460
(XEN) 05:00.0: status=0010 (alloc_pdev+0xb7/0x360 wants 11)
(XEN) 05:00.0: pos=60
(XEN) 05:00.0: id=0d
(XEN) 05:00.0: pos=a0
(XEN) 05:00.0: id=01
(XEN) 05:00.0: pos=00
(XEN) 05:00.0: no cap 11
(XEN) 08:00.0: alloced (179)
(XEN) 08:00.0: alloced (189) ffff830239466eb0,pdev ffff830239466df0
(XEN) [2014-01-25 03:42:08] msix_capability_init:759 for 05:00.0:, msix:0 dev:ffff8302394665b0
(XEN) [2014-01-25 03:42:08] ----[ Xen-4.4-rc2 x86_64 debug=y Tainted: C ]----
(XEN) [2014-01-25 03:42:08] CPU: 0
(XEN) [2014-01-25 03:42:08] RIP: e008:[<ffff82d0801683d6>] msix_capability_init+0x210/0x63e
... snip..
(XEN) [2014-01-25 03:42:08] Xen call trace:
(XEN) [2014-01-25 03:42:08] [<ffff82d0801683d6>] msix_capability_init+0x210/0x63e
(XEN) [2014-01-25 03:42:08] [<ffff82d0801689c2>] pci_enable_msi+0x1be/0x4d7
(XEN) [2014-01-25 03:42:08] [<ffff82d08016c68c>] map_domain_pirq+0x222/0x5ad
(XEN) [2014-01-25 03:42:08] [<ffff82d08017f134>] physdev_map_pirq+0x507/0x5d1
(XEN) [2014-01-25 03:42:08] [<ffff82d08017f844>] do_physdev_op+0x646/0x1232
(XEN) [2014-01-25 03:42:08] [<ffff82d0802223ab>] syscall_enter+0xeb/0x145
(XEN) [2014-01-25 03:42:08]
(XEN) [2014-01-25 03:42:08] Pagetable walk from 0000000000000004:
(XEN) [2014-01-25 03:42:08] L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) [2014-01-25 03:42:08]
(XEN) [2014-01-25 03:42:08] ****************************************
(XEN) [2014-01-25 03:42:08] Panic on CPU 0:
(XEN) [2014-01-25 03:42:08] FATAL PAGE FAULT
(XEN) [2014-01-25 03:42:08] [error_code=0000]
(XEN) [2014-01-25 03:42:08] Faulting linear address: 0000000000000004
(XEN) [2014-01-25 03:42:08] ****************************************
(XEN) [2014-01-25 03:42:08]
(XEN) [2014-01-25 03:42:08] Manual reset required ('noreboot' specified)

lspci shows (baremetal kernel, with said driver):

bash-4.1# lspci -s 05:00.0 -v
05:00.0 Ethernet controller: Intel Corporation Device 1533 (rev 03)
        Subsystem: Super Micro Computer Inc Device 1533
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at f1900000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at c000 [size=32]
        Memory at f1980000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-25-90-ff-ff-86-be-f1
        Capabilities: [1a0] #17
        Kernel driver in use: igb

aka, Intel I210

lspci shows (Xen, kernel does not have igb built-in):

-bash-4.1# lspci -s 05:00.0 -v
05:00.0 Ethernet controller: Intel Corporation Device 1533 (rev 03)
        Subsystem: Super Micro Computer Inc Device 1533
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at f1900000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at c000 [size=32]
        Memory at f1980000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable- Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-25-90-ff-ff-86-be-f1
        Capabilities: [1a0] #17

And with -xxx:

bash-4.1# lspci -s 05:00.0 -xxx
05:00.0 Ethernet controller: Intel Corporation Device 1533 (rev 03)
00: 86 80 33 15 07 00 10 00 03 00 00 02 10 00 00 00
10: 00 00 90 f1 00 00 00 00 01 c0 00 00 00 00 98 f1
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 33 15
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 50 23 c8 08 20 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 01 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 a0 04 00 03 00 00 00 03 20 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff
a0: 10 00 02 00 c2 8c 00 10 07 28 19 00 11 5c 42 00
b0: 40 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
d0: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Which would imply that we should start with '50' offset, not '60'!

If I boot baremetal with 'pci=earlydump' I get:

[ 0.000000] pci 0000:05:00.0 config space:
[ 0.000000] 00: e3 10 13 81 07 00 10 00 01 01 04 06 00 00 01 00
[ 0.000000] 10: 00 00 00 00 00 00 00 00 05 06 07 20 f1 01 a0 22
[ 0.000000] 20: 50 f1 60 f1 f1 ff 01 00 00 00 00 00 00 00 00 00
[ 0.000000] 30: ff 00 00 00 60 00 00 00 00 00 00 00 ff 00 10 00
[ 0.000000] 40: 00 aa 00 00 00 19 90 7d 80 01 00 00 07 03 00 00
[ 0.000000] 50: 68 89 09 80 00 1f 00 00 00 01 00 00 00 00 00 00
[ 0.000000] 60: 0d a0 00 00 d9 15 05 08 00 00 00 00 00 00 00 00
[ 0.000000] 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] a0: 01 00 03 f8 08 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] b0: 00 00 00 00 40 00 00 00 00 00 00 00 ef fb be 07
[ 0.000000] c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Which does indeed show that at bootup the PCI configuration space is
different. <blink>And the driver id does not match!
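
To make the comparison between the two dumps easier to follow, here is a small Python sketch (an illustration added for readers, not code from Xen or from this thread) that walks the standard capability list: check bit 4 of the status register at 0x06, read the first pointer at 0x34, then follow the (id, next) byte pairs. The helper names `parse_dump`, `walk_caps` and `has_msix` are hypothetical. Only the dump rows needed for the walk are copied from the output above; rows left out are treated as zero.

```python
# Illustrative sketch only: walk the PCI capability list of a config-space dump.
# Helper names are hypothetical; dump rows are copied from the mail above.

PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # bit 4: a capability list is present
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability
PCI_CAP_ID_MSIX = 0x11       # the "cap 11" the Xen debug output looks for

# 05:00.0 as seen after bus renumbering (the I210), selected rows only.
I210_DUMP = """
00: 86 80 33 15 07 00 10 00 03 00 00 02 10 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 50 23 c8 08 20 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 01 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 a0 04 00 03 00 00 00 03 20 00 00 00 00 00 00
a0: 10 00 02 00 c2 8c 00 10 07 28 19 00 11 5c 42 00
"""

# 05:00.0 as seen by 'pci=earlydump' (the Tundra bridge), selected rows only.
BRIDGE_DUMP = """
00: e3 10 13 81 07 00 10 00 01 01 04 06 00 00 01 00
30: ff 00 00 00 60 00 00 00 00 00 00 00 ff 00 10 00
60: 0d a0 00 00 d9 15 05 08 00 00 00 00 00 00 00 00
a0: 01 00 03 f8 08 00 00 00 00 00 00 00 00 00 00 00
"""


def parse_dump(text):
    """Turn 'NN: b0 b1 ...' rows into a 256-byte config space (gaps stay 0)."""
    cfg = bytearray(256)
    for row in text.strip().splitlines():
        offset, _, data = row.strip().partition(":")
        values = bytes.fromhex(data)
        start = int(offset, 16)
        cfg[start:start + len(values)] = values
    return cfg


def walk_caps(cfg):
    """Return [(position, capability id), ...] in list order."""
    status = cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8)
    if not status & PCI_STATUS_CAP_LIST:
        return []
    caps, seen = [], set()
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    while pos >= 0x40 and pos not in seen:   # 'seen' guards against loops
        seen.add(pos)
        caps.append((pos, cfg[pos]))
        pos = cfg[pos + 1] & 0xFC
    return caps


def has_msix(cfg):
    return any(cap_id == PCI_CAP_ID_MSIX for _, cap_id in walk_caps(cfg))


if __name__ == "__main__":
    for name, dump in (("I210", I210_DUMP), ("bridge", BRIDGE_DUMP)):
        cfg = parse_dump(dump)
        chain = ", ".join("pos=%02x id=%02x" % c for c in walk_caps(cfg))
        print("%s: %s msix=%s" % (name, chain, has_msix(cfg)))
```

Run against the bridge rows, the walk visits 60 and a0 and then stops, which is the same sequence as the "(XEN) 05:00.0: pos=60 ... no cap 11" lines; run against the I210 rows it reaches the MSI-X capability at 70.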

If I look at one that has it:

[ 0.000000] pci 0000:04:00.0 config space:
[ 0.000000] 00: 86 80 33 15 07 00 10 00 03 00 00 02 10 00 00 00
[ 0.000000] 10: 00 00 90 f1 00 00 00 00 01 c0 00 00 00 00 98 f1
[ 0.000000] 20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 33 15
[ 0.000000] 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
[ 0.000000] 40: 01 50 23 c8 08 20 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 50: 05 70 80 01 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 70: 11 a0 04 00 03 00 00 00 03 20 00 00 00 00 00 00
[ 0.000000] 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 90: 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff
[ 0.000000] a0: 10 00 02 00 c2 8c 00 10 07 28 19 00 11 5c 42 00
[ 0.000000] b0: 42 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] c0: 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

That matches more of the reality and 04:00.0 is actually 05:00.0.

The reason that is happening is probably because of:

-bash-4.1# cat /proc/cmdline
initrd=initramfs.cpio.gz console=ttyS0,115200 kgdboc=ttyS0 pci=assign-busses pci=earlydump BOOT_IMAGE=vmlinuz
-bash-4.1#

That is, the 'assign-busses' option, which is needed for SR-IOV to work.
If I don't use that parameter, the Linux kernel (baremetal and with Xen)
tells me:

-bash-4.1# cat /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/sriov_numvfs
0
-bash-4.1# cat /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/sriov_totalvfs
7
-bash-4.1# echo 7 > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/sriov_numvfs
-bash: echo: write error: Cannot allocate memory
-bash-4.1# dmesg | tail
[ 241.874349] random: sshd urandom read with 63 bits of entropy available
[ 242.918267] Loading iSCSI transport class v2.0-870.
[ 242.926046] iscsi: registered transport (tcp)
[ 244.689798] scsi8 : iSCSI Initiator over TCP/IP
[ 244.709799] connection1:0: detected conn error (1020)
[ 244.969450] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel@xxxxxxxxxx
[ 244.980434] device-mapper: multipath: version 1.6.0 loaded
[ 250.027291] random: nonblocking pool is initialized
[ 256.282312] switch: port 1(eth0) entered forwarding state
[ 365.468641] igb 0000:02:00.0: SR-IOV: bus number out of range

And sure enough if I boot Xen without 'pci=assign-busses' it works just
fine. Ugh.

I wonder how Xen 4.3 would actually do the PCI passthrough - it booted
with the 'assign-busses' - but I hadn't tried to do PCI passthrough of
the PF device (the I210). If I do pass in '05:00.0' (new bus number), I
wonder if it will use the IOMMU context of whatever '05:00.0' was
_before_ the bus re-assignment, aka:

05:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01) (prog-if 01 [Subtractive decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=05, secondary=06, subordinate=07, sec-latency=32
        Memory behind bridge: f1500000-f16fffff
        Capabilities: [60] Subsystem: Super Micro Computer Inc Device 0805
        Capabilities: [a0] Power Management version 3

Which I think would confuse Xen, as this is clearly labeled as a bridge,
not a PCI device.

The reason for me using 'pci=assign-busses' is that it looks to be the
only option to use SR-IOV.
Which I suppose makes sense as it tries to create VFs right after its
own bus id:

+-01.1-[02-03]--+-[0000:03]-+-10.0 Intel Corporation 82576 Virtual Function
|               |           +-10.1 Intel Corporation 82576 Virtual Function
|               |           +-10.2 Intel Corporation 82576 Virtual Function
|               |           +-10.3 Intel Corporation 82576 Virtual Function
|               |           +-10.4 Intel Corporation 82576 Virtual Function
|               |           +-10.5 Intel Corporation 82576 Virtual Function
|               |           +-10.6 Intel Corporation 82576 Virtual Function
|               |           +-10.7 Intel Corporation 82576 Virtual Function
|               |           +-11.0 Intel Corporation 82576 Virtual Function
|               |           +-11.1 Intel Corporation 82576 Virtual Function
|               |           +-11.2 Intel Corporation 82576 Virtual Function
|               |           +-11.3 Intel Corporation 82576 Virtual Function
|               |           +-11.4 Intel Corporation 82576 Virtual Function
|               |           \-11.5 Intel Corporation 82576 Virtual Function
|               \-[0000:02]-+-00.0 Intel Corporation 82576 Gigabit Network Connection
|                           \-00.1 Intel Corporation 82576 Gigabit Network Connection

But why does it have to have the bus _right_ after its own? Can't it use
one at the end of its bus-space? The bus right after it is occupied by
another card (if I boot without 'pci=assign-busses').

I do recall using this particular SR-IOV card on different hardware a
year ago or so. And it did work. I think that might be because there
were no PCI cards _after_ the SR-IOV card.
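
A likely answer to the "why right after its own bus" question: under the SR-IOV specification a VF's routing ID is not picked freely. It is computed from the PF's routing ID plus the First VF Offset and VF Stride fields of the PF's SR-IOV capability, so the VF bus number follows arithmetically from the PF's. The Python sketch below illustrates that arithmetic only. The function names are hypothetical, and the offset 0x180 and stride 2 are assumed values chosen because they reproduce the VF addresses in the tree above; they were not read from this hardware.

```python
# Illustrative sketch only: SR-IOV VF routing-ID arithmetic.
# FIRST_VF_OFFSET and VF_STRIDE are ASSUMED values, not read from the device.

FIRST_VF_OFFSET = 0x180  # assumption
VF_STRIDE = 2            # assumption


def vf_bdf(pf_bus, pf_dev, pf_fn, vf_index,
           first_vf_offset=FIRST_VF_OFFSET, vf_stride=VF_STRIDE):
    """Return (bus, dev, fn) of the VF with 0-based index vf_index."""
    pf_rid = (pf_bus << 8) | (pf_dev << 3) | pf_fn
    vf_rid = pf_rid + first_vf_offset + vf_index * vf_stride
    return (vf_rid >> 8) & 0xFF, (vf_rid >> 3) & 0x1F, vf_rid & 0x7


def fmt(bdf):
    """Format (bus, dev, fn) the way lspci prints it."""
    return "%02x:%02x.%x" % bdf


def vfs_fit(pf_bus, pf_dev, pf_fn, num_vfs, subordinate_bus):
    """True if every VF bus number is within the upstream bridge's range."""
    return all(vf_bdf(pf_bus, pf_dev, pf_fn, i)[0] <= subordinate_bus
               for i in range(num_vfs))


if __name__ == "__main__":
    # PF 02:00.0 with 7 VFs, as in sriov_totalvfs above.
    print([fmt(vf_bdf(2, 0, 0, i)) for i in range(7)])
    # Bridge range [02-03] (with pci=assign-busses) versus a range ending at 02.
    print(vfs_fit(2, 0, 0, 7, subordinate_bus=3))
    print(vfs_fit(2, 0, 0, 7, subordinate_bus=2))
```

With these assumed values every VF of a PF on bus 02 lands on bus 03. If the bridge above the PF only forwards bus 02, which is presumably the case without 'pci=assign-busses' since bus 03 then belongs to another card, the range check fails, and that would be consistent with the "SR-IOV: bus number out of range" message.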

For posterity, with pci=assign-busses under baremetal (with SR-IOV enabled):

02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
04:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
05:00.0 Ethernet controller: Intel Corporation Device 1533 (rev 03)
06:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
07:01.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 11)
07:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
08:08.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
08:08.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
08:09.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
08:09.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
08:0a.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
08:0a.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
08:0b.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
08:0b.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
09:00.0 USB Controller: Renesas Technology Corp. Device 0015 (rev 02)
0a:00.0 SATA controller: Device 1b21:0612 (rev 01)

Without 'pci=assign-busses' under baremetal:

02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
04:00.0 Ethernet controller: Intel Corporation Device 1533 (rev 03)
05:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
06:01.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 11)
06:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
07:08.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
07:08.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
07:09.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
07:09.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
07:0a.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
07:0a.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
07:0b.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
07:0b.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
08:00.0 USB Controller: Renesas Technology Corp. Device 0015 (rev 02)
09:00.0 SATA controller: Device 1b21:0612 (rev 01)

This problem with SR-IOV bus seems to have been solved in 2009:

commit a28724b0fb909d247229a70761c90bb37b13366a
Author: Yu Zhao <yu.zhao@xxxxxxxxx>
Date:   Fri Mar 20 11:25:13 2009 +0800

    PCI: reserve bus range for SR-IOV device

    Reserve the bus number range used by the Virtual Function when
    pcibios_assign_all_busses() returns true.

And pcibios_assign_all_busses() is the one that returns true if
'pci=assign-busses' is set.

Attachment:
tst035-jan-debug-2.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel