Re: [Xen-users] Fatal crash: ACPI assigning duplicate physical IRQ's to second DomU
More info: here is the lspci output and the contents of /proc/interrupts for each domain.

#### Dom0 ####
[root@dns ~]# cat /proc/interrupts
           CPU0
  1:          8        Phys-irq  i8042
  3:          2        Phys-irq  ehci_hcd:usb3
  4:          0        Phys-irq  ohci_hcd:usb2
  8:          1        Phys-irq  rtc
 10:        303        Phys-irq  eth2
 11:       4347        Phys-irq  ohci_hcd:usb1, libata
 12:        113        Phys-irq  i8042
 14:       2112        Phys-irq  ide0
 15:        416        Phys-irq  ide1
256:       5804     Dynamic-irq  timer0
257:          0     Dynamic-irq  resched0
258:          0     Dynamic-irq  callfunc0
259:          0     Dynamic-irq  xenbus
260:          0     Dynamic-irq  console
NMI:          0
LOC:          0
ERR:          0
MIS:          0

#### DomU - this has the problem (see IRQ 11) ####
[root@intranet ~]# cat /proc/interrupts
           CPU0
 11:       1393        Phys-irq  eth1
256:       3073     Dynamic-irq  timer0
257:          0     Dynamic-irq  resched0
258:          0     Dynamic-irq  callfunc0
259:        104     Dynamic-irq  xenbus
260:        147     Dynamic-irq  xencons
261:       1799     Dynamic-irq  blkif
262:        103     Dynamic-irq  eth0
NMI:          0
LOC:          0
ERR:          0
MIS:          0

#### DomU (this is the other DomU, which works fine) ####
[root@gateway ~]# cat /proc/interrupts
           CPU0
  5:         17        Phys-irq  eth1
256:       3111     Dynamic-irq  timer0
257:          0     Dynamic-irq  resched0
258:          0     Dynamic-irq  callfunc0
259:        104     Dynamic-irq  xenbus
260:        170     Dynamic-irq  xencons
261:       1859     Dynamic-irq  blkif
262:        130     Dynamic-irq  eth0
NMI:          0
LOC:          0
ERR:          0
MIS:          0

#### Dom0 PCI (using pciback to reserve 01:09.0 and 01:0a.0; 01:0a.0 is the one getting the problem IRQ) ####
[root@dns ~]# lspci
00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev c1)
00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev c1)
00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev c1)
00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev c1)
00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev c1)
00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev c1)
00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1)
01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
01:09.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
01:0a.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
01:0b.0 RAID bus controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
02:00.0 VGA compatible controller: ATI Technologies Inc Radeon R300 ND [Radeon 9700 Pro]
02:00.1 Display controller: ATI Technologies Inc Radeon R300 [Radeon 9700 Pro] (Secondary)
####

Hilton Day wrote:

Hi,

I've got a problem with ACPI assigning duplicate physical IRQs to one of my DomUs that I'm passing a PCI NIC to. Can anyone shed some light on ways I can avoid this IRQ allocation problem? I can see the IRQ allocations in /proc/interrupts, and the conflict is visible there.

In my Dom0 I have 3 network cards: eth0 and eth1 are identical tulip-based 100 Mbit cards, and eth2 is a Realtek gigabit card that I'm using as the Xen bridge.
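Spotting the conflict in /proc/interrupts can be automated. Below is a minimal Python sketch, not an existing tool: the function names phys_irqs and shared_phys_irqs are hypothetical, and it assumes (as the listings above suggest) that a passed-through device shows up in the DomU under the same physical IRQ number that Dom0 uses. It lists every Phys-irq that appears in both dumps, together with the devices on each side.

```python
import re

# One numbered row of /proc/interrupts, e.g.
#   " 11:       4347        Phys-irq  ohci_hcd:usb1, libata"
# (?:\d+\s+)+ accepts one count column per CPU; the listings here are single-CPU.
_ROW = re.compile(r"^\s*(\d+):\s+(?:\d+\s+)+(\S+)\s+(.*)$")


def phys_irqs(text):
    """Map each Phys-irq number in a /proc/interrupts dump to its device names."""
    result = {}
    for line in text.splitlines():
        m = _ROW.match(line)
        if m and m.group(2) == "Phys-irq":
            names = [n.strip() for n in m.group(3).split(",") if n.strip()]
            result[int(m.group(1))] = names
    return result


def shared_phys_irqs(dom0_text, domu_text):
    """Physical IRQs listed in both dumps, mapped to (dom0 devices, domU devices)."""
    dom0 = phys_irqs(dom0_text)
    domu = phys_irqs(domu_text)
    return {irq: (dom0[irq], domu[irq]) for irq in sorted(dom0.keys() & domu.keys())}


# Excerpts of the listings posted in this thread.
DOM0 = """           CPU0
  1:          8        Phys-irq  i8042
 10:        303        Phys-irq  eth2
 11:       4347        Phys-irq  ohci_hcd:usb1, libata
 14:       2112        Phys-irq  ide0
256:       5804     Dynamic-irq  timer0
NMI:          0
"""
INTRANET = """           CPU0
 11:       1393        Phys-irq  eth1
256:       3073     Dynamic-irq  timer0
"""
GATEWAY = """           CPU0
  5:         17        Phys-irq  eth1
256:       3111     Dynamic-irq  timer0
"""

if __name__ == "__main__":
    print(shared_phys_irqs(DOM0, INTRANET))  # {11: (['ohci_hcd:usb1', 'libata'], ['eth1'])}
    print(shared_phys_irqs(DOM0, GATEWAY))   # {}
```

On a live system each string would come from reading /proc/interrupts inside the respective domain; the DomU copy has to be fetched from the guest (for example over its console). The problem DomU ("intranet") shares IRQ 11 with libata in Dom0, while the working one ("gateway") shares nothing.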
I have this problem with a variety of different kernels; I'm currently running kernel-xen-2.6.18-1 (Fedora Core 6 development tree) on all hosts, with xen-3.0.2. Pass-through always works fine for one of my DomUs, and ACPI allocates an unused physical IRQ with no problems. However, for a second DomU it consistently allocates the same IRQ that my onboard SATA controller (libata) uses. When the second DomU is running, I get a fatal crash that also destroys my RAID volume info and severely damages the filesystem. I've had to rebuild the RAID manually each time this happens before I can try the next possible fix. The crash typically happens within a few minutes of booting.

After reading the archives of this list, as well as other lists, I've tried adding "noirqdebug" as a kernel parameter in both Dom0 and the DomU, and also using "noapic" and "acpi=off", as well as disabling ACPI in my motherboard's BIOS (the system is an Athlon XP on an nForce2 motherboard with 2 GB of RAM). None of these resolves the conflict. Could this be a bug affecting PCI pass-through and IRQ allocation?

I've also tried passing other Ethernet devices through to the second DomU (the nForce2 onboard NIC with the forcedeth driver, and a Netgear FA311 with the natsemi driver), with the same result. Moving the PCI card to a different PCI bus address/slot doesn't resolve the problem either.
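The "noirqdebug" parameter mentioned above switches off the kernel's unhandled-interrupt detector, whose note_interrupt() routine appears in the DomU stack trace quoted in this thread ("irq 11: nobody cared", then "Disabling IRQ #11"). The following is a simplified Python model of that detector, based on my reading of the 2.6-era kernel/irq/spurious.c. The class and helper names are hypothetical and the thresholds may differ between kernel versions. It shows how a line on which almost no interrupt is claimed by a registered handler gets disabled at the end of a counting window.

```python
from dataclasses import dataclass

# Thresholds as I read them in the 2.6-era kernel/irq/spurious.c;
# other kernel versions may use different values.
WINDOW = 100_000          # interrupts counted per evaluation window
UNHANDLED_LIMIT = 99_900  # more unhandled than this in one window => line is "stuck"


@dataclass
class IrqLine:
    """Hypothetical, simplified model of the per-IRQ counters note_interrupt() keeps."""
    irq: int
    count: int = 0
    unhandled: int = 0
    disabled: bool = False

    def note_interrupt(self, handled: bool) -> None:
        if self.disabled:
            return  # a disabled line delivers nothing further
        if not handled:
            self.unhandled += 1  # no registered handler claimed this interrupt
        self.count += 1
        if self.count < WINDOW:
            return
        # End of a window: disable the line if nearly every interrupt went unclaimed.
        self.count = 0
        if self.unhandled > UNHANDLED_LIMIT:
            self.disabled = True  # where the kernel logs "Disabling IRQ #N"
        self.unhandled = 0


def deliver(line: IrqLine, interrupts: int, handled_every: int = 0,
            noirqdebug: bool = False) -> IrqLine:
    """Feed `interrupts` events; every `handled_every`-th one is claimed (0 = none).

    With noirqdebug=True the detector is skipped entirely, mirroring the
    kernel parameter of that name.
    """
    for i in range(interrupts):
        handled = handled_every > 0 and i % handled_every == 0
        if not noirqdebug:
            line.note_interrupt(handled)
    return line


if __name__ == "__main__":
    # No interrupt on the line is claimed: disabled after one window.
    print(deliver(IrqLine(11), 100_000).disabled)                     # True
    # 1 in 100 is claimed (99,000 unhandled): stays below the limit.
    print(deliver(IrqLine(11), 100_000, handled_every=100).disabled)  # False
    # Detector switched off: never disabled.
    print(deliver(IrqLine(11), 100_000, noirqdebug=True).disabled)    # False
```

The model only covers the guest-side bookkeeping; it says nothing about how Xen routes a shared physical IRQ between Dom0 and a DomU.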
I managed to grab the dmesg output from Dom0 and from the problem DomU the last time it crashed. The message printed to the console in the DomU is:

####
irq 11: nobody cared (try booting with the "irqpoll" option)
 [<c040569e>] dump_trace+0x69/0x1af
 [<c04057fc>] show_trace_log_lvl+0x18/0x2c
 [<c0405d9c>] show_trace+0xf/0x11
 [<c0405dcb>] dump_stack+0x15/0x17
 [<c044636e>] __report_bad_irq+0x36/0x7d
 [<c044655b>] note_interrupt+0x1a6/0x1e3
 [<c0445bda>] __do_IRQ+0xba/0xf2
 [<c0406c2c>] do_IRQ+0x9e/0xbc
 =======================
handlers:
[<d10636e8>] (tulip_interrupt+0x0/0xdb8 [tulip])
Disabling IRQ #11
end_request: I/O error, dev xvda, sector 42806344
Buffer I/O error on device xvda3, logical block 5061623
lost page write due to I/O error on xvda3
####

The dmesg output from Dom0 after booting the second DomU is:

####
PCI: Enabling device 0000:01:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [LNK3] -> GSI 11 (level, low) -> IRQ 11
ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
xenbr0: port 4(vif4.0) entering learning state
xenbr0: topology change detected, propagating
xenbr0: port 4(vif4.0) entering forwarding state
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x64)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: (BMDMA stat 0x64)
ata2.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata2: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2.00: disabled
ata1: EH complete
ata2: EH complete
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 58152585
raid5:md3: read error not correctable (sector 46682112 on sda5).
raid5: Disk failure on sda5, disabling device. Operation continuing on 1 devices
raid5:md3: read error not correctable (sector 46682120 on sda5).
####

Any help in resolving this is appreciated; I'd like to get this host up and running!

Thanks,
Hilton.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users