[Xen-users] Fatal crash: ACPI assigning duplicate physical IRQ's to second DomU
Hi,

I've got a problem with ACPI assigning duplicate physical IRQs to one of my DomUs that I'm passing a PCI NIC to. Can anyone shed some light on ways I can avoid this problem with IRQ allocation? I can see the IRQ allocations in /proc/interrupts, and the conflict is visible there.

In my Dom0 I have 3 network cards. eth0 and eth1 are identical tulip-based 100 Mbit cards, and eth2 is a Realtek gigabit card that I'm using for the Xen bridge. I have this problem with a variety of different kernels. I'm currently running kernel-xen-2.6.18-1 (Fedora Core 6 development tree) on all hosts, with xen-3.0.2.

Pass-through always works just fine for one of my DomUs, and ACPI allocates an unused physical IRQ with no problems. However, in a second DomU, it consistently allocates the same IRQ as is used by my onboard SATA controller (libata). When the second DomU is running, I get a fatal crash that also destroys my RAID volume info and severely damages the filesystem. I've had to manually rebuild the RAID each time this happens so that I can try another solution. The crash typically happens within a few minutes of booting.

After reading the archives of this list, as well as other lists, I've tried putting "noirqdebug" as a kernel parameter in both the Dom0 and DomU, and I've also tried "noapic" and "acpi=off", as well as disabling ACPI in my motherboard's BIOS (the system is an Athlon XP on an nForce2 motherboard with 2 GB of RAM). None of them resolves the conflict. It appears to be a bug that affects pass-through of PCI devices and IRQ allocation?

I've also tried passing a variety of other ethernet devices through to the second DomU (the onboard nForce2 NIC with the forcedeth driver, and a Netgear FA311 with the natsemi driver), with the same result. Moving the PCI card to a different PCI bus address/slot doesn't resolve the problem either.
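For anyone wanting to script the /proc/interrupts check mentioned above, here is a minimal sketch in Python. It assumes the usual layout (a header row with one column per CPU, then per-IRQ rows ending in a comma-separated handler list) and that handler names contain no spaces. The `shared_irqs` helper and the `SAMPLE` text are hypothetical illustrations, not output from the affected host.

```python
# Illustrative /proc/interrupts content (made up, not from the affected host).
SAMPLE = """\
           CPU0
  1:          8    Phys-irq  i8042
 11:      15234    Phys-irq  libata, eth1
 16:       2210    Phys-irq  eth2
256:      90211    Dynamic-irq  timer0
NMI:          0
"""


def shared_irqs(text):
    """Return {irq: [handler names]} for IRQ rows that list more than one handler."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: one column per CPU
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        tail = rest.split()[ncpus:]  # controller type, then handler names
        if not label.strip().isdigit() or len(tail) < 2:
            continue  # skip NMI/ERR style rows and rows without handlers
        # Handlers are comma-separated; the first one follows the controller type.
        names = " ".join(tail).split(", ")
        names[0] = names[0].split()[-1]
        if len(names) > 1:
            shared[int(label)] = names
    return shared


if __name__ == "__main__":
    # On a live system, pass open("/proc/interrupts").read() instead of SAMPLE.
    for irq, names in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq} is shared by: {', '.join(names)}")
```

Running this in Dom0 before and after starting the second DomU would show whether the SATA controller's IRQ gains a second user. A device handed to a DomU may not appear under a driver name in Dom0, so the output is a hint rather than proof.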

I managed to grab dmesg output from the Dom0 and the problem DomU the last time it crashed. The message I'm getting on the console in the DomU is:

####
irq 11: nobody cared (try booting with the "irqpoll" option)
 [<c040569e>] dump_trace+0x69/0x1af
 [<c04057fc>] show_trace_log_lvl+0x18/0x2c
 [<c0405d9c>] show_trace+0xf/0x11
 [<c0405dcb>] dump_stack+0x15/0x17
 [<c044636e>] __report_bad_irq+0x36/0x7d
 [<c044655b>] note_interrupt+0x1a6/0x1e3
 [<c0445bda>] __do_IRQ+0xba/0xf2
 [<c0406c2c>] do_IRQ+0x9e/0xbc
 =======================
handlers:
[<d10636e8>] (tulip_interrupt+0x0/0xdb8 [tulip])
Disabling IRQ #11
end_request: I/O error, dev xvda, sector 42806344
Buffer I/O error on device xvda3, logical block 5061623
lost page write due to I/O error on xvda3
####

The dmesg output from the Dom0 following booting of the second DomU is:

####
PCI: Enabling device 0000:01:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [LNK3] -> GSI 11 (level, low) -> IRQ 11
ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
xenbr0: port 4(vif4.0) entering learning state
xenbr0: topology change detected, propagating
xenbr0: port 4(vif4.0) entering forwarding state
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x64)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: (BMDMA stat 0x64)
ata2.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata2: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2.00: disabled
ata1: EH complete
ata2: EH complete
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 58152585
raid5:md3: read error not correctable (sector 46682112 on sda5).
raid5: Disk failure on sda5, disabling device. Operation continuing on 1 devices
raid5:md3: read error not correctable (sector 46682120 on sda5).
####

Any help in resolving this would be much appreciated. I'd like to get this host up and running!

Thanks,
Hilton.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users