
[Xen-users] Fatal crash: ACPI assigning duplicate physical IRQ's to second DomU



Hi,

I've got a problem with ACPI assigning duplicate physical IRQs to one
of my DomUs that I'm passing a PCI NIC to.  Can anyone shed some light
on ways I can avoid this problem with IRQ allocation?  I can see the
IRQ allocations in /proc/interrupts, and the conflict shows up there.
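For anyone who wants to repeat the check, here is a minimal sketch of how
shared IRQs can be picked out of /proc/interrupts.  It assumes the usual
layout (a CPU header row, then per-CPU counts, a controller type, and a
comma-separated handler list).  The function name and the sample text are
made up for illustration and are not output from my system.

```python
def shared_irqs(text):
    """Return {irq: [handler names]} for numeric IRQs with more than one handler."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, ERR and other non-numeric rows
        fields = rest.split(None, ncpus)  # per-CPU counts, then the remainder
        if len(fields) <= ncpus:
            continue  # no controller/handler column on this row
        pieces = [p.strip() for p in fields[ncpus].split(",")]
        # The first piece is "<controller type> <first handler>";
        # assume the handler is the last word of it.
        first = pieces[0].split()
        names = ([first[-1]] if len(first) >= 2 else []) + pieces[1:]
        if len(names) > 1:
            shared[int(label)] = names
    return shared


# Hypothetical sample, for illustration only.
SAMPLE = """\
           CPU0
  1:        120    XT-PIC  i8042
 11:      48213    XT-PIC  libata, eth1
 14:       9001    XT-PIC  ide0
NMI:          0
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a live host the same function could be fed the contents of
/proc/interrupts instead of the sample string.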

In my Dom0 I have 3 network cards.  eth0 and eth1 are identical
tulip-based 100 Mbit cards, and eth2 is a Realtek gigabit card that I'm
using for the Xen bridge.  I have this problem with a variety of different
kernels; I'm currently running kernel-xen-2.6.18-1 (Fedora Core 6
development tree) on all hosts, with xen-3.0.2.

Pass-through always works just fine for one of my DomUs, and ACPI
allocates an unused physical IRQ with no problems.  However, in a second
DomU, it consistently allocates the same IRQ that my onboard SATA
controller (libata) uses.

When the second DomU is running, I get a fatal crash that also destroys
my RAID volume info and severely damages the filesystem.  I've had to
manually rebuild the RAID each time this happens before I can try
another approach.  The crash typically happens within a few minutes of
booting.

After reading the archives of this list, as well as other lists, I've
tried putting "noirqdebug" as a kernel parameter in both the Dom0 and
DomU, and have also tried "noapic" and "acpi=off", as well as disabling
ACPI in my motherboard's BIOS (the system is an Athlon XP on an
nForce2 motherboard with 2 GB of RAM).  None of them resolves the
conflict.  Could this be a bug that affects pass-through of PCI
devices and IRQ allocation?

I've also tried passing a variety of other Ethernet devices through to
the second DomU (the onboard nForce2 NIC with the forcedeth driver, and
a Netgear FA311 with the natsemi driver), with the same result.  Moving
the PCI card to a different PCI bus address/slot doesn't resolve the
problem either.

I managed to grab dmesg output from the Dom0 and the problem DomU the
last time it crashed.

The message I get on the console in the DomU is:
####
irq 11: nobody cared (try booting with the "irqpoll" option)
[<c040569e>] dump_trace+0x69/0x1af
[<c04057fc>] show_trace_log_lvl+0x18/0x2c
[<c0405d9c>] show_trace+0xf/0x11
[<c0405dcb>] dump_stack+0x15/0x17
[<c044636e>] __report_bad_irq+0x36/0x7d
[<c044655b>] note_interrupt+0x1a6/0x1e3
[<c0445bda>] __do_IRQ+0xba/0xf2
[<c0406c2c>] do_IRQ+0x9e/0xbc
=======================
handlers:
[<d10636e8>] (tulip_interrupt+0x0/0xdb8 [tulip])
Disabling IRQ #11
end_request: I/O error, dev xvda, sector 42806344
Buffer I/O error on device xvda3, logical block 5061623
lost page write due to I/O error on xvda3
####

The dmesg output from the Dom0 following booting of the second DomU is:

####
PCI: Enabling device 0000:01:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [LNK3] -> GSI 11 (level, low) -> IRQ 11
ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
xenbr0: port 4(vif4.0) entering learning state
xenbr0: topology change detected, propagating
xenbr0: port 4(vif4.0) entering forwarding state
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x64)
ata1.00: tag 0 cmd 0x25 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: (BMDMA stat 0x64)
ata2.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata2: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata2: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2.00: disabled
ata1: EH complete
ata2: EH complete
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 58152585
raid5:md3: read error not correctable (sector 46682112 on sda5).
raid5: Disk failure on sda5, disabling device. Operation continuing on 1 devices
raid5:md3: read error not correctable (sector 46682120 on sda5).
####
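To cross-check which device ACPI routed to which IRQ, the
"ACPI: PCI Interrupt" lines in the Dom0 dmesg can be pulled out
mechanically.  This is only a sketch: the pattern is modelled on the one
such line in the log above, other kernels may format it differently, and
the function name is made up for illustration.

```python
import re

# Modelled on the single "ACPI: PCI Interrupt" line in the Dom0 log above.
# \s* after the comma tolerates the line having been wrapped by a mailer.
ACPI_IRQ = re.compile(
    r"ACPI: PCI Interrupt (?P<dev>[0-9a-f:.]+)\[(?P<pin>[A-D])\]"
    r"(?: -> Link \[(?P<link>\w+)\])?"
    r" -> GSI (?P<gsi>\d+) \((?P<trigger>\w+),\s*(?P<polarity>\w+)\)"
    r" -> IRQ (?P<irq>\d+)"
)


def acpi_irq_routes(dmesg_text):
    """Return {irq: [PCI device addresses]} from ACPI routing lines."""
    routes = {}
    for match in ACPI_IRQ.finditer(dmesg_text):
        routes.setdefault(int(match.group("irq")), []).append(match.group("dev"))
    return routes


LOG = ("ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [LNK3] -> "
       "GSI 11 (level, low) -> IRQ 11")

if __name__ == "__main__":
    print(acpi_irq_routes(LOG))
```

Any IRQ that maps to more than one device here, or that also appears
against another driver in /proc/interrupts, is a candidate for the
conflict described in this post.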


Any help in resolving this would be appreciated. I'd like to get this
host up and running!

Thanks,

Hilton.


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

