Re: [Xen-devel] Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
On 10/11/10 5:44 PM, Christopher S. Aker wrote:
> In an effort to fix the problem described in my previous xen-devel post
> ("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've
> come across another problem. 3ware 9690SA cards do not behave under
> Xen 4.1 (as of cs 22155).
>
> We have a simple Xen thrash test suite which fires up domUs that do
> different workloads (some swap thrash, some kernel build, some spin
> CPUs, some cycle rebooting, etc). Almost immediately after launching
> the suite we can get the 3ware 9690SA card to fail with something like
> the following:
>
> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
> sd 0:0:0:0: rejecting I/O to offline device
> sd 0:0:0:0: rejecting I/O to offline device
>
> Under a 2.6.32 dom0 it sometimes also triggers Xenwatch like so:
>
> http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt
>
> Results matrix:
>
> +---------------------------------------------------------------+
> | Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
> +---------------------------------------------------------------+
> | 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
> | 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
> | 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
> | 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
> | 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
> +---------------------------------------------------------------+
>
> The failures were verified on at least 2 machines of identical
> specification. The same dom0 kernels that produce a stable 9690SA under
> Xen 3.4, bomb under Xen 4.1.

I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.

Konrad, in the "offline raid" thread you asked for the following debug information:

http://www.theshore.net/~caker/xen/BUGS/offline-raid/

The sysrq-t.txt and triple-a-star.txt outputs are after I got the raid card to hang up (but before it timed out and started spewing to the console).

Oddly, lspci shows three devices assigned IRQ 16, but /proc/interrupts only lists two of them. Side effect of MSI? Also, the problem still happens even with MSI disabled (pci=nomsi).

Thanks,
-Chris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
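
For comparing the lspci view against /proc/interrupts, a rough sketch like the one below can list which handlers are actually registered on each numbered IRQ. It is only an illustration: the function name `handlers_by_irq` and the `SAMPLE` text are made up for this example (the sample is not output from the affected machines), and the parsing assumes the common layout of one count column per CPU, then a single-token interrupt-controller name, then a comma-separated handler list.

```python
# Hypothetical sample in the usual /proc/interrupts layout; NOT real output
# from the machines discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
 16:       1200          0   IO-APIC-fasteoi   uhci_hcd:usb3, 3w-9xxx
 17:         42          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0   Non-maskable interrupts
"""


def handlers_by_irq(text):
    """Map each numbered IRQ to the list of handler names registered on it."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row has one token per CPU
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        # Split off the per-CPU counters, keep the remainder as one string.
        fields = rest.split(None, ncpus)
        if len(fields) <= ncpus:
            continue  # nothing after the counters
        pieces = [p.strip() for p in fields[ncpus].split(",")]
        first = pieces[0].split()
        if len(first) < 2:
            continue  # controller name only, no handler registered
        # Assumption: the last token before the first comma is the first handler.
        result[int(label)] = [first[-1]] + pieces[1:]
    return result


if __name__ == "__main__":
    for irq, names in sorted(handlers_by_irq(SAMPLE).items()):
        print(irq, names)
```

On a live system the same function could be fed `open("/proc/interrupts").read()` and the IRQ 16 entry compared by hand with the devices lspci reports on that line.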