Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"
On Wed, May 5, 2010 at 7:02 PM, Andrew Lyon <andrew.lyon@xxxxxxxxx> wrote:
> On Tue, May 4, 2010 at 2:09 PM, Heiko Wundram <modelnine@xxxxxxxxxxxxx> wrote:
>> Hey all!
>>
>> I'm currently in the process of migrating a (Gentoo-based) Xen server to use
>> Xen 4.0.0 (where I'm using the Xen ebuilds from bugs.gentoo.org), and I'm
>> having severe problems with tapdisk2 (which I wish to use to do I/O
>> prioritizing using CFQ on the LVM-based backing storage of a virtual
>> server).
>>
>> It seems that after a while of heavy I/O in the virtual domain, the
>> communication between the (paravirtualized) DomU and Dom0 (the
>> tapdisk2 process) breaks, in that no more interrupts are delivered to Dom0
>> for I/O requests from the virtual domain, and as such the virtual host
>> "loses" its hard disk (but does not otherwise "break", it just stops
>> responding). The network front-/backend is not affected by this
>> communication loss, AFAICT.
>>
>> The virtual host can be destroyed by an xm destroy, but the created blktap2
>> interface does not disappear until the next reboot, and cannot be removed by
>> the respective sysfs accesses (rather, echoing a 1 into "remove" blocks,
>> too, and is "unkillable", i.e. stays in kernel space). After a blktap2
>> device has entered this broken state, no more hosts can be created by xm
>> create (that blocks, too), and the host system must be rebooted to return
>> to a usable state.
>>
>> I've not been able to provoke this breakage with "normal" I/O (i.e., when
>> the hosts run normally), but I have been able to provoke it using bonnie,
>> which freezes the blktap2 device after a short period of sustained
>> read/write I/O at over 120 MB/s.
>>
>> The Dom0 and the DomU kernels being used are xen-sources-2.6.32-r1
>> (which are the xen-stable 2.6.32.10 [11?] based OpenSuSE Xen-kernel sources,
>> AFAIK) from the official portage tree; the kernel configuration that's in
>> use is attached.
>>
>> I've tried iommu=off for Xen (the mobo doesn't support VT-d anyway, so Xen
>> never turns it on), and I've also looked for any signs of errors appearing
>> when setting verbosity 9 for the blktap2 module and loglvl=all and
>> guest_loglvl=all for Xen, but I haven't seen any errors so far.
>>
>> Strace-ing the tapdisk2 process reveals that it's blocked on select(), and
>> none of the descriptors it's polling ever become readable (which is the
>> condition tapdisk2 waits for); instead, the call always times out after 600 s.
>>
>> Thanks in advance for any hint as to what is causing this, or if there's
>> anything I might try to get things working...
>>
>> PS: I have to boot with acpi=off, as the mobo won't reboot when ACPI is
>> turned on for Dom0 (not even when disabling ACPI reboots), but booting with
>> ACPI enabled doesn't stop blktap2 from blocking either.
>>
>> --- Heiko.
>>
>> _______________________________________________
>> Xen-users mailing list
>> Xen-users@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-users
>>
>
> I have had exactly the same problem and ended up going back to tapdisk1.
>
> I was able to replicate the problem using the entire SLE11-SP1 kernel
> source patch set, which proves that the bug exists upstream;
> unfortunately I am very busy with other projects at the moment, so I did
> not have time to debug it at all.
>
> The SLE11-SP1 tree has been updated since xen-sources-2.6.32-r1; I
> will make an updated set of patches for you to try, but it will take me
> a couple of days.
>
> Andy
>

Hi,

I have uploaded updated 2.6.32 patches and an ebuild to
http://code.google.com/p/gentoo-xen-kernel/downloads/list. Note that the
patches should be applied to 2.6.32.13.

They should be added to portage in a few days' time, provided no problems
are found.

Andy