[Xen-devel] blocking Xen 3.X production use: soft lockup bugs
Hi All,

I hate to say it, but it's starting to look like soft lockup bug(s) are turning into a serious roadblock for general production use of Xen 3.X, on a wide range of hardware.

I've been using Xen since the 1.0 days, and I have to say that this is the most serious showstopper bug I've ever hit -- it usually manifests itself during the first significant network and/or disk I/O after starting a second or third domU on the same box, and is the only bug I've ever hit that has caused permanent damage -- it tends to corrupt guest filesystems. In my case it's stopped a deployment dead in its tracks, and our only options at this point are to go back to Xen 2.X or (horrors) to native Linux kernels.

The problem (or something that looks identical) is described in several tickets, status currently NEW or REOPENED, with no clear resolution:

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=543
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=690
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=697
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=705

In our own shop, we consistently hit soft lockups while running on both IBM x330s and older Netengines (similar to an IBM 4000R). We've found no workaround. We're on xen-3.0-testing, changeset 9732, kernel 2.6.6.13.

On April 6th, Keir posted a note saying this was fixed as of a blkif_schedule() fix, which we already have because that was way back in changeset 9587:

http://lists.xensource.com/archives/html/xen-devel/2006-04/msg00121.html

The most recent devel list traffic I've found which covers this is from July 7th:

http://lists.xensource.com/archives/html/xen-users/2006-07/msg00134.html

This message referred back to Keir's comment as describing a fix, but that doesn't look true; while Keir's 9587 checkin may have fixed one soft lockup problem, there appear to be more out there, or else there's been a regression.

Do we have any consensus that this bug is fixed at all in xen-3.0-testing, or even unstable? Is anyone who was hitting soft lockups in testing *not* hitting them any more on the same hardware? If so, what changeset are you on now?

If anyone needs any more information, just let me know. As usual, if anyone wants login and console server access to one of these boxes to chase this down, I'm more than happy to provide that.

Thanks,

Steve

--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@xxxxxxxxxxxxx
http://www.stevegt.com -- http://Infrastructures.Org

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel