[Xen-devel] Re: Known console(d) bug?
Pasi Kärkkäinen <pasik@xxxxxx> writes:

> On Fri, May 29, 2009 at 08:26:33PM +0200, Ferenc Wagner wrote:
>
>> There's a problem I've been struggling with for quite some time in our Xen
>> hosting environment. Basically, after a couple of months of smooth
>> running, suddenly most virtual machines get stuck in the r state
>> and stop responding to anything, including xm console and xm sysrq.
>> It happens rather regularly, but I can't reproduce it by taxing the
>> domUs or the dom0 with disk I/O, CPU or console I/O.
>>
>> However, a couple of days ago it turned out that this situation can be
>> cured by restarting xenconsoled! After that, xm console spat out the
>> previous random typing, sysrq help strings and whatnot for the domUs
>> which weren't stuck in the r state, and the stuck ones also started to
>> respond and run normally (spending most of their time in the b state) again.
>>
>> The whole phenomenon looked like xenconsoled stopped emptying the domU
>> console buffers, and those domUs which were constantly writing to
>> their consoles quickly filled them up and started busy-looping, trying to
>> put more characters onto their consoles and not even responding to
>> ping. But those domUs which didn't write to their consoles
>> stayed functional until the desperate operator forced them to create
>> enough console output to fill up their buffers as well, and then they
>> got stuck in the r state just like the others. After restarting xenconsoled
>> all were able to recover successfully.
>>
>> Of course the above is just guessing; I don't know the details of Xen
>> console handling. But I wonder if it rings any bells here, or maybe
>> this issue is known and fixed already. Oh, I experience this under
>> Xen 3.2 and pv-ops guests (2.6.26+patches).
>
> I've seen the exact same bug/problem with Xen in RHEL5/CentOS (5.0, 5.1,
> 5.2).
> I believe it's also in 5.3.
>
> I reported the problem to xen-devel, but I couldn't provide the needed
> strace/backtrace to figure out the reason _why_ that happens (I had
> already restarted xenconsoled).
>
> I think developers would need more information to figure out what the
> actual bug is.

Indeed, I've found your report now. This means you've been running for
almost a year without experiencing this! I get it much more often, but
still pretty rarely.

I also noticed that the more or less regular

WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 200 ms (> 50 ms) (GSource: 0x811bf80)

messages from heartbeat came 50 times more often while xenstored was
stuck (at least it didn't take any significant CPU). However, four domUs
constantly in the r state surely sucked up all the CPU power of the 4-way
host machine.

And this phenomenon is always triggered by some extra load, typically by
tiger starting an md5sum check of the installed packages on a couple of
domUs at the same time. (Btw, doesn't some randomized crond exist to help
with this in general?)
--
Cheers,
Feri.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel