Re: [Xen-devel] help with xenstored 'hang'
Patrick Colp wrote:
> I was recently struggling with what sounds like a not-too-dissimilar
> problem while working with a disaggregated version of xenstore. The
> ultimate solution for me was to disable pthreads in xenstore/libxs. I
> just commented out the following line in tools/xenstore/Makefile:
>
> xs.opic: CFLAGS += -DUSE_PTHREAD
>

Xen3.2 predates c/s 17405, which introduced optional use of pthreads.
Prior to that, pthreads was used explicitly.

> After I removed that line and rebuilt and installed xenstore, it
> worked just fine. I would be curious to know if this also solves your
> problem.
>

I can see if the user is receptive to testing backported 17405 with
pthreads disabled. Thanks for the suggestion.

Jim

>
> Patrick
>
>
> On 30 June 2010 15:15, Jim Fehlig <jfehlig@xxxxxxxxxx> wrote:
>
>> I'm trying to debug an 'xm list' hang on a large (~700 hosts) Xen 3.2
>> production installation. The hang occurs randomly, on a random host.
>> User has provided cores of xend and xenstored processes when hang
>> occurs. After poking at these cores I have discovered
>>
>> In xend process, a thread is blocked on a cond variable, waiting for a
>> response to XS_TRANSACTION_START from xenstored. A reader thread
>> responsible for reading from xenstored is blocked on read(2).
>>
>> In the xenstored process, the lone thread is blocked on select(2),
>> waiting for IO. I examined the connections list and see that it contains
>> a connection for the XS_TRANSACTION_START request. Dumping the
>> connection object:
>>
>> (gdb) p *(struct connection *)0x526c70
>> $48 = {list = {next = 0x517c30, prev = 0x5151f0}, fd = 13, id = 0,
>>   can_write = true, in = 0x523600,
>>   out_list = {next = 0x526c98, prev = 0x526c98}, transaction = 0x0,
>>   transaction_list = {next = 0x523560, prev = 0x523560},
>>   next_transaction_id = 60231445, transaction_started = 1,
>>   domain = 0x0, watches = {next = 0x51daa0, prev = 0x5267b0},
>>   write = 0x402460 <writefd>, read = 0x405180 <readfd>}
>>
>> Notice transaction_started is set to 1, but out_list is empty. AFAICT,
>> that means the reply has been sent to xend. The reader thread in xend
>> should have received the response and signaled the cond variable -
>> allowing execution to progress. Ultimately, xend would send a
>> XS_TRANSACTION_END message, freeing the connection object in xenstored
>> and removing it from connections list.
>>
>> Does my understanding of this code sound correct? Anyone have
>> suggestions or further debugging tips? Examining cores is about my only
>> debug option as user does not want to deploy debug patches, enable
>> tracing, etc. across 700 hosts.
>>
>> Interestingly, when user strace's or attaches to xenstored process with
>> gdb, xenstored "awakes", the hung 'xm list' returns, and xenstored
>> continues normally. A new connection to xenstored (e.g. running xmtop)
>> seems to poke it along as well. Would a timeout on select(2) in main
>> loop of xenstored help at all?
>>
>> Thanks for any insights!
>> Jim
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel