
Re: [Xen-API] [Xen-devel] bug in xenstored? No notification to subscription on @introduceDomain

Thanks for the reply.

The problem is that we tried at least two different libraries, xs (plus the Python bindings xen.lowlevel.xs) and our own library (pyxs), written from scratch in pure Python, and both show exactly the same behavior. We lose @introduce and @release at the same time, but only for new domains. Older domains (which started before the error appeared) send @release normally during shutdown/migration.

I ran strace: nothing is sent by xenstored to the application's socket when 'new' domains appear and disappear (though I'm not 100% sure, as my strace skills are not very good).

The application performs write/read operations to/from xenstore (and sets up many subscriptions, but only after @introduce), and the older subscriptions keep working fine.

PS: We have hit another strange bug, a memory leak in xenstored (it happens only with a large number of transactions, and ONLY over the socket), but that case is still under investigation, so I decided not to post it (but maybe it is related somehow?).

Sorry for the question, but how can I gather debug information from oxenstored?

On 12.12.2011 15:31, Ian Campbell wrote:
On Fri, 2011-12-09 at 19:49 +0000, George Shuklin wrote:
Good day.

I think I have hit a strange bug in xenstored.
If you are using XCP then this will be using oxenstored. I've CC'd
xen-api@ since that is the correct place for XCP discussions.

It's also plausibly a bug in the C client library or the python bindings
to that library (or indeed your application).

I have been using XCP for a long time, and all that time we have had a
funny bug that we could not debug properly because it occurred in a
production environment and very rarely. Now we have managed to catch it
in a testing environment and have done some research.

We have a Python application running in dom0 that waits for new domains
to appear. This is implemented via a subscription (watch) on the
@introduceDomain xenstore key. Under some conditions we stop receiving
notifications on this subscription. If we run a second instance of the
application, that instance receives the notification; if we restart the
application, it receives it too.
You lose both @introduce and @release notifications or just @introduce?

Does the app do any other XS stuff, e.g. other watches or read/write? Do
these stop working also?

oxenstored (at least in XCP) logs to /var/log/xenstore-access.log -- do
you see any activity in there? There is also /var/log/xenstored.log.

Does strace show the daemon writing (or trying to write) to the socket
associated with this client? What about on the client side? (nb:
libxenstore uses a thread to handle watches so be sure to use the
appropriate options to strace.) Identifying the fd associated with the
connection on either end might be tricky, /proc/<pid>/fd and/or netstat
might help narrow it down.
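If strace does show traffic on the client socket, the raw bytes can be checked against the xenstore wire format to tell a watch event apart from ordinary replies. The sketch below is a minimal decoder under stated assumptions: the message layout and type numbers are taken from Xen's public header io/xs_wire.h (a 16-byte header of four uint32 fields `type`, `req_id`, `tx_id`, `len`; `XS_WATCH = 4`, `XS_WATCH_EVENT = 15`), the dom0 is little-endian x86, and `encode_message`/`decode_message` are hypothetical helper names, not part of any Xen library.

```python
import struct
from typing import List, Sequence, Tuple

# Assumed from Xen's public header io/xs_wire.h: each message starts with a
# 16-byte header of four uint32 fields (type, req_id, tx_id, len), followed by
# `len` bytes of payload. For WATCH and WATCH_EVENT the payload is "path\0token\0".
XS_WATCH = 4
XS_WATCH_EVENT = 15
HEADER = struct.Struct("<IIII")  # little-endian, i.e. native byte order on x86


def encode_message(msg_type: int, fields: Sequence[str], req_id: int = 0, tx_id: int = 0) -> bytes:
    """Build one message whose payload is the NUL-terminated fields."""
    payload = b"".join(f.encode() + b"\0" for f in fields)
    return HEADER.pack(msg_type, req_id, tx_id, len(payload)) + payload


def decode_message(buf: bytes) -> Tuple[int, List[str], bytes]:
    """Decode the first message in buf; return (type, NUL-split fields, unconsumed bytes)."""
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    msg_type, _req_id, _tx_id, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete payload")
    fields = buf[HEADER.size:end].split(b"\0")
    if fields and fields[-1] == b"":
        fields = fields[:-1]  # drop the empty piece after the final NUL
    return msg_type, [f.decode() for f in fields], buf[end:]


event = encode_message(XS_WATCH_EVENT, ["@introduceDomain", "+"])
print(decode_message(event))  # -> (15, ['@introduceDomain', '+'], b'')
```

Bytes captured with something like `strace -f -xx -s 4096 -e trace=read,write -p <pid>` (`-f` to include the watch thread, `-xx` for hex output, `-s` to avoid truncation) could be converted to `bytes` and fed to `decode_message`; a `XS_WATCH_EVENT` carrying `['@introduceDomain', '+']` is what the application should be seeing when a domain appears.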

The app being python presumably makes it hard to attach gdb to and get
anything sensible, likewise the daemon being ocaml. If anyone has any
hints on attaching a debugger to an existing process of these types
then that might be useful.

Other than that I'm afraid I really don't have any idea what might be
going wrong, or indeed what other next steps can be taken to diagnose
the issue :-(


I have been unable to pinpoint the exact conditions for this, but the bug:
a) Happens occasionally but consistently (about once a month on at least
one host in a farm of 50 hosts)
b) Is not related to xenstored uptime
c) Is not related to load on Xen or dom0
d) Is not related to the number of domains
e) Occurs at least on XCP 0.5, 1.0 and 1.1 (I don't know how to get the
version from xenstored)

The last time, I hit it on two hosts in the lab at the same time (with a
single guest domain and no high load) and did some experiments, so I can
state exactly what I wrote above.

The relevant pieces of the Python code we ran:

from xen.lowlevel import xs

conn = xs.xs()
# The tokens "+" and "-" identify which special watch fired.
conn.watch("@introduceDomain", "+")
conn.watch("@releaseDomain", "-")
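Because the failure is silent, one way to confirm (and work around) lost notifications is to rescan the domain list periodically and compare it with what the watches reported. The sketch below illustrates that heuristic only; `DomainTracker` and its `list_domids` callable are hypothetical names, and building `list_domids` on the bindings' `ls()` of /local/domain (converting the entries to ints) is an assumption, not something established in this thread. The tokens "+" and "-" match the ones registered in the snippet above.

```python
from typing import Callable, Iterable, Set, Tuple


class DomainTracker:
    """Hypothetical helper: caches the set of domids and counts domain changes
    that a rescan finds without any watch event having arrived first."""

    def __init__(self, list_domids: Callable[[], Iterable[int]]) -> None:
        self._list_domids = list_domids
        self.known: Set[int] = set(list_domids())
        self._event_pending = False
        self.missed = 0

    def on_watch(self, token: str) -> None:
        # "+" and "-" are the tokens used for @introduceDomain and @releaseDomain.
        if token in ("+", "-"):
            self._event_pending = True

    def sync(self) -> Tuple[Set[int], Set[int]]:
        """Rescan domains; return (introduced, released) since the last sync."""
        current = set(self._list_domids())
        introduced = current - self.known
        released = self.known - current
        if (introduced or released) and not self._event_pending:
            self.missed += 1  # the domain list changed but no watch fired
        self.known = current
        self._event_pending = False
        return introduced, released
```

The application would call `on_watch(token)` and then `sync()` for every watch event, and also call `sync()` from a periodic timer; a growing `missed` count would suggest notifications are being dropped. A notification still in flight when the timer fires can produce a false positive, so a single increment is only a hint.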

Xen-devel mailing list

xen-api mailing list


