[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Use timeout on xenstore read_reply to avoid task hunging



On Wed, 2011-03-02 at 11:35 +0000, Frank Pan wrote:
> Oh sorry.
> Patch attached.
> 
> On Wed, Mar 2, 2011 at 6:34 PM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> > It all sounds very plausible to me but you've forgotten the patch ;-)
> >
> > Why wait_event_timeout and not wait_event_interruptible to allow users
> > to interrupt? In particular I'm concerned about the arbitrarily chosen
> > 5s timeout not being sufficient on a busy system.
> 
> FP: I wait_event_interruptible is a choice. But it needs user
> operation such as kill command. User-level tool(xenstore-ls for
> example) can also set SIGALRM or something else, but it sounds not so
> good.

I think this is preferable to requiring all clients be reworked to
handle a timeout event which they aren't currently expecting.

Rather than have all xenbus clients expect and handle an arbitrary
timeout we should provide a new interface to in-kernel users of xenbus
which includes a timeout parameter, e.g. foo_timeout. (assuming there
are in kernel users who could do anything sane with a timeout, if not
then we shouldn't bother with the interface).

For userspace clients I think we would probably be better off making
sure that poll/select work properly than trying to find a way to expose
timeouts of this sort, that combined with making the sleeps
interruptible, would cover the problems we care about, I think.

> The timeout parameter is something discussible. 5s may not be a good
> one, but I believe xenstored on a healthy system should be response in
> 5s. What do you think?

I've definitely seen cases in heavily loaded domain 0 where it has taken
longer than 5s (think of a system which is starting dozens of new
domains concurrently for example)

> > Once specific pitfall which I remember was that userspace clients are at
> > liberty to make use of the req_id themselves (and some do). Fixing this
> > might involve shadowing the user provided req_id with a kernel generated
> > ID on the ring and unshadowing in the responses...
> 
> FP: Yes, that's what I supposed to do. But I cannot find any
> dereference on the req_id section of the reply msg. If it exist
> somewhere, shadowing is surely needed.

Userspace clients communicating via the drivers/xen/xenfs/xenbus.c
driver are at liberty to use the req_id field for their own purposes.

For example, the ocaml xenbus bindings in
xen-unstable.hg:tools/ocaml/libs/xb expose this field and I think
(although I'm not 100% sure) that the oxenstored (tools/ocaml/xenstored)
uses it.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.