Re: [Xen-devel] libxl: error handling before xenstored runs
On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > On 10/02/11 08:55, Ian Campbell wrote:
> > > That's the underlying bug which the heuristic is trying to avoid...
> > >
> > > Fundamentally the xs ring protocol is missing any way to tell if
> > > someone is listening on the other end so you have no choice but to try
> > > communicating and see if anyone responds.
> > >
> > > It's a pretty straightforward bug that the kernel does the waiting to
> > > see if anyone responds bit with an uninterruptible sleep. I took a
> > > quick look a little while ago but unfortunately it didn't look
> > > straightforward to fix on the kernel side :-( I can't remember why
> > > though.
> >
> > For starters, the protocol requires the messages to sit on the ring for an
> > undetermined amount of time (boot watches).
> >
> > > It might be simpler to support allowing the userspace client to
> > > explicitly specify a timeout. I'm not sure what the impact on the ring
> > > is of leaving unconsumed requests on the ring when the other end does
> > > show up. Presumably the kernel driver just needs to be prepared to
> > > swallow responses whose target has given up and gone home.
> >
> > No, the simplest thing to do is to use the socket connection
> > exclusively. Just how we're doing it in XCP and XCI.
>
> Right, but this approach doesn't work with xenstored in a stubdomain.
> Part of the point of using the ring protocol even when this isn't the
> case is to help ensure that it is possible and help avoid regressions
> etc.
>
> > The protocol is not designed to do async either, so leaving unconsumed
> > requests could be pretty disastrous if the other end shows up. Provided
> > the kernel doesn't detect it (I don't think it does [1]), it would imply
> > spurious replies; for example, the previous waiting read on "/abc/def"
> > could reply to a next read on "/xyz/123".
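The spurious-reply hazard described above can be reproduced with a toy model. This is a minimal Python sketch, not Xen or Linux code: `Reply`, `FakeXenstored` and `naive_read` are hypothetical names, the response ring is modelled as a `deque`, and an empty ring stands in for the client giving up after a timeout. Every request carries req_id 0 (as the pvops kernel does, per the reply further down), so nothing ties a reply to the request that caused it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reply:
    req_id: int    # echoed back from the request
    payload: str


class FakeXenstored:
    """Hypothetical stand-in for whatever is (or is not) on the other end of the ring."""

    def __init__(self):
        self.running = False
        self.backlog = []          # requests still sitting unconsumed on the request ring
        self.responses = deque()   # the response ring, read in FIFO order

    def write_request(self, req_id, path):
        self.backlog.append((req_id, path))
        if self.running:
            self._answer()

    def start(self):
        # A late-starting daemon still serves everything left on the ring.
        self.running = True
        self._answer()

    def _answer(self):
        for req_id, path in self.backlog:
            self.responses.append(Reply(req_id, f"contents of {path}"))
        self.backlog.clear()


def naive_read(xs, path):
    """Hypothetical client: gives up if no reply is waiting, otherwise takes the head reply."""
    xs.write_request(0, path)      # req_id is always 0, so replies cannot be matched
    if not xs.responses:
        return None                # stands in for giving up after a timeout
    return xs.responses.popleft().payload


xs = FakeXenstored()
first = naive_read(xs, "/abc/def")    # nobody listening yet: the caller gives up
xs.start()                            # xenstored appears and answers the abandoned read
second = naive_read(xs, "/xyz/123")   # receives the reply meant for /abc/def
third = naive_read(xs, "/foo/bar")    # from now on every read is off by one
print(first, second, third, sep=" | ")
```

Running it prints `None | contents of /abc/def | contents of /xyz/123`: once one request has been abandoned, every later read receives the previous request's answer.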
>
> The wire protocol includes a req_id which is echoed in the response
> which sh/could facilitate multiplexing this sort of thing. The pvops
> kernel currently always sets it to zero but that's just an
> implementation detail ;-)
> Currently the kernel does (roughly):
>         take_lock
>         write_request
>         wait_for_reply
>         release_lock
> instead it should/could be:
>         take_lock(timeout)
>         write_request (++req_id)
>         while read_reply.req_id != req_id && not (timeout)
>                 wait some more
>         release lock

I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.

Christoph

> OK, so maybe this is not in the "might be simpler" bucket any more, but
> it sounds like plausibly the right direction to take.
>
> Properly handling multiple userspace clients asynchronously and demuxing
> the responses etc. would be even better, but I don't think it is necessary
> to solve this particular issue.
>
> > > Maybe we should add an explicit ping/pong ring message to the xs ring
> > > protocol?
> >
> > And who's going to reply to this if xenstored is missing? You would
> > require the kernel to introspect the messages and reply by itself.
>
> The reason I suggested new messages was that I would solve that by
> declaring that these new messages have whatever magic semantics I need
> to make this work ;-)
>
> Ian.

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
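Ian's proposed loop (take the lock with a timeout, tag each request with ++req_id, skip replies whose req_id does not match, give up when the timeout expires) can be sketched as a small Python simulation. This is an illustration under stated assumptions, not Xen or Linux code: `Reply`, `FakeXenstored`, `TaggedClient` and the 50 ms default timeout are hypothetical, the ring is a `deque`, and a `threading.Lock` stands in for the kernel's lock.

```python
import threading
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Reply:
    req_id: int    # echoed back from the request
    payload: str


class FakeXenstored:
    """Hypothetical stand-in for the other end of the ring; serves requests only once started."""

    def __init__(self):
        self.running = False
        self.backlog = []          # requests still sitting unconsumed on the request ring
        self.responses = deque()   # the response ring, read in FIFO order

    def write_request(self, req_id, path):
        self.backlog.append((req_id, path))
        if self.running:
            self._answer()

    def start(self):
        self.running = True
        self._answer()             # late replies to abandoned requests land on the ring too

    def _answer(self):
        for req_id, path in self.backlog:
            self.responses.append(Reply(req_id, f"contents of {path}"))
        self.backlog.clear()


class TaggedClient:
    """Sketch of the proposed scheme: unique req_id per request, bounded wait, stale replies dropped."""

    def __init__(self, xs):
        self.xs = xs
        self.lock = threading.Lock()
        self.req_id = 0

    def read(self, path, timeout=0.05):
        if not self.lock.acquire(timeout=timeout):   # take_lock(timeout)
            return None
        try:
            self.req_id += 1                          # write_request (++req_id)
            my_id = self.req_id
            self.xs.write_request(my_id, path)
            deadline = time.monotonic() + timeout
            while time.monotonic() < deadline:        # ... && not (timeout)
                while self.xs.responses:
                    reply = self.xs.responses.popleft()
                    if reply.req_id == my_id:
                        return reply.payload
                    # Otherwise the reply belongs to a caller that gave up: swallow it.
                time.sleep(0.001)                     # wait some more
            return None                               # timed out; request stays on the ring
        finally:
            self.lock.release()                       # release lock


xs = FakeXenstored()
client = TaggedClient(xs)
first = client.read("/abc/def")    # nobody listening: times out after 50 ms
xs.start()                         # xenstored answers the abandoned request 1
second = client.read("/xyz/123")   # the stale reply to request 1 is skipped
print(first, second, sep=" | ")
```

This prints `None | contents of /xyz/123`. Discarding mismatched replies is the "swallow responses whose target has given up" behaviour mentioned earlier in the thread; in this single-threaded demo the lock is always free, and it only matters once several callers share the client.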