[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] libxl/xl: improve behaviour when guest fails to suspend itself



On Tue, 2011-02-08 at 15:06 +0000, Stefano Stabellini wrote:
> On Tue, 8 Feb 2011, Ian Campbell wrote:
>  diff -r deaa7bc1a7ff -r ad6f318aac5d tools/libxl/libxl_dom.c
> > --- a/tools/libxl/libxl_dom.c       Tue Feb 08 09:13:38 2011 +0000
> > +++ b/tools/libxl/libxl_dom.c       Tue Feb 08 13:22:29 2011 +0000
> > @@ -319,7 +319,7 @@ struct suspendinfo {
> >      int suspend_eventchn;
> >      int domid;
> >      int hvm;
> > -    unsigned int flags;
> > +    int guest_responded;
> >  };
> >  
> 
> Even though they are not currently used, I would keep the flags field.

OK.

> > +    /*
> > +     * Guest appear to not be responding, clear the suspend request
> > +     * synchronously using a transaction.
> > +     */
> >      if (!strcmp(state, "suspend")) {
> > -        LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "guest didn't suspend in time");
> > -        libxl__xs_write(si->gc, XBT_NULL, path, "");
> > +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "guest didn't suspend in time, 
> > clearing suspend request");
> > +    retry_transaction:
> > +        t = xs_transaction_start(ctx->xsh);
> > +
> > +        state = libxl__xs_read(si->gc, t, path);
> > +
> > +        libxl__xs_write(si->gc, t, path, "");
> > +
> 
> Doesn't it make sense to clear path only if state is still "suspend"?
> Clearing the path again even if the guest already cleared it may call a
> spurious xenstore watch event in the guest, right?

I guess it's worth avoiding since its pretty trivial to do so, yes.

> > +
> > +    /* Final check, guest may have suspended while we were clearing the 
> > request. */
> > +    if (!strcmp(state, "suspend")) {
> > +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "final check, guest didn't 
> > suspend");
> > +        return 0;
> > +    }
> > +
> > +    si->guest_responded = 1;
> >      return 1;
> >  }
> 
> If "suspend" was gone before we cleared it, shouldn't we check
> xc_domain_getinfolist again before asserting that the domain suspended
> correcty?

Hrm, yes. I think it works to pull that check out of the loop and only
perform it once on the final exit from the function instead of checking
every time around the loop.

This will mean a second timeout loop after the guest acknowledges the
suspend waiting for the actual suspend. I think this makes sense anyway
rather than conflating the two in a single timeout.

> Or maybe is better just to return 0 without the final check, assuming
> that the guest failed in any case.

I think if the guest cleared the xenstore node we need to assume it will
have tried to suspend which is a different failure to not having tried.

If the guest suspends without clearing the xenstore node then all bets
are off anyway IMHO and we shouldn't trust it to have suspended
correctly and the toolstack can't really be expected to recover so
taking the "guest failed to respond" error path is as good as any other.

I'll make these changes and repost.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.