Re: [Xen-devel] Re: [PATCH] libxl: do slow resume after failed migration attempt
On Wed, 2011-02-16 at 11:49 +0000, Ian Campbell wrote:
> On Wed, 2011-02-16 at 11:47 +0000, Ian Campbell wrote:
> > # HG changeset patch
> > # User Ian Campbell <ian.campbell@xxxxxxxxxx>
> > # Date 1297856874 0
> > # Node ID 1728ed4bbec9e82ca13c2639c8e4ef8b4dc231b6
> > # Parent aa466613328f5de78fdfc968473cb06e948c1f5d
> > libxl: do slow resume after failed migration attempt
> >
> > both of the current callers for libxl_domain_resume are calling after
> > a migration has failed, one is failure to suspend on the sender and
> > the other is failure to start on the destination, both leading to a
> > resume attempt on the sender.
> >
> > However in the first case, failure to suspend, there is no guarantee
> > that the guest has made it as far as the suspend hypercall and
> > therefore the fast resume method, which frobs the hypercall return to
> > indicate a cancelled suspend, cannot safely be used since it will
> > corrupt %eax/%rax.
> >
> > For the second case, failure to start on destination, I don't think it
> > really matters if the resume is fast or slow.
> >
> > Therefore always use the slow/uncooperative version of xc_domain_resume from
> > libxl_domain_resume.
> >
> > This makes a PV domain which failed to suspend (e.g. because the core
> > Linux PM infrastructure within the guest didn't allow it) recover
> > gracefully.
>
> a PVHVM domain never suffered from this because libxl_domain_resume
> bails due to a libxl__domain_is_hvm check. I'm not 100% clear whether
> this is correct but I didn't change it. My test with a PVHVM guest which
> acknowledges the suspend but doesn't go on to do anything seems to work.

Looking closer, even a PV guest which is hacked to not actually try to
suspend fails this new xc_domain_resume call and it's actually the
original domain which continues.

I'm inclined to suggest that this is OK and that trying to do a slow
xc_domain_resume will save guests which have suffered certain types of
failure and be harmless for other types of failures, but I wouldn't
argue strongly against a suggestion that the right thing to do in the
"failed to suspend" case is to simply unpause the original domain and
let it try and continue...

Ian.

>
> Ian.
>
> >
> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> >
> > diff -r aa466613328f -r 1728ed4bbec9 tools/libxl/libxl.c
> > --- a/tools/libxl/libxl.c Tue Feb 15 13:40:50 2011 +0000
> > +++ b/tools/libxl/libxl.c Wed Feb 16 11:47:54 2011 +0000
> > @@ -226,7 +226,7 @@ int libxl_domain_resume(libxl_ctx *ctx,
> >          rc = ERROR_NI;
> >          goto out;
> >      }
> > -    if (xc_domain_resume(ctx->xch, domid, 1)) {
> > +    if (xc_domain_resume(ctx->xch, domid, 0)) {
> >          LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
> >                          "xc_domain_resume failed for domain %u",
> >                          domid);
> >
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel