[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4] tools: set migration constraints from cmdline
On Mon, 2013-02-04 at 13:41 +0000, Olaf Hering wrote: > On Mon, Feb 04, Ian Campbell wrote: > > > On Fri, 2013-02-01 at 19:34 +0000, Olaf Hering wrote: > > > # HG changeset patch > > > # User Olaf Hering <olaf@xxxxxxxxx> > > > # Date 1359746832 -3600 > > > # Node ID 785c8f34e1f802106e53ca4d2c54dce55c8ee166 > > > # Parent fd711ebdce9a58556d62c2daaf5d49ab06de4a3c > > > tools: set migration constraints from cmdline > > > > > > Add new options to xm/xl migrate to control the process of migration. > > > The intention is to optionally abort the migration if it takes too long > > > to migrate a busy guest due to the high number of dirty pages. Currently > > > the guest is suspended to transfer the remaining dirty pages. This > > > transfer can take too long, which can confuse the guest if its suspended > > > for too long. > > > > > > -M <number> Number of iterations before final suspend (default: 30) > > > --max_iters <number> > > > > > > -m <factor> Max amount of memory to transfer before final suspend > > > (default: 3*RAM) > > > --max_factor <factor> > > > > > > -N Abort migration instead of doing final suspend. > > > --no_suspend > > > > Is this last one a debug option? (Having read the patch I think not, but > > the description here doesn't really explain it) > > No, its not debug. And the naming of that -N option is not really good, > I agree. > > What it is supposed to do is to avoid the final suspend+migrate+resume > step when either there were too many iterations or too many pages > transfered when the guest continues to dirty many pages. Its not > predictable how long the guest will be suspended to transfer the > remaining pages to the new host. I think even transfering 1GB RAM at > 1000/GBs takes maybe 10 seconds and with large guests there may be > several GB dirty pages, so that a guest may be suspended for a minute. > This caused issues in a customer environment. > > Instead the migration should be aborted and the guest should continue to > run on the old host. That's what I understood from reading the code, can we make the docs say it too ;-) > > > > > =item B<remus> [I<OPTIONS>] I<domain-id> I<host> > > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxc/xc_domain_save.c > > > --- a/tools/libxc/xc_domain_save.c > > > +++ b/tools/libxc/xc_domain_save.c > > > > The changes to this file only seem to implement the -N and not the other > > two? > > xc_domain_save already has max_iters and max_factor as arguments. Ah, I didn't appreciate that. BTW did you want to add a way to override DEF_MIN_REMAINING? > > > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/libxl.h > > > --- a/tools/libxl/libxl.h > > > +++ b/tools/libxl/libxl.h > > > @@ -500,12 +500,25 @@ int libxl_domain_create_restore(libxl_ct > > > void libxl_domain_config_init(libxl_domain_config *d_config); > > > void libxl_domain_config_dispose(libxl_domain_config *d_config); > > > > > > -int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, > > > +int libxl_domain_suspend_0x040200(libxl_ctx *ctx, uint32_t domid, int fd, > > > int flags, /* LIBXL_SUSPEND_* */ > > > const libxl_asyncop_how *ao_how) > > > LIBXL_EXTERNAL_CALLERS_ONLY; > > > +#ifdef LIBXL_API_VERSION > > > +#if LIBXL_API_VERSION == 0x040200 > > > +#define libxl_domain_suspend libxl_domain_suspend_0x040200 > > > > int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, > > int flags, /* LIBXL_SUSPEND_* */ > > int max_iters, int max_factor, > > const libxl_asyncop_how *ao_how) > > LIBXL_EXTERNAL_CALLERS_ONLY; > > #ifdef LIBXL_API_VERSION > > #if LIBXL_API_VERSION == 0x040200 > > #define libxl_domain_suspend(ctx, domid, fd, flags, ao_how) \ > > libxl_domain_suspend(ctx, domid, fd, flags, 0, 0, ao_how) > > #endif /* LIBXL_API_VERSION == 0x040200 */ > > #endif /* defined(LIBXL_API_VERSION) */ > > > > Should work? > > That may work, have to try it. In the end if we make such a change it > would serve as example for other upcoming API changes. Agreed. > > > /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )] > > > * If this parameter is true, use co-operative resume. The guest > > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/libxl_dom.c > > > --- a/tools/libxl/libxl_dom.c > > > +++ b/tools/libxl/libxl_dom.c > > > @@ -1240,6 +1240,7 @@ void libxl__domain_suspend(libxl__egc *e > > > > > > dss->xcflags = (live ? XCFLAGS_LIVE : 0) > > > | (debug ? XCFLAGS_DEBUG : 0) > > > + | (dss->xlflags & LIBXL_SUSPEND_NO_FINAL_SUSPEND ? > > > XCFLAGS_DOMSAVE_NOSUSPEND : 0) > > > | (dss->hvm ? XCFLAGS_HVM : 0); > > > > > > dss->suspend_eventchn = -1; > > > > > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/xl_cmdimpl.c > > > --- a/tools/libxl/xl_cmdimpl.c > > > +++ b/tools/libxl/xl_cmdimpl.c > > > @@ -3172,7 +3172,7 @@ static int save_domain(uint32_t domid, c > > > > > > save_domain_core_writeconfig(fd, filename, config_data, config_len); > > > > > > - int rc = libxl_domain_suspend(ctx, domid, fd, 0, NULL); > > > + int rc = libxl_domain_suspend(ctx, domid, fd, 0, 0, 0, NULL); > > > > Doesn't this need to pass down the selected values? > > save_domain is different from migrate_domain, I think xl save will > suspend the guest anyway? Oh yes, I didn't spot this as the save path (despite it being right there in the function name). > But I notice the -c option for "save", so > perhaps it would be useful to catch busy guests as well here? I'm not really sure what the usecase is for save -c. I'd be a bit inclined to leave it until someone wants it I think. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |