[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] tools: set migration constraints from cmdline



On Mon, Feb 04, Ian Campbell wrote:

> On Fri, 2013-02-01 at 19:34 +0000, Olaf Hering wrote:
> > # HG changeset patch
> > # User Olaf Hering <olaf@xxxxxxxxx>
> > # Date 1359746832 -3600
> > # Node ID 785c8f34e1f802106e53ca4d2c54dce55c8ee166
> > # Parent  fd711ebdce9a58556d62c2daaf5d49ab06de4a3c
> > tools: set migration constraints from cmdline
> > 
> > Add new options to xm/xl migrate to control the process of migration.
> > The intention is to optionally abort the migration if it takes too long
> > to migrate a busy guest due to the high number of dirty pages. Currently
> > the guest is suspended to transfer the remaining dirty pages. This
> > transfer can take too long, which can confuse the guest if its suspended
> > for too long.
> > 
> > -M <number>   Number of iterations before final suspend (default: 30)
> > --max_iters <number>
> > 
> > -m <factor>   Max amount of memory to transfer before final suspend 
> > (default: 3*RAM)
> > --max_factor <factor>
> > 
> > -N            Abort migration instead of doing final suspend.
> > --no_suspend
> 
> Is this last one a debug option? (Having read the patch I think not, but
> the description here doesn't really explain it)

No, its not debug. And the naming of that -N option is not really good,
I agree.

What it is supposed to do is to avoid the final suspend+migrate+resume
step when either there were too many iterations or too many pages
transfered when the guest continues to dirty many pages. Its not
predictable how long the guest will be suspended to transfer the
remaining pages to the new host. I think even transfering 1GB RAM at
1000/GBs takes maybe 10 seconds and with large guests there may be
several GB dirty pages, so that a guest may be suspended for a minute.
This caused issues in a customer environment.

Instead the migration should be aborted and the guest should continue to
run on the old host.


> >  =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxc/xc_domain_save.c
> > --- a/tools/libxc/xc_domain_save.c
> > +++ b/tools/libxc/xc_domain_save.c
> 
> The changes to this file only seem to implement the -N and not the other
> two?

xc_domain_save already has max_iters and max_factor as arguments.

> > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/libxl.h
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -500,12 +500,25 @@ int libxl_domain_create_restore(libxl_ct
> >  void libxl_domain_config_init(libxl_domain_config *d_config);
> >  void libxl_domain_config_dispose(libxl_domain_config *d_config);
> > 
> > -int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
> > +int libxl_domain_suspend_0x040200(libxl_ctx *ctx, uint32_t domid, int fd,
> >                           int flags, /* LIBXL_SUSPEND_* */
> >                           const libxl_asyncop_how *ao_how)
> >                           LIBXL_EXTERNAL_CALLERS_ONLY;
> > +#ifdef LIBXL_API_VERSION
> > +#if LIBXL_API_VERSION == 0x040200
> > +#define libxl_domain_suspend libxl_domain_suspend_0x040200
> 
> int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
>                          int flags, /* LIBXL_SUSPEND_* */
>                          int max_iters, int max_factor,
>                          const libxl_asyncop_how *ao_how)
>                          LIBXL_EXTERNAL_CALLERS_ONLY;
> #ifdef LIBXL_API_VERSION
> #if LIBXL_API_VERSION == 0x040200
> #define libxl_domain_suspend(ctx, domid, fd, flags, ao_how) \
>             libxl_domain_suspend(ctx, domid, fd, flags, 0, 0, ao_how) 
> #endif /* LIBXL_API_VERSION == 0x040200 */
> #endif /* defined(LIBXL_API_VERSION) */
> 
> Should work?

That may work, have to try it. In the end if we make such a change it
would serve as example for other upcoming API changes.

> >  #define LIBXL_SUSPEND_DEBUG 1
> >  #define LIBXL_SUSPEND_LIVE 2
> > +#define LIBXL_SUSPEND_NO_FINAL_SUSPEND 4
> 
> (These should really be in the IDL, but this is a pre-existing
> condition)
> 
> The name of this new option doesn't quite describe what it does, since
> it doesn't disable the final suspend always, only under certain
> condition.
> 
> LIBXL_SUSPEND_ABORT_ON_<xxx>
> 
> Where the <xxx> is tricky ;-)
> 
> <xxx> == DIRTYING_TOO_FAST
> <xxx> == GUEST_TOO_BUSY
> <xxx> == YOUR_SUGGESTIONS_HERE

Thats alot more descriptive, thanks. DIRTYING_TOO_FAST describes it well
I think.


> >  /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
> >   *   If this parameter is true, use co-operative resume. The guest
> > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/libxl_dom.c
> > --- a/tools/libxl/libxl_dom.c
> > +++ b/tools/libxl/libxl_dom.c
> > @@ -1240,6 +1240,7 @@ void libxl__domain_suspend(libxl__egc *e
> > 
> >      dss->xcflags = (live ? XCFLAGS_LIVE : 0)
> >            | (debug ? XCFLAGS_DEBUG : 0)
> > +          | (dss->xlflags & LIBXL_SUSPEND_NO_FINAL_SUSPEND ? 
> > XCFLAGS_DOMSAVE_NOSUSPEND : 0)
> >            | (dss->hvm ? XCFLAGS_HVM : 0);
> > 
> >      dss->suspend_eventchn = -1;
> 
> > diff -r fd711ebdce9a -r 785c8f34e1f8 tools/libxl/xl_cmdimpl.c
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -3172,7 +3172,7 @@ static int save_domain(uint32_t domid, c
> > 
> >      save_domain_core_writeconfig(fd, filename, config_data, config_len);
> > 
> > -    int rc = libxl_domain_suspend(ctx, domid, fd, 0, NULL);
> > +    int rc = libxl_domain_suspend(ctx, domid, fd, 0, 0, 0, NULL);
> 
> Doesn't this need to pass down the selected values?

save_domain is different from migrate_domain, I think xl save will
suspend the guest anyway? But I notice the -c option for "save", so
perhaps it would be useful to catch busy guests as well here?


Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.