
Re: [Xen-devel] [PATCH RFC 08/20] libxl/migration: add precopy tuning parameters



On Thu, Mar 30, 2017 at 02:03:29AM -0400, Joshua Otto wrote:
> On Wed, Mar 29, 2017 at 10:08:02PM +0100, Andrew Cooper wrote:
> > On 27/03/17 10:06, Joshua Otto wrote:
> > > In the context of the live migration algorithm, the precopy iteration
> > > count refers to the number of page-copying iterations performed prior to
> > > the suspension of the guest and transmission of the final set of dirty
> > > pages.  Similarly, the precopy dirty threshold refers to the dirty page
> > > count below which we judge it more profitable to proceed to
> > > stop-and-copy rather than continue with the precopy.  These would be
> > > helpful tuning parameters to work with when migrating particularly busy
> > > guests, as they enable an administrator to reap the available benefits
> > > of the precopy algorithm (the transmission of guest pages _not_ in the
> > > writable working set can be completed without guest downtime) while
> > > reducing the total amount of time required for the migration (as
> > > iterations of the precopy loop that will certainly be redundant can be
> > > skipped in favour of an earlier suspension).
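
For illustration, the decision these two parameters control boils down to
something like the sketch below.  The names here are invented and this is not
the actual libxc policy hook, just the shape of the check made after each
precopy pass:

struct precopy_stats {
    unsigned int iteration;     /* precopy passes completed so far    */
    unsigned long dirty_count;  /* pages dirtied during the last pass */
};

/* Return nonzero if another precopy pass is judged worthwhile. */
static int should_continue_precopy(const struct precopy_stats *stats,
                                   unsigned int precopy_iterations,
                                   unsigned long precopy_dirty_threshold)
{
    /* Stop once the remaining dirty set is small enough that the final
     * stop-and-copy will be cheap... */
    if (stats->dirty_count < precopy_dirty_threshold)
        return 0;

    /* ... or once the configured number of passes has been used up. */
    if (stats->iteration >= precopy_iterations)
        return 0;

    return 1;
}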
> > >
> > > To expose these tuning parameters to users:
> > > - introduce a new libxl API function, libxl_domain_live_migrate(),
> > >   taking the same parameters as libxl_domain_suspend() _and_
> > >   precopy_iterations and precopy_dirty_threshold parameters, and
> > >   consider these parameters in the precopy policy
> > >
> > >   (though a pair of new parameters on their own might not warrant an
> > >   entirely new API function, it is added in anticipation of a number of
> > >   additional migration-only parameters that would be cumbersome on the
> > >   whole to tack on to the existing suspend API)
> > >
> > > - switch xl migrate to the new libxl_domain_live_migrate() and add new
> > >   --precopy-iterations and --precopy-threshold parameters to pass
> > >   through
> > >
> > > Signed-off-by: Joshua Otto <jtotto@xxxxxxxxxxxx>
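
For concreteness, the shape of the proposed call might be roughly as below.
This prototype is a guess reconstructed from the description above rather than
the patch itself, so treat the argument order and types as illustrative only:

/* Hypothetical sketch - not the actual prototype from the patch. */
int libxl_domain_live_migrate(libxl_ctx *ctx, uint32_t domid,
                              int fd, int flags,
                              unsigned int precopy_iterations,
                              unsigned int precopy_dirty_threshold,
                              const libxl_asyncop_how *ao_how);

with an xl invocation along the lines of:

    xl migrate --precopy-iterations 5 --precopy-threshold 50 guest desthost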
> > 
> > This will have to defer to the tools maintainers, but I purposefully
> > didn't expose these knobs to users when rewriting live migration,
> > because they cannot be meaningfully chosen by anyone outside of a
> > testing scenario.  (That is not to say they aren't useful for testing
> > purposes, but I didn't upstream my version of this patch.)
> 
> Ahhh, I wondered why those parameters to xc_domain_save() were present
> but ignored.  That's reasonable.
> 
> I guess the way I had imagined an administrator using them would be in a
> non-production/test environment - if they could run workloads
> representative of their production application in this environment, they
> could experiment with different --precopy-iterations and
> --precopy-threshold values (having just a high-level understanding of
> what they control) and choose the ones that result in the best outcome
> for later use in production.
> 

Running in a test environment isn't always an option -- think about
public cloud providers who don't have control over the VMs or the
workload.

> > I spent quite a while wondering how best to expose these tunables in a
> > way that end users could sensibly use them, and the best I came up with
> > was this:
> > 
> > First, run the guest under logdirty for a period of time to establish
> > the working set, and how steady it is.  From this, you have a baseline
> > for the target threshold, and a plausible way of estimating the
> > downtime.  (Better yet, as XenCenter, XenServer's Windows GUI, has proved
> > time and time again, users love graphs!  Even if they don't necessarily
> > understand them.)
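
As a rough illustration of that measurement step (the sampling itself would
come from the log-dirty machinery the migration code already uses; the numbers
below are made up):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Given dirty-page counts sampled at fixed intervals while the guest ran
 * under log-dirty, report the approximate working set and the downtime a
 * stop-and-copy of that set would need at the given link bandwidth. */
static void estimate_downtime(const unsigned long *dirty_samples, size_t n,
                              double bandwidth_bytes_per_sec)
{
    unsigned long peak = 0;

    for (size_t i = 0; i < n; i++)
        if (dirty_samples[i] > peak)
            peak = dirty_samples[i];

    printf("working set <= %lu pages, estimated downtime ~%.2fs\n",
           peak, (double)peak * PAGE_SIZE / bandwidth_bytes_per_sec);
}

int main(void)
{
    /* e.g. five one-second samples, sent over a ~1Gbit/s (125MB/s) link */
    unsigned long samples[] = { 12000, 13500, 12800, 13100, 12900 };

    estimate_downtime(samples, sizeof(samples) / sizeof(samples[0]), 125e6);
    return 0;
}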
> > 
> > From this baseline, the condition you need to care about is the rate of
> > convergence.  On a steady VM, you should converge asymptotically to the
> > measured threshold, although with 5 or fewer iterations the asymptotic
> > properties don't appear cleanly.  (Of course, the larger the VM, the more
> > iterations, and the more likely you are to spot this.)
> > 
> > Users will either care about the migration completing successfully, or
> > avoiding interrupting the workload.  The majority case would be both,
> > but every user will have one of these two options which is more
> > important than the other.  As a result, there need to be some options to
> > cover "if $X happens, do I continue or abort".
> > 
> > The case where the VM becomes busier is harder, however.  For the
> > users which care about not interrupting the workload, there will be a
> > point above which they'd prefer to abort the migration rather than
> > continue it.  For the users which want the migration to complete, they'd
> > prefer to pause the VM and take a downtime hit, rather than aborting.
> > 
> > Therefore, you really need two thresholds: the one above which you
> > always abort, and the one at which you would normally choose to pause.  The
> > decision as to what to do depends on where you are between these
> > thresholds when the dirty state converges.  (Of course, if the VM
> > suddenly becomes more idle, it is sensible to continue beyond the lower
> > threshold, as it will reduce the downtime.)  The absolute number of
> > iterations, on the other hand, doesn't actually matter from a user's point
> > of view, so isn't a useful control to have.
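
In other words, once the dirty count has converged the choice would look
something like this sketch (thresholds and names invented here):

enum mig_action { MIG_ABORT, MIG_PAUSE_AND_COPY };

/* converged_dirty: the dirty-page count the precopy has settled at.
 * prefer_completion: nonzero if the user would rather take the downtime
 * hit than abandon the migration. */
static enum mig_action decide(unsigned long converged_dirty,
                              unsigned long pause_threshold,
                              unsigned long abort_threshold,
                              int prefer_completion)
{
    if (converged_dirty > abort_threshold)
        return MIG_ABORT;          /* too costly to suspend for anyone */

    if (converged_dirty > pause_threshold)
        /* Between the thresholds: the user's stated priority decides. */
        return prefer_completion ? MIG_PAUSE_AND_COPY : MIG_ABORT;

    return MIG_PAUSE_AND_COPY;     /* at or below the lower threshold */
}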
> > 
> > Another thing to be careful with is the measure of convergence with
> > respect to guest busyness, and other factors influencing the absolute
> > iteration time, such as congestion of the network between the two
> > hosts.  I haven't yet come up with a sensible way of reconciling this
> > with the above, in a way which can be expressed as a useful set of controls.
> > 

My thought as well.

> > 
> > The plan, following migration v2, was always to come back to this and
> > see about doing something better than the current hard coded parameters,
> > but I am still working on fixing migration in other areas (not having
> > VMs crash when moving, because they observe important differences in the
> > hardware).
> 
> I think a good strategy would be to solicit three parameters from the
> user:
> - the precopy duration they're willing to tolerate
> - the downtime duration they're willing to tolerate
> - the bandwidth of the link between the hosts (we could try and estimate
>   it for them but I'd rather just make them run iperf)
> 
> Then, after applying this patch, alter the policy so that precopy simply
> runs for the duration that the user is willing to wait.  After that,
> using the bandwidth estimate, compute the approximate downtime required
> > to transfer the final set of dirty pages.  If this is less than what the
> user indicated is acceptable, proceed with the stop-and-copy - otherwise
> abort.
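
If I follow, expressed as code that policy would be something like the sketch
below (names invented; the bandwidth figure is the user-supplied estimate):

#define PAGE_SIZE 4096UL

enum next_step { PRECOPY_MORE, SUSPEND_AND_COPY, ABORT_MIGRATION };

static enum next_step duration_policy(double elapsed_precopy_sec,
                                      double max_precopy_sec,
                                      unsigned long remaining_dirty_pages,
                                      double bandwidth_bytes_per_sec,
                                      double max_downtime_sec)
{
    /* Keep iterating until the user's precopy budget is spent. */
    if (elapsed_precopy_sec < max_precopy_sec)
        return PRECOPY_MORE;

    /* Estimate the blackout needed to send the final dirty set. */
    double est_downtime_sec =
        (double)remaining_dirty_pages * PAGE_SIZE / bandwidth_bytes_per_sec;

    return est_downtime_sec <= max_downtime_sec ? SUSPEND_AND_COPY
                                                : ABORT_MIGRATION;
}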
> 
> This still requires the user to figure out for themselves how long their
> workload can really wait, but hopefully they already had some idea
> before deciding to attempt live migration in the first place.
> 

I am not entirely sure what to make of this. I'm not convinced using
durations would cover all cases, but I can't come up with a counterexample
that doesn't sound contrived.

Given this series is already complex enough, I think we should set this
aside for another day.

How hard would it be to _not_ include all the knobs in this series?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

