
Re: [Xen-devel] [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save



On 16/07/14 16:22, Shriram Rajagopalan wrote:



On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@xxxxxxxxxxxxxx> wrote:
implement remus checkpoint in v2 save

Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
---
 tools/libxc/saverestore/common.h |  1 +
 tools/libxc/saverestore/save.c   | 88 ++++++++++++++++++++++++----------------
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 24ba95b..1dd9f51 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -153,6 +153,7 @@ struct xc_sr_context

     xc_dominfo_t dominfo;
     bool checkpointed;
+    bool firsttime;

     union
     {
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
index d2fa8a6..98a5c2f 100644
--- a/tools/libxc/saverestore/save.c
+++ b/tools/libxc/saverestore/save.c
@@ -375,6 +375,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
         goto out;
     }

+    if ( ctx->checkpointed && !ctx->firsttime )
+        goto lastiter;

nit: last_iter

I suggest adding a comment before this code block to explain what has happened so
far above, and why we are jumping to the last_iter block, skipping the rest of the stuff below.

I also suggest maintaining some stats structure somewhere (number of dirty pages,
time when the checkpoint was initiated, etc.). They could be simply debug prints that
can be enabled on demand.

A better alternative would be to somehow pass this stats structure back to libxl,
such that the user can enable/disable stats printing using the xl remus command
for example.

This is partly my fault. I so far haven't got any of the "progress report" set of logging functionality wired up. Once that is done, libxl should be fine, as it can already filter the progress logging from the regular logging.

     /* This juggling is required if logdirty is already on, e.g. VRAM tracking */
     if ( xc_shadow_control(xch, ctx->domid,
              XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
@@ -436,6 +438,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
             break;
     }

+lastiter:
     rc = suspend_domain(ctx);
     if ( rc )
         goto out;

I hope the postsuspend callbacks are called somewhere?
@@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
     if ( rc )
         goto err;

-    rc = ctx->save.ops.start_of_stream(ctx);
-    if ( rc )
-        goto err;
+    do {
+        rc = ctx->save.ops.start_of_stream(ctx);
+        if ( rc )
+            goto err;

-    if ( ctx->save.live )
-    {
-        DPRINTF("Starting live migrate");
-        rc = send_domain_memory_live(ctx);
-    }
-    else
-    {
-        DPRINTF("Starting nonlive save");
-        rc = send_domain_memory_nonlive(ctx);
-    }
+        if ( ctx->save.live )
+        {
+            DPRINTF("Starting live migrate");

I suggest using the ctx->firsttime bool var before this printf, so that
when debug printing is enabled under Remus, the console is not
flooded with the same statement, which makes no sense past the
first iteration.

A thought has just crossed my mind. It might be better to have a "send_domain_remus()" function which internally deals with the iterations and looping etc.

e.g.

if ( remus )
    rc = send_domain_remus(ctx);
else if ( ctx->save.live )
    rc = send_domain_live(ctx);
else
    rc = send_domain_memory_nonlive(ctx);

In my upcoming series, which fixes the deferred vcpu state problem, I have split some of the common bits out of send_domain_memory_{live,nonlive}(), which might be useful.

+            rc = send_domain_memory_live(ctx);
+        }
+        else
+        {
+            DPRINTF("Starting nonlive save");
+            rc = send_domain_memory_nonlive(ctx);
+        }

-    if ( rc )
-        goto err;
+        if ( rc )
+            goto err;

-    /* Refresh domain information now it has paused. */
-    if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
-         (ctx->dominfo.domid != ctx->domid) )
-    {
-        PERROR("Unable to refresh domain information");
-        rc = -1;
-        goto err;
-    }
-    else if ( (!ctx->dominfo.shutdown ||
-               ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
-              !ctx->dominfo.paused )
-    {
-        ERROR("Domain has not been suspended");
-        rc = -1;
-        goto err;
-    }
+        /* Refresh domain information now it has paused. */
+        if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
+             (ctx->dominfo.domid != ctx->domid) )
+        {
+            PERROR("Unable to refresh domain information");
+            rc = -1;
+            goto err;
+        }
+        else if ( (!ctx->dominfo.shutdown ||
+                   ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
+                  !ctx->dominfo.paused )
+        {
+            ERROR("Domain has not been suspended");
+            rc = -1;
+            goto err;
+        }


A silly question: shouldn't this check for 'suspended status' be before the call to
send_domain_memory_live() [under Remus]? I am assuming that the
send_domain_memory_live() function is the one that sends the dirty page data
out on the wire.

Even for non-live migrates, we have to ensure that the VM has entered the suspend state. A PV guest cannot possibly recover from resume if it didn't voluntarily suspend, as it will lose its reference to the start info page (whose mfn lives in vcpu0.edx for the duration of the migrate and must be updated on the receiving side). Any VM which doesn't sufficiently quiesce its frontends risks ring and memory corruption on resume.

In this case, the send_domain_memory_XXX functions do take care of pausing the domain at the appropriate time, but this here is a sanity check.

~Andrew


-    rc = ctx->save.ops.end_of_stream(ctx);
-    if ( rc )
-        goto err;
+        rc = ctx->save.ops.end_of_stream(ctx);
+        if ( rc )
+            goto err;
+
+        if ( ctx->checkpointed ) {
+            if ( ctx->firsttime )
+                ctx->firsttime = false;
+
+            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+
+            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+            if ( rc > 0 ) {
+                IPRINTF("Next checkpoint\n");

I suggest maintaining checkpoint numbers instead. Much more helpful. Probably even add
a gettimeofday call to print out the time the new checkpoint started. Once again, it's useful
to be able to control this printout from libxl.

+            } else {
+                ctx->checkpointed = false;
+            }
+        }
+    } while ( ctx->checkpointed );

     rc = write_end_record(ctx);
     if ( rc )
@@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
     ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
+    ctx.firsttime = true;

     if ( ctx.checkpointed ) {
         /* This is a checkpointed save, we need these callbacks */
--
1.9.1



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
