
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)


  • To: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Fri, 1 Apr 2011 14:49:10 +0100
  • Cc: Shriram Rajagopalan <rshriram@xxxxxxxxx>, Brendan Cully <brendan@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 01 Apr 2011 06:49:58 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Fri, Apr 1, 2011 at 12:58 PM, Ian Campbell
<Ian.Campbell@xxxxxxxxxxxxx> wrote:
> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@xxxxxxxxxx> 
>> wrote:
>> > # HG changeset patch
>> > # User Ian Campbell <ian.campbell@xxxxxxxxxx>
>> > # Date 1283766891 -3600
>> > # Node ID bdf8ce09160d715451e1204babe5f80886ea6183
>> > # Parent  5f96f36feebdb87eaadbbcab0399f32eda86f735
>> > libxc: provide notification of final checkpoint to restore end
>> >
>> > When the restore code sees this notification it will restore the
>> > currently in-progress checkpoint when it completes.
>> >
>> > This allows the restore end to finish up without waiting for a
>> > spurious timeout on the receive fd and thereby avoids unnecessary
>> > error logging in the case of a successful migration or restore.
>> >
>> > In the normal migration or restore case the first checkpoint is always
>> > the last. For a rolling checkpoint (such as Remus) the notification is
>> > currently unused but could be used in the future for example to
>> > provide a controlled failover for reasons other than error
>>
>> Unfortunately, this introduces a backwards-compatibility problem,
>> since older versions of the tools never send LAST_CHECKPOINT.
>
> Out of interest what actually happens? (I think you mentioned something
> about failing when restoring from a file?)

Steps to reproduce:
* Install the last release of XenServer (based on Xen 3.4)
* Install a VM; suspend it.
* Upgrade to a development version of XenServer (based on Xen 4.1)
* Attempt to resume the VM; the resume fails.

libxc output is below:

Apr  1 14:27:29 localhost xenguest: xc: detail: xc_domain_restore start: p2m_size = fefff
Apr  1 14:27:33 localhost xenguest: xc: error: Max batch size exceeded (1970103633). Giving up.: Internal error
Apr  1 14:27:33 localhost xenguest: xc: error: error when buffering batch, finishing (90 = Message too long): Internal error
Apr  1 14:27:33 localhost xenguest: xc: detail: Restore exit with rc=0

I had added debugging to print out the "batch" values read, and
confirmed that this happens *after* the page transfer and everything
else is done, and after the "0" entry indicating the end of a
checkpoint.  So it's reading through one full checkpoint, then reading
some garbage value and failing.  Initializing "ctx->last_checkpoint"
to 1 makes the problem go away.
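
For reference, a minimal sketch of that debugging experiment (not a
proposed fix; the context-initialisation site in xc_domain_restore.c
is approximated from memory):

    struct restore_ctx _ctx = { 0 }, *ctx = &_ctx;

    /* Sketch: treat a stream that never sends XC_SAVE_ID_LAST_CHECKPOINT
     * as a plain single-checkpoint save/migrate stream.  This papers over
     * the legacy-stream failure, but hardcoding it would break rolling
     * checkpoints such as Remus, hence the inversion proposal below. */
    ctx->last_checkpoint = 1;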

Ah -- hmm.  You know what I bet: xc_domain_save() doesn't write the
qemu checkpoint, but xc_domain_restore() reads it (again because of
Remus).  Libxl handles this properly, but making xapi write the stream
in such a way that xc_domain_restore() could read it turned out to be
non-trivial; making xc_domain_restore() not read the qemu file was the
simplest solution.

But of course, that means that there's still data in the file after
you've finished a "checkpoint", so the next read succeeds in reading
in the qemu save file.  Not surprising that it doesn't look right. :-)
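
That hypothesis fits the log above: 1970103633 is 0x756d6551, which is
the bytes 'Q' 'e' 'm' 'u' read as a little-endian int -- plausibly the
start of a qemu save record signature (e.g. "QemuDeviceModelRecord")
being misinterpreted as a batch count.  A self-contained sketch to
check the arithmetic:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int count = 1970103633;   /* the "garbage" batch count, 0x756d6551 */
        char bytes[5] = { 0 };

        /* On a little-endian host the four bytes of the bogus count
         * spell out the start of an ASCII signature. */
        memcpy(bytes, &count, sizeof(count));
        printf("%s\n", bytes);    /* prints "Qemu" */
        return 0;
    }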

> My intention at the time was that in the absence of a LAST_CHECKPOINT
> things would continue to behave as before, e.g. you would see the
> spurious timeout on the receive fd but still restore correctly.
>
> Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
> various stages are what broke this for the non-Remus migrations?
>
>> Would it make sense to invert the logic on this -- i.e., default to
>> only one checkpoint, and send "MORE_CHECKPOINTS" if the sender expects
>> this *not* to be the last (either by expecting the checkpoint()
>> function to do it, or by sending it from xc_domain_save)?
>
> I'd quite like to understand what I broke and investigate the
> triviality (or otherwise) of a fix, but this change would probably
> independently make sense too.
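
For concreteness, a minimal sketch of that inverted protocol -- the
chunk name XC_SAVE_ID_MORE_CHECKPOINTS and its value are hypothetical,
not an agreed interface, and the fragments below are illustrative
excerpts rather than a complete patch:

    /* xg_save_restore.h: hypothetical new chunk ID (would need to be
     * the next free negative value). */
    #define XC_SAVE_ID_MORE_CHECKPOINTS  -10

    /* xc_domain_save: announce further checkpoints only when a
     * checkpoint() callback is registered (i.e. Remus-style rolling
     * checkpoints).  A plain save/migrate sends nothing extra, so old
     * receivers keep working. */
    if ( callbacks->checkpoint )
    {
        i = XC_SAVE_ID_MORE_CHECKPOINTS;
        if ( wrexact(io_fd, &i, sizeof(int)) )
        {
            PERROR("Error when writing more-checkpoints chunk");
            goto out;
        }
    }

    /* pagebuf_get_one: with ctx->last_checkpoint now defaulting to 1,
     * clear it only when the sender says more checkpoints follow. */
    case XC_SAVE_ID_MORE_CHECKPOINTS:
        ctx->last_checkpoint = 0;
        return pagebuf_get_one(xch, ctx, buf, fd, dom);
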
>
>> Now that 4.1 is out, we have to consider version compatibilities.  But
>> we should be able to make it so that there would only be a problem if:
>> 1) You're running Remus
>
> Is it the case that Remus only cares about checkpointing between like
> versions of the toolstack?
>
>> 2) One of your toolstacks is 4.1.0, and one is not.
>>
>> That seems like it shouldn't be too bad.
>>
>> Thoughts?
>>
>>  -George
>>
>> >
>> > Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>> > Acked-by: Brendan Cully <brendan@xxxxxxxxx>
>> >
>> > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
>> > --- a/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>> > +++ b/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>> > @@ -42,6 +42,7 @@ struct restore_ctx {
>> >     xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
>> >     xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region.  */
>> >     int completed; /* Set when a consistent image is available */
>> > +    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>> >     struct domain_info_context dinfo;
>> >  };
>> >
>> > @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
>> >         // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
>> >         return pagebuf_get_one(xch, ctx, buf, fd, dom);
>> >
>> > +    case XC_SAVE_ID_LAST_CHECKPOINT:
>> > +        ctx->last_checkpoint = 1;
>> > +        // DPRINTF("last checkpoint indication received");
>> > +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
>> > +
>> >     default:
>> >         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>> >             ERROR("Max batch size exceeded (%d). Giving up.", count);
>> > @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
>> >             goto out;
>> >         }
>> >         ctx->completed = 1;
>> > -        /* shift into nonblocking mode for the remainder */
>> > -        if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>> > -            flags = 0;
>> > -        fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>> > +
>> > +        /*
>> > +         * If more checkpoints are expected then shift into
>> > +         * nonblocking mode for the remainder.
>> > +         */
>> > +        if ( !ctx->last_checkpoint )
>> > +        {
>> > +            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>> > +                flags = 0;
>> > +            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>> > +        }
>> > +    }
>> > +
>> > +    if ( ctx->last_checkpoint )
>> > +    {
>> > +        // DPRINTF("Last checkpoint, finishing\n");
>> > +        goto finish;
>> >     }
>> >
>> >     // DPRINTF("Buffered checkpoint\n");
>> > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
>> > --- a/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>> > +++ b/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>> > @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
>> >         }
>> >     }
>> >
>> > +    if ( !callbacks->checkpoint )
>> > +    {
>> > +        /*
>> > +         * If this is not a checkpointed save then this must be the first and
>> > +         * last checkpoint.
>> > +         */
>> > +        i = XC_SAVE_ID_LAST_CHECKPOINT;
>> > +        if ( wrexact(io_fd, &i, sizeof(int)) )
>> > +        {
>> > +            PERROR("Error when writing last checkpoint chunk");
>> > +            goto out;
>> > +        }
>> > +    }
>> > +
>> >     /* Zero terminate */
>> >     i = 0;
>> >     if ( wrexact(io_fd, &i, sizeof(int)) )
>> > diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
>> > --- a/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>> > +++ b/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>> > @@ -131,6 +131,7 @@
>> >  #define XC_SAVE_ID_TMEM_EXTRA         -6
>> >  #define XC_SAVE_ID_TSC_INFO           -7
>> >  #define XC_SAVE_ID_HVM_CONSOLE_PFN    -8 /* (HVM-only) */
>> > +#define XC_SAVE_ID_LAST_CHECKPOINT    -9 /* Commit to restoring after completion of current iteration. */
>> >
>> >  /*
>> >  ** We process save/restore/migrate in batches of pages; the below
>> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel