
Re: LAST_CHECKPOINT and save/restore/migrate compatibility (was Re: [Xen-devel] [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end)



On 2011-04-01, at 6:58 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx> wrote:

> On Fri, 2011-04-01 at 12:02 +0100, George Dunlap wrote:
>> On Mon, Sep 6, 2010 at 11:03 AM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
>>> # HG changeset patch
>>> # User Ian Campbell <ian.campbell@xxxxxxxxxx>
>>> # Date 1283766891 -3600
>>> # Node ID bdf8ce09160d715451e1204babe5f80886ea6183
>>> # Parent  5f96f36feebdb87eaadbbcab0399f32eda86f735
>>> libxc: provide notification of final checkpoint to restore end
>>> 
>>> When the restore code sees this notification it will restore the
>>> currently in-progress checkpoint when it completes.
>>> 
>>> This allows the restore end to finish up without waiting for a
>>> spurious timeout on the receive fd and thereby avoids unnecessary
>>> error logging in the case of a successful migration or restore.
>>> 
>>> In the normal migration or restore case the first checkpoint is always
>>> the last. For a rolling checkpoint (such as Remus) the notification is
>>> currently unused, but could be used in the future, for example, to
>>> provide a controlled failover for reasons other than error.
>> 
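
For context: in the libxc save stream each chunk begins with a signed int,
where a positive value is the number of page frames in the following batch,
negative values are out-of-band markers such as XC_SAVE_ID_LAST_CHECKPOINT,
and zero terminates the stream. A minimal sketch of how a restore-side
reader consumes the marker, using hypothetical helper names rather than the
actual pagebuf_get_one() code:

    /* Sketch only: simplified restore-side chunk loop.
     * read_chunk_header() and process_page_batch() are hypothetical. */
    int marker;
    for (;;)
    {
        if ( read_chunk_header(io_fd, &marker) < 0 ) /* hypothetical */
            return -1;                     /* read error or timeout */
        if ( marker == 0 )
            break;                         /* zero terminator: stream done */
        if ( marker == XC_SAVE_ID_LAST_CHECKPOINT )
        {
            ctx->last_checkpoint = 1;      /* finish after this checkpoint */
            continue;
        }
        if ( marker < 0 )
            continue;                      /* other metadata markers */
        process_page_batch(io_fd, marker); /* 'marker' pages follow */
    }
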
>> Unfortunately, this introduces a backwards-compatibility problem,
>> since older versions of the tools never send LAST_CHECKPOINT.
> 
> Out of interest, what actually happens? (I think you mentioned something
> about failing when restoring from a file?)
> 
> My intention at the time was that in the absence of a LAST_CHECKPOINT
> things would continue to behave as before e.g. you would see the
> spurious timeout on the receive fd but still restore correctly.
> 
> Perhaps the associated changes to the O_NONBLOCK'ness of the fd at
> various stages are what broke this for the non-Remus migrations?
> 
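The fallback described above might look roughly like this on the restore
side: once at least one complete checkpoint has been received, treat a
failed or timed-out read as a clean end of stream rather than an error. A
sketch only, with a hypothetical helper name, not the actual
xc_domain_restore() code:

    /* Sketch: tolerate old senders that stop without ever sending
     * XC_SAVE_ID_LAST_CHECKPOINT. */
    rc = read_next_checkpoint(io_fd, ctx);   /* hypothetical */
    if ( rc < 0 )
    {
        if ( ctx->completed )
            goto finish;   /* already hold a consistent image: the old
                            * sender simply stopped; restore with it */
        ERROR("Error reading first checkpoint"); /* genuine failure */
        goto out;
    }
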
>> Would it make sense to invert the logic on this -- i.e., default to
>> only one checkpoint, and send "MORE_CHECKPOINTS" if it expects this
>> *not* to be the last (either by expecting the checkpoint() function to
>> do it, or by sending it from xc_domain_save)?
> 
> I'd quite like to understand what I broke and investigate the
> triviality (or otherwise) of a fix, but this change would probably
> independently make sense too.
> 
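For concreteness, that inversion might look like this on the save side,
where XC_SAVE_ID_MORE_CHECKPOINTS is a hypothetical chunk ID not in the
tree; the restore side would then initialise last_checkpoint to 1 and
clear it on seeing the marker:

    /* Hypothetical inversion: announce that more checkpoints will
     * follow only when a checkpoint callback is registered (Remus). */
    if ( callbacks->checkpoint )
    {
        i = XC_SAVE_ID_MORE_CHECKPOINTS;   /* hypothetical */
        if ( wrexact(io_fd, &i, sizeof(int)) )
        {
            PERROR("Error when writing more-checkpoints chunk");
            goto out;
        }
    }

An old sender that never emits the marker would then default to the
single-checkpoint path, which is the backwards-compatible behaviour.
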
>> Now that 4.1 is out, we have to consider version compatibilities. But
>> we should be able to make it so that there would only be a problem if:
>> 1) You're running REMUS
> 
> Is it the case that Remus only cares about checkpointing between like
> versions of the toolstack?
> 
Can you please elaborate on this statement?

Shriram
>> 2) One of your toolstacks is 4.1.0, and one is not.
>> 
>> That seems like it shouldn't be too bad.
>> 
>> Thoughts?
>> 
>> -George
>> 
>>> 
>>> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>>> Acked-by: Brendan Cully <brendan@xxxxxxxxx>
>>> 
>>> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
>>> --- a/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>>> +++ b/tools/libxc/xc_domain_restore.c   Mon Sep 06 10:54:51 2010 +0100
>>> @@ -42,6 +42,7 @@ struct restore_ctx {
>>>   xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
>>>   xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region. */
>>>   int completed; /* Set when a consistent image is available */
>>> +    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>>>   struct domain_info_context dinfo;
>>> };
>>> 
>>> @@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
>>>       // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
>>>       return pagebuf_get_one(xch, ctx, buf, fd, dom);
>>> 
>>> +    case XC_SAVE_ID_LAST_CHECKPOINT:
>>> +        ctx->last_checkpoint = 1;
>>> +        // DPRINTF("last checkpoint indication received");
>>> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
>>> +
>>>   default:
>>>       if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>>>           ERROR("Max batch size exceeded (%d). Giving up.", count);
>>> @@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
>>>           goto out;
>>>       }
>>>       ctx->completed = 1;
>>> -        /* shift into nonblocking mode for the remainder */
>>> -        if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>>> -            flags = 0;
>>> -        fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>>> +
>>> +        /*
>>> +         * If more checkpoints are expected then shift into
>>> +         * nonblocking mode for the remainder.
>>> +         */
>>> +        if ( !ctx->last_checkpoint )
>>> +        {
>>> +            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
>>> +                flags = 0;
>>> +            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
>>> +        }
>>> +    }
>>> +
>>> +    if ( ctx->last_checkpoint )
>>> +    {
>>> +        // DPRINTF("Last checkpoint, finishing\n");
>>> +        goto finish;
>>>   }
>>> 
>>>   // DPRINTF("Buffered checkpoint\n");
>>> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
>>> --- a/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>>> +++ b/tools/libxc/xc_domain_save.c      Mon Sep 06 10:54:51 2010 +0100
>>> @@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
>>>       }
>>>   }
>>> 
>>> +    if ( !callbacks->checkpoint )
>>> +    {
>>> +        /*
>>> +         * If this is not a checkpointed save then this must be the first
>>> +         * and last checkpoint.
>>> +         */
>>> +        i = XC_SAVE_ID_LAST_CHECKPOINT;
>>> +        if ( wrexact(io_fd, &i, sizeof(int)) )
>>> +        {
>>> +            PERROR("Error when writing last checkpoint chunk");
>>> +            goto out;
>>> +        }
>>> +    }
>>> +
>>>   /* Zero terminate */
>>>   i = 0;
>>>   if ( wrexact(io_fd, &i, sizeof(int)) )
>>> diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
>>> --- a/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>>> +++ b/tools/libxc/xg_save_restore.h     Mon Sep 06 10:54:51 2010 +0100
>>> @@ -131,6 +131,7 @@
>>> #define XC_SAVE_ID_TMEM_EXTRA         -6
>>> #define XC_SAVE_ID_TSC_INFO           -7
>>> #define XC_SAVE_ID_HVM_CONSOLE_PFN    -8 /* (HVM-only) */
>>> +#define XC_SAVE_ID_LAST_CHECKPOINT    -9 /* Commit to restoring after completion of current iteration. */
>>> 
>>> /*
>>> ** We process save/restore/migrate in batches of pages; the below
>>> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

