
Re: [Xen-devel] [PATCH v2] fix Remus failover regression



On 28/07/14 05:03, Yang Hongyang wrote:
> Commit c2ba706c ("tools/libxc: goto correct label on error paths",
> by Andrew Cooper) broke Remus in Xen 4.4 and in earlier versions
> that have this commit backported.

My apologies for breaking Remus (which just goes to show how fragile
this code is).

>
> With Remus, this jump discards the incomplete checkpoint currently
> being received by the backup and restores the backup from the last
> complete checkpoint. This behaviour is required for Remus to work,
> and it does not break live migration. It has been in place since
> Xen 4.0.

However, it is a genuine bugfix for regular migration, so simply
reverting it as this patch does is not appropriate.

For regular migration, you absolutely have to "goto out;" on failure.
Otherwise the finish code will run and declare the migration a success
despite only half of the domain having been restored.

You need something like:

if ( !checkpointed_stream )
    goto out;

/*
 * Remus: discard the current incomplete checkpoint and restore
 * backup from the last complete checkpoint.
 */
goto finish;

to deal with the different error-handling requirements of Remus and
regular streams.
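
A minimal, compilable sketch of that control flow, under stated
assumptions: receive_batch() and restore_sketch() are hypothetical
stand-ins, not libxc functions, and the "return 0" / "return -1" bodies
of the finish and out labels only model "declare the restore a success"
versus "report failure". checkpointed_stream plays the role of the flag
of the same name; fail simply injects a stream error.

```c
#include <stdio.h>

/* Hypothetical stand-in for pagebuf_get()/buffer_tail(): -1 on error. */
static int receive_batch(int fail)
{
    return fail ? -1 : 0;
}

/*
 * Models the error path only. Returns 0 when the finish path runs and
 * -1 when the out path runs. *used_checkpoint reports whether the
 * Remus fallback (resume from the last complete checkpoint) was taken.
 */
static int restore_sketch(int checkpointed_stream, int fail,
                          int *used_checkpoint)
{
    *used_checkpoint = 0;

    if ( receive_batch(fail) ) {
        fprintf(stderr, "error when buffering batch, finishing\n");
        if ( !checkpointed_stream )
            goto out;   /* plain migration: half a domain is a failure */

        /* Remus: drop the incomplete checkpoint, use the last good one. */
        *used_checkpoint = 1;
        goto finish;
    }

 finish:
    return 0;           /* finish code: domain is declared usable */

 out:
    return -1;          /* error: the restore must not be reported as done */
}
```

For example, restore_sketch(0, 1, &used) takes the out path and returns
-1, while restore_sketch(1, 1, &used) returns 0 with used set to 1.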

~Andrew

>
> CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Shriram Rajagopalan <rshriram@xxxxxxxxx>
> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> ---
>  tools/libxc/xc_domain_restore.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..b9a56d5 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,20 +1783,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>  
>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>          PERROR("error when buffering batch, finishing");
> -        goto out;
> +        /*
> +         * Remus: discard the current incomplete checkpoint and restore
> +         * backup from the last complete checkpoint.
> +         */
> +        goto finish;
>      }
>      memset(&tmptail, 0, sizeof(tmptail));
>      tmptail.ishvm = hvm;
>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>          ERROR ("error buffering image tail, finishing");
> -        goto out;
> +        /*
> +         * Remus: discard the current incomplete checkpoint and restore
> +         * backup from the last complete checkpoint.
> +         */
> +        goto finish;
>      }
>      tailbuf_free(&tailbuf);
>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>  
>      goto loadpages;
>  
> +  /* With Remus: restore from last complete checkpoint */
>    finish:
>      if ( hvm )
>          goto finish_hvm;
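
The Remus fallback works because of the double buffering visible at the
end of the hunk: the next tail is read into tmptail, and tailbuf is only
freed and overwritten once buffer_tail() succeeds. Below is a simplified
sketch of that commit pattern. struct tail, buffer_tail_sketch() and
receive_checkpoint() are hypothetical names; the real tail buffer in
xc_domain_restore.c carries much more state (vCPU contexts and so on).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified tail buffer. */
struct tail {
    int ishvm;
    size_t len;
    unsigned char *data;
};

static void tail_free(struct tail *t)
{
    free(t->data);      /* free(NULL) is a no-op */
    t->data = NULL;
    t->len = 0;
}

/* Hypothetical stand-in for buffer_tail(); 'truncated' simulates a
 * stream that dies part way through a checkpoint. */
static int buffer_tail_sketch(struct tail *t, const unsigned char *src,
                              size_t len, int truncated)
{
    if ( truncated )
        return -1;
    t->data = malloc(len);
    if ( !t->data )
        return -1;
    memcpy(t->data, src, len);
    t->len = len;
    return 0;
}

/*
 * Buffer a new tail in scratch space and only replace 'committed' once
 * it is complete, so a failure leaves the last complete checkpoint
 * untouched for the caller to restore from.
 */
static int receive_checkpoint(struct tail *committed, int hvm,
                              const unsigned char *src, size_t len,
                              int truncated)
{
    struct tail tmp;

    memset(&tmp, 0, sizeof(tmp));
    tmp.ishvm = hvm;
    if ( buffer_tail_sketch(&tmp, src, len, truncated) < 0 ) {
        tail_free(&tmp);    /* discard the incomplete checkpoint */
        return -1;
    }
    tail_free(committed);   /* drop the previous complete checkpoint */
    memcpy(committed, &tmp, sizeof(*committed)); /* adopt the new one */
    return 0;
}
```

The design choice is that a failed read can never corrupt the committed
copy, which is what lets "goto finish" resume from a consistent state.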


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel