
Re: [Xen-devel] [PATCH] fix Remus failover regression




On Jul 28, 2014 12:20 AM, "Hongyang Yang" <yanghy@xxxxxxxxxxxxxx> wrote:
>
>
>
> On 07/28/2014 12:05 PM, Wen Congyang wrote:
>>
>> At 07/28/2014 11:35 AM, Yang Hongyang wrote:
>>>
>>> Commit c2ba706c ("tools/libxc: goto correct label on error paths") by
>>> Andrew broke Remus in Xen 4.4, and in earlier versions that have this
>>> commit backported.
>>>
>>> With Remus, this jump essentially discards the last incomplete
>>> checkpoint received by the backup.
>>> This is required for Remus to work, and it does not break live
>>> migration.
>>>
>>> CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>>> CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> CC: Shriram Rajagopalan <rshriram@xxxxxxxxx>
>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>>> ---
>>>  tools/libxc/xc_domain_restore.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>>> index e73e0a2..5d2fbd6 100644
>>> --- a/tools/libxc/xc_domain_restore.c
>>> +++ b/tools/libxc/xc_domain_restore.c
>>> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>>
>>>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>>          PERROR("error when buffering batch, finishing");
>>> -        goto out;
>>> +        goto finish;
>>>      }
>>>      memset(&tmptail, 0, sizeof(tmptail));
>>>      tmptail.ishvm = hvm;
>>>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>>          ERROR ("error buffering image tail, finishing");
>>> -        goto out;
>>> +        goto finish;
>>>      }
>>>      tailbuf_free(&tailbuf);
>>>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>>
>>
>> The mail is here:
>> http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html
>>
>>> Both of these errors have been discovered by xc_domain_restore() returning
>>> success after suffering a fatal error during migration, leading to the
>>> toolstack believing that the VM migrated successfully.
>>
>>
>> This code is only for Remus, so why is it executed during a plain migration?
>

I am not familiar with the XenServer code base and don't know whether it has Remus support, so its xc_domain_restore.c may or may not match the one in Xen. Please correct me if I am wrong.

Also, can those errors encountered in XenServer be reproduced in Xen 4.4?

Finally, the choice between goto finish and goto out could be made with an if/else on the ctx->complete and ctx->last_checkpoint variables, to distinguish a failure mid-migration from one mid-checkpoint. I haven't thought this through fully.
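A minimal sketch of that if/else, in C. Everything here is hypothetical: struct restore_state, its completed and last_checkpoint fields (standing in for ctx->complete and ctx->last_checkpoint), and pick_error_exit() are illustrative names, not libxc code, and the policy is only the untested idea above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two restore_ctx flags mentioned above. */
struct restore_state {
    bool completed;       /* first full image (initial migration) received */
    bool last_checkpoint; /* sender said no further checkpoints will follow */
};

/* Which label an error path would jump to. */
enum error_exit {
    EXIT_OUT,    /* fatal: fail the restore, like "goto out" */
    EXIT_FINISH  /* resume from the last complete checkpoint, like "goto finish" */
};

/*
 * Hypothetical policy: a buffering error is survivable only when a complete
 * image is already in hand and more checkpoints were expected, i.e. the
 * primary most likely died mid-checkpoint (Remus failover). Any other error
 * is a genuine migration failure and must not be reported as success.
 */
enum error_exit pick_error_exit(const struct restore_state *s)
{
    assert(s != NULL);
    if (s->completed && !s->last_checkpoint)
        return EXIT_FINISH;
    return EXIT_OUT;
}
```

With something like this, both error sites in the patch would branch on pick_error_exit() instead of hard-coding one label, so a failure during the initial stream of a plain migration would still surface as an error.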

>
> I was confused too. Without Remus, I think these two error paths are never
> hit, because a plain migration ends at lines 1776-1780:
>     if ( ctx->last_checkpoint )
>     {
>         // DPRINTF("Last checkpoint, finishing\n");
>         goto finish;
>     }
>
>>
>> Thanks
>> Wen Congyang
>>
>>
>> .
>>
>
> --
> Thanks,
> Yang.
>
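Why goto finish "discards the last incomplete checkpoint" follows from the double buffering in the diff context: the image tail is read into tmptail and only copied over tailbuf once buffering succeeds. Below is a minimal, self-contained C model of that commit step; struct tail, struct checkpoint_msg, buffer_checkpoint() and receive_checkpoints() are hypothetical names, and the checkpoint payload is reduced to a sequence number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for the buffered image tail. */
struct tail {
    bool valid;        /* holds a complete checkpoint */
    unsigned long seq; /* which checkpoint it came from */
};

/* One incoming checkpoint; intact is false if the stream broke mid-way. */
struct checkpoint_msg {
    unsigned long seq;
    bool intact;
};

/* Hypothetical analogue of buffer_tail(): returns -1 on a truncated checkpoint. */
static int buffer_checkpoint(const struct checkpoint_msg *msg, struct tail *out)
{
    if (!msg->intact)
        return -1;
    out->valid = true;
    out->seq = msg->seq;
    return 0;
}

/*
 * Consume checkpoints in order. Each is buffered into a temporary copy first;
 * only a fully buffered checkpoint replaces the committed one (the
 * tailbuf_free() + memcpy() step). On error the loop stops, modelling
 * "goto finish": the last complete checkpoint is kept, the partial one dropped.
 */
struct tail receive_checkpoints(const struct checkpoint_msg *msgs, size_t n)
{
    struct tail committed = { false, 0 };

    for (size_t i = 0; i < n; i++) {
        struct tail tmp;
        memset(&tmp, 0, sizeof(tmp));
        if (buffer_checkpoint(&msgs[i], &tmp) < 0)
            break;                                   /* discard incomplete checkpoint */
        memcpy(&committed, &tmp, sizeof(committed)); /* commit the new checkpoint */
    }
    return committed;
}
```

In this model, the same jump is also what makes it risky for plain migration: if the very first image fails to buffer, committed.valid stays false, and a caller that ignores it would report success, which is the failure mode described in the linked mail.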

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

