
Re: [Xen-devel] [BUG] repeated live migration for VM failed



George, 
Live migration passed 500+ times with this patch, so I think it's fine to
merge it into Xen 4.9.

Tested-by: Xudong Hao <xudong.hao@xxxxxxxxx>

Thanks,
-Xudong


> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Hao,
> Xudong
> Sent: Tuesday, May 23, 2017 5:23 PM
> To: George Dunlap <george.dunlap@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxx
> Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx>; Julien Grall <julien.grall@xxxxxxx>; Paul
> Durrant <paul.durrant@xxxxxxxxxx>; Jan Beulich <JBeulich@xxxxxxxx>; Gao,
> Chao <chao.gao@xxxxxxxxx>
> Subject: Re: [Xen-devel] [BUG] repeated live migration for VM failed
> 
> George, thanks for the fix.
> With the patch, the test has run 90+ live migrations without any error so
> far; let's wait for the final result.
> 
> Thanks,
> -Xudong
> 
> 
> > -----Original Message-----
> > From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> > Sent: Monday, May 22, 2017 7:03 PM
> > To: Hao, Xudong <xudong.hao@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx
> > Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Julien Grall
> > <julien.grall@xxxxxxx>; Gao, Chao <chao.gao@xxxxxxxxx>; Paul Durrant
> > <paul.durrant@xxxxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
> > Jan Beulich <JBeulich@xxxxxxxx>
> > Subject: Re: [Xen-devel] [BUG] repeated live migration for VM failed
> >
> > On Mon, May 22, 2017 at 11:18 AM, George Dunlap
> > <george.dunlap@xxxxxxxxxx>
> > wrote:
> > > On 22/05/17 07:35, Hao, Xudong wrote:
> > >> Bug detailed description:
> > >>
> > >> ----------------
> > >>
> > >> Create one RHEL7.3 HVM guest and live-migrate it continuously. After
> > >> 200+ or 300+ live migrations, the toolstack reports an error and the
> > >> migration fails.
> > >>
> > >>
> > >>
> > >> Environment :
> > >>
> > >> ----------------
> > >>
> > >> HW: Skylake server
> > >>
> > >> Xen: Xen 4.9.0 RC4
> > >>
> > >> Dom0: Linux 4.11.0
> > >>
> > >>
> > >>
> > >> Reproduce steps:
> > >>
> > >> ----------------
> > >>
> > >> 1.      Compile Xen 4.9 RC4 and dom0 kernel 4.11.0, boot to dom0
> > >>
> > >> 2.      Boot RHEL7.3 HVM guest
> > >>
> > >> 3.      Migrate guest to localhost, sleep 10 seconds
> > >>
> > >> 4.      Repeat step 3.
> > >>
> > >>
> > >>
> > >> Current result:
> > >>
> > >> ----------------
> > >>
> > >> VM migration fails.
> > >>
> > >>
> > >>
> > >> Base error log:
> > >>
> > >> ----------------
> > >>
> > >> xl migrate 24hrs_lm_guest_2 localhost
> > >>
> > >> root@localhost's password:
> > >>
> > >> migration target: Ready to receive domain.
> > >>
> > >> Saving to migration stream new xl format (info 0x3/0x0/1761)
> > >>
> > >> Loading new save file <incoming migration stream> (new xl fmt info
> > >> 0x3/0x0/1761)
> > >>
> > >> Savefile contains xl domain config in JSON format
> > >>
> > >> Parsing config from <saved>
> > >>
> > >> xc: info: Saving domain 273, type x86 HVM
> > >>
> > >> xc: info: Found x86 HVM domain from Xen 4.9
> > >>
> > >> xc: info: Restoring domain
> > >>
> > >> xc: error: set HVM param 12 = 0x00000000feffe000 (85 = Interrupted
> > >> system call should ): Internal error
> > >>
> > >> xc: error: Restore failed (85 = Interrupted system call should ):
> > >> Internal error
> > >
> > > Interesting -- it appears that setting HVM_PARAM_IDENT_PT (#12) can
> > > fail with -ERESTART.  But the comment for ERESTART makes it explicit
> > > that it should be internal only -- it should cause a hypercall
> > > continuation (so that the hypercall restarts automatically), rather
> > > than returning to the guest.
> > >
> > > But the hypercall continuation code seems to have disappeared from
> > > do_hvm_op() at some point?
> > >
> > > /me digs a bit more...
> >
> > The problem turns out to be commit ae20ccf ("dm_op: convert
> > HVMOP_set_mem_type"), which says:
> >
> >     This patch removes the need for handling HVMOP restarts, so that
> >     infrastructure is removed.
> >
> > While it's true that there are no more operations which need iteration
> > information restored, there are two operations which may still need to
> > be restarted to avoid deadlocks with other operations.
> >
> > Attached is a patch which restores hypercall continuation checking.
> > Xudong, can you give it a test?
> >
> > Thanks,
> >  -George
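
The ERESTART behaviour described above can be illustrated with a small toy model. This is only a sketch in Python, not Xen code: `dispatch_with_continuation` and `make_contended_set_param` are hypothetical names invented for the illustration, and the loop stands in for the real mechanism, in which the hypercall is re-issued rather than retried inside the hypervisor. The value 85 is taken from the failing log.

```python
ERESTART = 85  # errno seen in the failing log ("85 = Interrupted system call should ...")


def dispatch_with_continuation(handler, *args, max_restarts=1000):
    """Call `handler` until it stops returning -ERESTART.

    Toy model of a hypercall continuation: the caller never sees
    -ERESTART, because the operation is restarted on its behalf.
    """
    restarts = 0
    while True:
        rc = handler(*args)
        if rc != -ERESTART:
            return rc
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("handler kept requesting a restart")


def make_contended_set_param(busy_calls):
    """Hypothetical handler: asks for a restart `busy_calls` times, then succeeds."""
    state = {"remaining": busy_calls}

    def set_param(index, value):
        if state["remaining"] > 0:
            # Stand-in for "cannot make progress right now without risking
            # a deadlock, so ask to be restarted".
            state["remaining"] -= 1
            return -ERESTART
        return 0

    return set_param
```

Calling the handler directly models the regression: the first call returns -85 to the caller, which is what the toolstack reported. Routing the same call through `dispatch_with_continuation` returns 0 once the contention clears.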
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> https://lists.xen.org/xen-devel
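
The reproduction steps in the original report (migrate to localhost, sleep 10 seconds, repeat) can be sketched as a small Python driver. This is a minimal sketch, not the script used for the reported test runs: the function name `migrate_repeatedly` and its parameters are assumptions, and it presumes `xl migrate` can reach localhost without an interactive password prompt (for example via SSH keys).

```python
import subprocess
import time


def migrate_repeatedly(guest, iterations, pause_s=10.0,
                       run=subprocess.run, sleep=time.sleep):
    """Migrate `guest` to localhost up to `iterations` times.

    Returns the 1-based iteration at which `xl migrate` first exited
    non-zero, or None if every iteration succeeded. `run` and `sleep`
    are injectable so the loop can be exercised without a Xen host.
    """
    for i in range(1, iterations + 1):
        # Same command line as in the error log of the report.
        result = run(["xl", "migrate", guest, "localhost"])
        if result.returncode != 0:
            return i
        sleep(pause_s)  # step 3 of the report: sleep 10 seconds
    return None
```

On a test host this might be invoked as `migrate_repeatedly("24hrs_lm_guest_2", 500)`, using the guest name from the log; a return value of `None` would correspond to a run with no failed migration.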