
Re: [Xen-devel] Migration issues with 4.1.



On Sat, 2013-02-02 at 08:48 +0000, Dr. Greg Wettstein wrote:
> Good morning, hope everyone's day is going well.
> 
> We have sorted out most of the issues with a new iSCSI hotplug script
> which allows Xen guests to be treated as first class SAN guests.  The
> script allows each virtual machine to be treated as an independent
> initiator (IQN) which enables guests to be managed through LUN masking
> initiator groups on popular target platforms such as SCST.
> 
> When we began testing live migration on Xen 4.2.1 we noted problems
> with PVOPS kernels not starting on the migration target.  The
> following is output when a migration is attempted:
> 
> ---------------------------------------------------------------------------
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x0/0x0/8146)
> Loading new save file <incoming migration stream> (new xl fmt info 
> 0x0/0x0/8146) Savefile contains xl domain config
> xc: Saving memory: iter 0 (last sent 0 skipped 0): 65536/65536  100%
> xc: Saving memory: iter 1 (last sent 65457 skipped 79): 65536/65536  100%
> xc: Saving memory: iter 2 (last sent 86 skipped 0): 65536/65536  100%
> xc: Saving memory: iter 3 (last sent 16 skipped 0): 65536/65536  100%
> migration receiver stream contained unexpected data instead of ready message

Can you patch (lib)xl to print this unexpected data?
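Something along these lines might do for the printing. This is only a
rough sketch: dump_unexpected() is a hypothetical helper, not an
existing libxl or xl function, and it assumes the bytes that were read
in place of the ready message are still available in a buffer at the
point where the error is reported. It escapes non-printable bytes so
that a stray prompt or banner shows up readably in the error output.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxl or xl): render 'len' bytes from
 * 'buf' into 'out' as a NUL-terminated string. Printable characters are
 * copied as-is; everything else, and the backslash itself, is written as
 * a \xNN escape. Output is cut at the last escape that fits completely.
 * Returns the number of characters written, excluding the NUL.
 */
static size_t dump_unexpected(const unsigned char *buf, size_t len,
                              char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        int n;

        if (isprint(buf[i]) && buf[i] != '\\')
            n = snprintf(out + used, outsz - used, "%c", buf[i]);
        else
            n = snprintf(out + used, outsz - used, "\\x%02x", buf[i]);

        /* Stop on error or when the item did not fit in full. */
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

The caller would then pass the result to whatever it already uses to
report the "unexpected data" error, e.g. fprintf(stderr, ...).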

> (command run was: exec ./xen-migrate rainbow xl migrate-receive )

What happens if you run this by hand? I wonder if it is asking for a
password or printing a message about accepting host keys or something
and confusing the xl migration protocol.

> migration target: Transfer complete, requesting permission to start domain.
> libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated 
> reading GO message from migration stream
> migration target: Failure, destroying our copy.
> migration child [8355] not exiting, no longer waiting (exit status will be 
> unreported)
> Migration failed, resuming at sender.

There is a known issue with libxl_domain_resume not implementing the
slow mode correctly. It might help to change the libxl_domain_resume
call after the "Migration failed, resuming at sender.\n" message
from
        libxl_domain_resume(ctx, domid, 0, 0);
to
        libxl_domain_resume(ctx, domid, 1, 0);

Note that this won't fix your migration, but I hope it will at least
fix the failure path.

If someone is interested in fixing libxl_domain_resume(..., 0, 0) then
what's needed is to port the logic from resumeDomain in
tools/python/xen/xend/XendDomainInfo.py to libxl. I'd be happy to
advise.




 

