
Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI



George,
On 30/05/13 17:06, George Dunlap wrote:
On 05/30/2013 04:55 PM, Diana Crisan wrote:
On 30/05/13 16:26, George Dunlap wrote:
On Tue, May 28, 2013 at 4:06 PM, Diana Crisan <dcrisan@xxxxxxxxxxxx>
wrote:
Hi,


On 26/05/13 09:38, Ian Campbell wrote:
On Sat, 2013-05-25 at 11:18 +0100, Alex Bligh wrote:
George,

--On 24 May 2013 17:16:07 +0100 George Dunlap
<George.Dunlap@xxxxxxxxxxxxx>
wrote:

FWIW it's reproducible on every host h/w platform we've tried
(a total of 2).
Do you see the same effects if you do a local-host migrate?
I hadn't even realised that was possible. That would have made testing live migrate easier!
That's basically the whole reason it is supported ;-)

How do you avoid the name clash in xen-store?
Most toolstacks receive the incoming migration into a domain named
FOO-incoming or some such and then rename it to FOO upon completion. Some
also rename the outgoing domain "FOO-migratedaway" towards the end, so
that the parts of the final teardown which can safely happen after the
target has started can be done then.

Ian.
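
For illustration, mid-way through a localhost migrate under the scheme Ian
describes, "xl list" might show something like the sketch below (the exact
suffixes are toolstack-specific; xl uses names of roughly this shape, and
the IDs and columns here are illustrative):

$ xl list
Name                    ID   Mem VCPUs      State   Time(s)
Domain-0                 0  1024     2     r-----     100.3
guest--migratedaway     11   512     1     -b----      42.1
guest--incoming         12   512     1     --p---       0.0

The outgoing domain keeps running under its renamed entry until teardown,
so it never clashes with the incoming domain over the name "guest".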


I am unsure what I am doing wrong, but I cannot seem to do a localhost
migrate.

I created a domU using "xl create xl.conf" and once it fully booted I
issued an "xl migrate 11 localhost". This fails and gives the output below.

Would you please advise on how to get this working?

Thanks,
Diana


root@ubuntu:~# xl migrate 11 localhost
root@localhost's password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/2344)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/2344)
  Savefile contains xl domain config
xc: progress: Reloading memory pages: 53248/1048575    5%
xc: progress: Reloading memory pages: 105472/1048575   10%
libxl: error: libxl_dm.c:1280:device_model_spawn_outcome: domain 12 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1091:domcreate_devmodel_started: device model did not start: -3
libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [10934] exited with error status 3
Migration failed, resuming at sender.
xc: error: Cannot resume uncooperative HVM guests: Internal error
libxl: error: libxl.c:404:libxl__domain_resume: xc_domain_resume failed for domain 11: Success
Aha -- I managed to reproduce this one as well.

Your problem is the "vncunused=0" -- that's instructing qemu "You must
use this exact port for the vnc server".  But when you do the migrate,
that port is still in use by the "from" domain; so the qemu for the
"to" domain can't get it, and fails.

Obviously this should fail a lot more gracefully, but that's a bit of
a lower-priority bug I think.

  -George
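
For reference, a minimal sketch of the relevant guest config lines (values
illustrative): with vncunused=1, which is the default, qemu picks the first
free VNC display rather than insisting on a fixed one, so the source and
target qemu processes can coexist during a localhost migrate.

vnc = 1
vncunused = 1    # let qemu pick a free display, e.g. :1 if :0 is taken
# vncdisplay = 0 # a fixed display with vncunused=0 is what causes the clash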
Yes, I managed to get to the bottom of it too and got VMs migrating on
localhost on our end.

I can confirm I did get the clock stuck problem while doing a localhost
migrate.

Does the script I posted earlier "work" for you (i.e., does it fail after some number of migrations)?


I left your script running throughout the night and it seems that it does not always catch the problem. I see the following:

1. vm has the clock stuck
2. script is still running as it seems the vm is still ping-able.
3. migration fails on the basis that the vm does not ack the suspend request (see below).

libxl: error: libxl_dom.c:1063:libxl__domain_suspend_common_callback: guest didn't acknowledge suspend, cancelling request
libxl: error: libxl_dom.c:1085:libxl__domain_suspend_common_callback: guest didn't acknowledge suspend, request cancelled
xc: error: Suspend request failed: Internal error
xc: error: Domain appears not to have suspended: Internal error
libxl: error: libxl_dom.c:1370:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: Invalid argument
migration sender: libxl_domain_suspend failed (rc=-8)
xc: error: 0-length read: Internal error
xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
xc: error: Error when reading batch size (0 = Success): Internal error
xc: error: Error when reading batch (0 = Success): Internal error
libxl: error: libxl_create.c:834:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:916:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1378:libxl__destroy_domid: non-existant domain 111
libxl: error: libxl.c:1342:domain_destroy_callback: unable to destroy guest with domid 111
libxl: error: libxl_create.c:1225:domcreate_destruction_cb: unable to destroy domain 111 following failed creation
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [7849] exited with error status 3
Migration failed, failed to suspend at sender.
PING 172.16.1.223 (172.16.1.223) 56(84) bytes of data.
64 bytes from 172.16.1.223: icmp_req=1 ttl=64 time=0.339 ms
64 bytes from 172.16.1.223: icmp_req=2 ttl=64 time=0.569 ms
64 bytes from 172.16.1.223: icmp_req=3 ttl=64 time=0.535 ms
64 bytes from 172.16.1.223: icmp_req=4 ttl=64 time=0.544 ms
64 bytes from 172.16.1.223: icmp_req=5 ttl=64 time=0.529 ms
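
For the archives, a loop of the sort being discussed might look like the
sketch below. This is not George's actual script (that was posted earlier
in the thread); the domain name and guest IP are illustrative:

#!/bin/sh
# Migrate a guest to localhost repeatedly until something breaks.
NAME=guest               # illustrative domain name
GUEST_IP=172.16.1.223    # guest IP from the ping output above
i=0
while xl migrate "$(xl domid $NAME)" localhost; do
    i=$((i + 1))
    echo "migrate $i succeeded"
    # As noted above, ping alone is not a sufficient health check:
    # a guest with a stuck clock can still answer pings.
    ping -c 5 "$GUEST_IP" || break
done
echo "stopped after $i successful migrations"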


I've been using it to do a localhost migrate, using a nearly identical config as the one you posted (only difference, I'm using blkback rather than blktap), with an Ubuntu Precise VM using the 3.2.0-39-virtual kernel, and I'm up to 20 migrates with no problems.

Differences between my setup and yours at this point:
 - probably hardware (I've got an old AMD box)
 - dom0 kernel is Debian 2.6.32-5-xen
 - not using blktap

I've also been testing this on an Intel box, with the Debian 3.2.0-4-686-pae kernel, with a Debian distro, and it's up to 103 successful migrates.

It's possible that it's a model-specific issue, but it's sort of hard to see how the dom0 kernel, or blktap, could cause this.

Do you have any special kernel config parameters you're passing in to the guest?

Also, could you try a generic Debian Wheezy install, just to see if it's got something to do with the kernel?

 -George


I reckon our code caught a separate problem alongside this issue: whenever the vm got its clock stuck, the network interface wasn't coming back up and I would see NO-CARRIER for the guest, which made it unreachable.
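
If it helps reproduce: from the guest console, a check along these lines
showed the stuck state ("eth0" is an assumption about the guest's interface
name, and the exact flags shown are illustrative of typical "ip link" output):

# ip link show eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000

i.e. the link never came back up after the migrate, even though the guest
itself was still running.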

--
Diana

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

