
[Xen-users] Re: Live migration failed


  • To: xen-users@xxxxxxxxxxxxxxxxxxx
  • From: Irwan Hadi <iblist18@xxxxxxxxx>
  • Date: Tue, 22 Dec 2009 19:14:52 -0700
  • Delivery-date: Tue, 22 Dec 2009 18:15:40 -0800
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

Actually, after further research, it looks like this may be a known
issue that occurs when the originating dom0 has more memory than the
receiving dom0.
In my case, the vmhost1 dom0 has much more memory. I was in the
process of standardizing the dom0 memory setting that we have in
grub.conf when I hit this bug.
I suppose that until this bug is fixed, I will have to use xm
save/restore so that the domU won't crash.
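
For anyone who wants to script that workaround, here is a minimal
sketch in Python (written for a current Python 3, not the python2.4
that xend uses) that only builds the two commands involved. It assumes
the checkpoint file lives on storage that both hosts can see, and that
the restore is started on the target over ssh; the path
/nfsvol1/domu1.chk and the use of ssh are my own illustrative choices,
not something I have tested on these hosts.

```python
import shlex


def save_restore_commands(domain, target_host, checkpoint):
    """Build argv lists for a non-live move of a domU.

    Step 1 saves the domain to a checkpoint file on the source host.
    Step 2 restores it on the target host (assumed reachable via ssh,
    and assumed to see the same checkpoint path, e.g. over NFS).
    """
    return [
        ["xm", "save", domain, checkpoint],
        ["ssh", target_host, "xm", "restore", checkpoint],
    ]


if __name__ == "__main__":
    # Hypothetical checkpoint path on shared storage.
    for argv in save_restore_commands("domu1", "vmhost2", "/nfsvol1/domu1.chk"):
        print(" ".join(shlex.quote(a) for a in argv))
```

The function only returns the command lines, so they can be reviewed
before anything is run. Unlike live migration, the domU is down for
the whole save and restore.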

One odd thing I found, though, is that after the domU has crashed and
rebooted, live migration of it then works fine.
Has anyone else seen the same issue?

https://bugzilla.redhat.com/show_bug.cgi?id=511135
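
To check whether other failed migrations show the same pattern, the
xend log can be scanned for the messages that appear in the excerpts
quoted below. This is only a rough sketch: it assumes each log entry
sits on a single line in the real file (the excerpts in this mail were
wrapped by the mail client), and the list of marker strings is simply
taken from my own log, not from any official list of error messages.

```python
import re

# One xend log entry: [date time pid] LEVEL (module:line) message
ENTRY = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\] "
    r"(\w+) \(([\w.]+):(\d+)\) ?(.*)$"
)

# Marker strings copied from the failed migration's log on vmhost1.
SIGNS = (
    "failed to get the suspend evtchn port",
    "Domain has crashed",
    "Domain appears not to have suspended",
    "Save failed on domain",
)


def find_failures(lines):
    """Return (timestamp, level, message) for entries matching a marker."""
    hits = []
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m and any(sign in m.group(5) for sign in SIGNS):
            hits.append((m.group(1), m.group(2), m.group(5)))
    return hits
```

It could be fed an open log file, for example
find_failures(open("/var/log/xen/xend.log")), if the log is in that
location on your installation.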


On Tue, Dec 22, 2009 at 6:58 PM, Irwan Hadi <iblist18@xxxxxxxxx> wrote:
> I'm trying to do live migration between two Xen hosts. Both are
> running CentOS 5.4 and Xen 3.4.0 from gitco.
> The backend storage is NFS served through a Network Appliance filer.
>
> The issue is that the live migration sometimes fails, and the domain
> being migrated crashes and reboots. The error that I got is shown
> below.
> Does anyone know what is causing it?
>
>
> ============================================================
> # xm migrate --live domu1  vmhost2
> Error: /usr/lib64/xen/bin/xc_save 16 72 0 0 1 failed
> Usage: xm migrate <Domain> <Host>
>
> Migrate a domain to another machine.
>
> Options:
>
> -h, --help           Print this help.
> -l, --live           Use live migration.
> -p=portnum, --port=portnum
>                     Use specified port for migration.
> -n=nodenum, --node=nodenum
>                     Use specified NUMA node on target.
> -s, --ssl            Use ssl connection for migration.
>
> #
>
> ============================================================
> at vmhost1 (the originating VM host)
> [2009-12-22 18:41:39 6417] DEBUG (balloon:172) Balloon: 4936 KiB free;
> 0 to scrub; need 18432; retries: 20.
> [2009-12-22 18:41:39 6417] DEBUG (balloon:187) Balloon: setting dom0
> target to 8671 MiB.
> [2009-12-22 18:41:39 6417] DEBUG (XendDomainInfo:1302) Setting memory
> target of domain Domain-0 (0) to 8671 MiB.
> [2009-12-22 18:41:40 6417] DEBUG (balloon:166) Balloon: 19400 KiB
> free; need 18432; done.
> [2009-12-22 18:41:40 6417] DEBUG (XendCheckpoint:110) [xc_save]:
> /usr/lib64/xen/bin/xc_save 16 72 0 0 1
> [2009-12-22 18:41:40 6417] INFO (XendCheckpoint:417) xc_save: failed
> to get the suspend evtchn port
> [2009-12-22 18:41:40 6417] INFO (XendCheckpoint:417)
> [2009-12-22 18:41:40 6417] INFO (XendCheckpoint:417) Had 0 unexplained
> entries in p2m table
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:417) Saving memory
> pages: iter 1  95%^M 1: sent 510088, skipped 3959, delta 15263ms,
> dom0 58%, target 0%, sent 1095Mb/s, dirtied 13Mb/s 6431 pages
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:417) Saving memory
> pages: iter 2  98%^M 2: sent 6370, skipped 26, delta 197ms, dom0 48%,
> target 0%, sent 1059Mb/s, dirtied 11Mb/s 71 pages
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:417) Saving memory
> pages: iter 3   0%^M 3: sent 71, skipped 0, delta 7ms, dom0 100%,
> target 0%, sent 332Mb/s, dirtied 0Mb/s 0 pages
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:417) Saving memory
> pages: iter 4   0%^M 4: sent 0, skipped 0, Start last iteration
> [2009-12-22 18:41:55 6417] DEBUG (XendCheckpoint:388) suspend
> [2009-12-22 18:41:55 6417] DEBUG (XendCheckpoint:113) In
> saveInputHandler suspend
> [2009-12-22 18:41:55 6417] DEBUG (XendCheckpoint:115) Suspending 72 ...
> [2009-12-22 18:41:55 6417] DEBUG (XendDomainInfo:511)
> XendDomainInfo.shutdown(suspend)
> [2009-12-22 18:41:55 6417] DEBUG (XendDomainInfo:1708)
> XendDomainInfo.handleShutdownWatch
> [2009-12-22 18:41:55 6417] DEBUG (XendDomainInfo:1708)
> XendDomainInfo.handleShutdownWatch
> [2009-12-22 18:41:55 6417] WARNING (XendDomainInfo:1877) Domain has
> crashed: name=migrating-domu1 id=72.
> [2009-12-22 18:41:55 6417] DEBUG (XendDomainInfo:2723)
> XendDomainInfo.destroy: domid=72
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:121) Domain 72 suspended.
> [2009-12-22 18:41:55 6417] DEBUG (XendCheckpoint:130) Written done
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:417) ERROR Internal
> error: Domain not in suspended state
> [2009-12-22 18:41:55 6417] INFO (XendCheckpoint:417) ERROR Internal
> error: Domain appears not to have suspended
> [2009-12-22 18:41:56 6417] INFO (XendCheckpoint:417) Save exit rc=1
> [2009-12-22 18:41:56 6417] ERROR (XendCheckpoint:164) Save failed on
> domain domu1 (72) - resuming.
> Traceback (most recent call last):
>  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py",
> line 132, in save
>    forkHelper(cmd, fd, saveInputHandler, False)
>  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py",
> line 405, in forkHelper
>    raise XendError("%s failed" % string.join(cmd))
> XendError: /usr/lib64/xen/bin/xc_save 16 72 0 0 1 failed
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2779)
> XendDomainInfo.resumeDomain(72)
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2198) Destroying device model
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2205) Releasing devices
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2218) Removing vif/0
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:1133)
> XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2218) Removing vbd/51712
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:1133)
> XendDomainInfo.destroyDevice: deviceClass = tap, device = vbd/51712
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2218) Removing console/0
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:1133)
> XendDomainInfo.destroyDevice: deviceClass = console, device =
> console/0
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2203) No device model
> [2009-12-22 18:41:56 6417] DEBUG (XendDomainInfo:2205) Releasing devices
>
>
> ============================================================
> at vmhost2 (the receiving VM host)
>
> [2009-12-22 18:41:40 6582] DEBUG (XendDomainInfo:2295)
> XendDomainInfo.constructDomain
> [2009-12-22 18:41:40 6582] DEBUG (balloon:166) Balloon: 97457736 KiB
> free; need 4096; done.
> [2009-12-22 18:41:40 6582] DEBUG (XendDomain:452) Adding Domain: 1
> [2009-12-22 18:41:40 6582] DEBUG (XendDomainInfo:3051) Storing VM
> details: {'on_xend_stop': 'ignore', 'shadow_memory': '0', 'uuid':
> '8ca7c0ac-e8ce-eaf5-fe38-70374c6cc1af', 'on_reboot': 'restart',
> 'start_time': '1261015586.75', 'on_poweroff': 'destroy',
> 'bootloader_args': '', 'on_xend_start': 'ignore', 'on_crash':
> 'restart', 'xend/restart_count': '0', 'vcpus': '2', 'vcpu_avail': '3',
> 'bootloader': '/usr/bin/pygrub', 'image': "(linux (kernel ) (notes
> (FEATURES
> 'writable_page_tables|writable_descriptor_tables|auto_translated_physmap|pae_pgdir_above_4gb|supervisor_mode_kernel')
> (VIRT_BASE 18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET
> 18446744071562067968) (GUEST_OS linux) (HYPERCALL_PAGE
> 18446744071564189696) (LOADER generic) (ENTRY 18446744071564165120)
> (XEN_VERSION xen-3.0)))", 'name': 'domu1'}
> [2009-12-22 18:41:40 6582] INFO (XendDomainInfo:2159) createDevice:
> console : {'protocol': 'vt100', 'location': '2', 'uuid':
> '9e2b597a-e688-787e-bacf-50a04d624fc1'}
> [2009-12-22 18:41:40 6582] DEBUG (DevController:95) DevController:
> writing {'state': '1', 'backend-id': '0', 'backend':
> '/local/domain/0/backend/console/1/0'} to
> /local/domain/1/device/console/0.
> [2009-12-22 18:41:40 6582] DEBUG (DevController:97) DevController:
> writing {'domain': 'domu1', 'frontend':
> '/local/domain/1/device/console/0', 'uuid':
> '9e2b597a-e688-787e-bacf-50a04d624fc1', 'frontend-id': '1', 'state':
> '1', 'location': '2', 'online': '1', 'protocol': 'vt100'} to
> /local/domain/0/backend/console/1/0.
> [2009-12-22 18:41:40 6582] INFO (XendDomainInfo:2159) createDevice:
> tap : {'protocol': 'x86_64-abi', 'uuid':
> '42d54a8b-b537-6cac-966f-8c437a6717c4', 'bootable': '1', 'dev':
> 'xvda:disk', 'uname': 'tap:aio:/nfsvol1/domu1', 'mode': 'w',
> 'backend': '0'}
> [2009-12-22 18:41:40 6582] DEBUG (DevController:95) DevController:
> writing {'virtual-device': '51712', 'protocol': 'x86_64-abi',
> 'device-type': 'disk', 'backend-id': '0', 'state': '1', 'backend':
> '/local/domain/0/backend/tap/1/51712'} to
> /local/domain/1/device/vbd/51712.
> [2009-12-22 18:41:40 6582] DEBUG (DevController:97) DevController:
> writing {'domain': 'domu1', 'frontend':
> '/local/domain/1/device/vbd/51712', 'uuid':
> '42d54a8b-b537-6cac-966f-8c437a6717c4', 'bootable': '1', 'dev':
> 'xvda', 'state': '1', 'params': 'aio:/nfsvol1/domu1', 'mode': 'w',
> 'online': '1', 'frontend-id': '1', 'type': 'tap'} to
> /local/domain/0/backend/tap/1/51712.
> [2009-12-22 18:41:40 6582] INFO (XendDomainInfo:2159) createDevice:
> vif : {'bridge': 'xenbr212', 'mac': '00:16:3e:04:0d:39', 'script':
> '/etc/xen/scripts/vif-bridge', 'uuid':
> '747b1a8a-3e77-682b-d70a-b8673581b6c8', 'backend': '0'}
> [2009-12-22 18:41:40 6582] DEBUG (DevController:95) DevController:
> writing {'backend-id': '0', 'mac': '00:16:3e:04:0d:39', 'handle': '0',
> 'state': '1', 'backend': '/local/domain/0/backend/vif/1/0'} to
> /local/domain/1/device/vif/0.
> [2009-12-22 18:41:40 6582] DEBUG (DevController:97) DevController:
> writing {'bridge': 'xenbr212', 'domain': 'domu1', 'handle': '0',
> 'uuid': '747b1a8a-3e77-682b-d70a-b8673581b6c8', 'script':
> '/etc/xen/scripts/vif-bridge', 'mac': '00:16:3e:04:0d:39',
> 'frontend-id': '1', 'state': '1', 'online': '1', 'frontend':
> '/local/domain/1/device/vif/0'} to /local/domain/0/backend/vif/1/0.
> [2009-12-22 18:41:40 6582] DEBUG (DevController:95) DevController:
> writing {'backend-id': '0', 'mac': '00:16:3e:04:0d:39', 'handle': '0',
> 'state': '1', 'backend': '/local/domain/0/backend/vif/1/0'} to
> /local/domain/1/device/vif/0.
> [2009-12-22 18:41:40 6582] DEBUG (DevController:97) DevController:
> writing {'bridge': 'xenbr212', 'domain': 'domu1', 'handle': '0',
> 'uuid': '747b1a8a-3e77-682b-d70a-b8673581b6c8', 'script':
> '/etc/xen/scripts/vif-bridge', 'mac': '00:16:3e:04:0d:39',
> 'frontend-id': '1', 'state': '1', 'online': '1', 'frontend':
> '/local/domain/1/device/vif/0'} to /local/domain/0/backend/vif/1/0.
> [2009-12-22 18:41:40 6582] DEBUG (XendDomainInfo:1621) Storing domain
> details: {'image/entry': '18446744071564165120', 'console/port': '2',
> 'image/loader': 'generic', 'vm':
> '/vm/8ca7c0ac-e8ce-eaf5-fe38-70374c6cc1af',
> 'control/platform-feature-multiprocessor-suspend': '1',
> 'image/guest-os': 'linux', 'cpu/1/availability': 'online',
> 'image/features/writable-descriptor-tables': '1', 'image/virt-base':
> '18446744071562067968', 'memory/target': '2097152',
> 'image/guest-version': '2.6',
> 'image/features/supervisor-mode-kernel': '1', 'console/limit':
> '1048576', 'image/paddr-offset': '18446744071562067968',
> 'image/hypercall-page': '18446744071564189696', 'cpu/0/availability':
> 'online', 'image/features/pae-pgdir-above-4gb': '1',
> 'image/features/writable-page-tables': '1', 'console/type':
> 'xenconsoled', 'image/features/auto-translated-physmap': '1', 'name':
> 'domu1', 'domid': '1', 'image/xen-version': 'xen-3.0', 'store/port':
> '1'}
> [2009-12-22 18:41:40 6582] DEBUG (XendCheckpoint:261)
> restore:shadow=0x0, _static_max=0x100000000, _static_min=0x0,
> [2009-12-22 18:41:40 6582] DEBUG (balloon:166) Balloon: 97457616 KiB
> free; need 2097152; done.
> [2009-12-22 18:41:40 6582] DEBUG (XendCheckpoint:278) [xc_restore]:
> /usr/lib64/xen/bin/xc_restore 16 1 1 2 0 0 0
> [2009-12-22 18:41:40 6582] INFO (XendCheckpoint:417) xc_domain_restore
> start: p2m_size = 7d800
> [2009-12-22 18:41:40 6582] INFO (XendCheckpoint:417) Reloading memory
> pages:   0%
> [2009-12-22 18:41:56 6582] INFO (XendCheckpoint:417) ERROR Internal
> error: Error when reading batch size
> [2009-12-22 18:41:57 6582] INFO (XendCheckpoint:417) Restore exit with rc=1
> [2009-12-22 18:41:57 6582] DEBUG (XendDomainInfo:2723)
> XendDomainInfo.destroy: domid=1
> [2009-12-22 18:41:57 6582] ERROR (XendDomainInfo:2737)
> XendDomainInfo.destroy: domain destruction failed.
> Traceback (most recent call last):
>  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py",
> line 2730, in destroy
>    xc.domain_pause(self.domid)
> Error: (3, 'No such process')
> [2009-12-22 18:41:57 6582] DEBUG (XendDomainInfo:2203) No device model
> [2009-12-22 18:41:57 6582] DEBUG (XendDomainInfo:2205) Releasing devices
> [2009-12-22 18:41:57 6582] DEBUG (XendDomainInfo:1133)
> XendDomainInfo.destroyDevice: deviceClass = console, device =
> console/0
> [2009-12-22 18:41:57 6582] ERROR (XendDomain:1149) Restore failed
> Traceback (most recent call last):
>  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py",
> line 1147, in domain_restore_fd
>    return XendCheckpoint.restore(self, fd, paused=paused,
> relocating=relocating)
>  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py",
> line 282, in restore
>    forkHelper(cmd, fd, handler.handler, True)
>  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py",
> line 405, in forkHelper
>    raise XendError("%s failed" % string.join(cmd))
> XendError: /usr/lib64/xen/bin/xc_restore 16 1 1 2 0 0 0 failed
> '], ['VDI']]]])
>
> ============================================================
>
>
> Thanks
>

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

