|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Live migration hanging for a CPU intensive task
Anthony,migration doesn't go on forever. There's a limit on the amount of memory you can transmit and the number of iterations you can do. Hitting one of those limits means that the last non-live iteration will be a bit longer than expected. However, the test stalls right at the beginning of that last iteration, before being able to move a single page, because the suspend hangs. I'm trying to get to the suspend operation. I see that xc_linux_save.c prints "suspend" to the python wrapper XendChekpoint.py, which in turn places a transaction on xenstore. Am I right so far? what happens next? Andres If this is the case, wouldn't you expect that migration could never complete? If the VM is dirty more pages than it transmit, the algorithm will not converge. Is there something else going on here?Regards, Anthony LiguoriThis is due to the heavy processing going on and the restricted network conditions (100Mbps, 33 ms RTT). If I run the same test in a Gigabit LAN, migration finishes succesfully. After suspend begins, the VM becomes unresponsive, as expected. However, xm top indicates a steady CPU utilization of 99%I would appreciate if someone with insight on the migration/suspend code can give me a hint. In principle, I need to add more debugging messages to the suspend code. I'm thinking that given the length of the migration and the rate of page dirtying, the shadow page tables get "overloaded" (whatever that might be) and that causes weirdness while suspending. But that's just a wild conjecture.Thanks indeed for any feedback Andres xend.log -----------------------------[2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:177) XendDomainInfo.create(['vm', ['name', 'xenbox'], ['memory', '512'], ['vcpus', '4'], ['image', ['linux', ['kernel', '/boot/vmlinuz-2.6.12.6-xenU'], ['ip', ':1.2.3.4::::eth0:dhcp'], ['root', '/dev/sda1 ro'], ['args', '4']]], ['device', ['vbd', ['uname', 'phy:hdk'], ['dev', 'sda1'], ['mode', 'w']]], ['device', ['vif', ['mac', '00:a0:24:60:31:67']]]]) [2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:282) parseConfig: config is ['vm', ['name', 'xenbox'], ['memory', '512'], ['vcpus', '4'], ['image', ['linux', ['kernel', '/boot/vmlinuz-2.6.12.6-xenU'], ['ip', ':1.2.3.4::::eth0:dhcp'], ['root', '/dev/sda1 ro'], ['args', '4']]], ['device', ['vbd', ['uname', 'phy:hdk'], ['dev', 'sda1'], ['mode', 'w']]], ['device', ['vif', ['mac', '00:a0:24:60:31:67']]]] [2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:376) parseConfig: result is {'ssidref': None, 'uuid': None, 'on_crash': None, 'on_reboot': None, 'image': ['linux', ['kernel', '/boot/vmlinuz-2.6.12.6-xenU'], ['ip', ':1.2.3.4::::eth0:dhcp'], ['root', '/dev/sda1 ro'], ['args', '4']], 'on_poweroff': None, 'cpus': None, 'name': 'xenbox', 'backend': [], 'vcpus': 4, 'cpu_weight': None, 'vcpu_avail': None, 'memory': 512, 'device': [('vbd', ['vbd', ['uname', 'phy:hdk'], ['dev', 'sda1'], ['mode', 'w']]), ('vif', ['vif', ['mac', '00:a0:24:60:31:67']])], 'bootloader': None, 'cpu': None, 'maxmem': None} [2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:1130) XendDomainInfo.construct: None 0 [2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:1162) XendDomainInfo.initDomain: 31 1.0 [2006-04-12 15:55:08 xend] DEBUG (balloon:84) Balloon: free 1025; need 513; done. [2006-04-12 15:55:08 xend] INFO (image:133) buildDomain os=linux dom=31 vcpus=4[2006-04-12 15:55:08 xend] DEBUG (image:171) dom = 31[2006-04-12 15:55:08 xend] DEBUG (image:172) image = /boot/vmlinuz-2.6.12.6-xenU[2006-04-12 15:55:08 xend] DEBUG (image:173) store_evtchn = 1 [2006-04-12 15:55:08 xend] DEBUG (image:174) console_evtchn = 2[2006-04-12 15:55:08 xend] DEBUG (image:175) cmdline = ip=:1.2.3.4::::eth0:dhcp root=/dev/sda1 ro 4 [2006-04-12 15:55:08 xend] DEBUG (image:176) ramdisk = [2006-04-12 15:55:08 xend] DEBUG (image:177) vcpus = 4 [2006-04-12 15:55:08 xend] DEBUG (DevController:103) DevController: writing {'virtual-device': '2049', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/31/2049'} to /local/domain/31/device/vbd/2049. [2006-04-12 15:55:08 xend] DEBUG (DevController:105) DevController: writing {'domain': 'xenbox', 'frontend': '/local/domain/31/device/vbd/2049', 'dev': 'sda1', 'state': '1', 'params': 'hdk', 'mode': 'w', 'frontend-id': '31', 'type': 'phy'} to /local/domain/0/backend/vbd/31/2049. [2006-04-12 15:55:08 xend] DEBUG (DevController:103) DevController: writing {'backend-id': '0', 'mac': '00:a0:24:60:31:67', 'handle': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/31/0'} to /local/domain/31/device/vif/0. [2006-04-12 15:55:08 xend] DEBUG (DevController:105) DevController: writing {'mac': '00:a0:24:60:31:67', 'state': '1', 'handle': '0', 'script': '/etc/xen/scripts/vif-bridge', 'frontend-id': '31', 'domain': 'xenbox', 'frontend': '/local/domain/31/device/vif/0'} to /local/domain/0/backend/vif/31/0. [2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:663) Storing VM details: {'ssidref': '0', 'uuid': '78c48e6b-4d9c-5478-15f2-5aa295db9cde', 'on_reboot': 'restart', 'start_time': '1144871708.44', 'on_poweroff': 'destroy', 'name': 'xenbox', 'vcpus': '4', 'vcpu_avail': '15', 'memory': '512', 'on_crash': 'restart', 'image': "(linux (kernel /boot/vmlinuz-2.6.12.6-xenU) (ip :1.2.3.4::::eth0:dhcp) (root '/dev/sda1 ro') (args 4))", 'maxmem': '512'} [2006-04-12 15:55:08 xend.XendDomainInfo] DEBUG (XendDomainInfo:688) Storing domain details: {'console/ring-ref': '50152', 'console/port': '2', 'cpu/3/availability': 'online', 'name': 'xenbox', 'console/limit': '1048576', 'cpu/2/availability': 'online', 'vm': '/vm/78c48e6b-4d9c-5478-15f2-5aa295db9cde', 'domid': '31', 'cpu/0/availability': 'online', 'memory/target': '524288', 'store/ring-ref': '443869', 'cpu/1/availability': 'online', 'store/port': '1'} [2006-04-12 15:55:08 xend] DEBUG (DevController:132) Waiting for devices vif.[2006-04-12 15:55:08 xend] DEBUG (DevController:138) Waiting for 0.[2006-04-12 15:55:08 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vif/31/0/hotplug-status. [2006-04-12 15:55:08 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vif/31/0/hotplug-status. [2006-04-12 15:55:08 xend] DEBUG (DevController:417) hotplugStatusCallback 1. [2006-04-12 15:55:08 xend] DEBUG (DevController:132) Waiting for devices usb. [2006-04-12 15:55:08 xend] DEBUG (DevController:132) Waiting for devices vbd.[2006-04-12 15:55:08 xend] DEBUG (DevController:138) Waiting for 2049.[2006-04-12 15:55:08 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vbd/31/2049/hotplug-status. [2006-04-12 15:55:09 xend] DEBUG (DevController:403) hotplugStatusCallback /local/domain/0/backend/vbd/31/2049/hotplug-status. [2006-04-12 15:55:09 xend] DEBUG (DevController:417) hotplugStatusCallback 1. [2006-04-12 15:55:09 xend] DEBUG (DevController:132) Waiting for devices pci. [2006-04-12 15:55:09 xend] DEBUG (DevController:132) Waiting for devices ioports. [2006-04-12 15:55:09 xend] DEBUG (DevController:132) Waiting for devices vtpm. [2006-04-12 15:55:09 xend] INFO (XendDomain:360) Domain xenbox (31) unpaused. [2006-04-12 15:56:09 xend] DEBUG (XendCheckpoint:80) [xc_save]: /usr/lib/xen/bin/xc_save 11 17 31 0 0 1 [2006-04-12 15:56:09 xend] ERROR (XendCheckpoint:227) Had 0 unexplained entries in p2m table [2006-04-12 15:57:10 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 1 0% 6% 11% 16% 21%PT Race: [10000000,904] pte=bdc063, mfn=00000bdc [2006-04-12 15:57:10 xend] ERROR (XendCheckpoint:227) PT Race: [10000000,905] pte=bdd063, mfn=00000bdd [2006-04-12 15:57:10 xend] ERROR (XendCheckpoint:227) PT Race: [10000000,906] pte=bde063, mfn=00000bde [2006-04-12 15:57:10 xend] ERROR (XendCheckpoint:227) PT Race: [10000000,907] pte=bdf063, mfn=00000bdf [2006-04-12 15:57:10 xend] ERROR (XendCheckpoint:227) PT Race: [10000000,908] pte=bd9063, mfn=00000bd9 [2006-04-12 15:59:44 xend] ERROR (XendCheckpoint:227) 26% 31% 36% 41% 46% 51% 56% 61% 66% 71% 76% 81% 86% 92% 97% 1: sent 98834, skipped 32238, delta 215040ms, dom0 6%, target 22%, sent 15Mb/s, dirtied 6Mb/s 39757 pages [2006-04-12 16:00:17 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 2 0% 5% 13% 18% 23% 28% 33% 43% 48% 75% 2: sent 15081, skipped 24676, delta 32749ms, dom0 6%, target 23%, sent 15Mb/s, dirtied 37Mb/s 37093 pages [2006-04-12 16:00:17 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 3 0%netbuf race: iter 3, pfn 401. mfn ffffffff [2006-04-12 16:00:19 xend] ERROR (XendCheckpoint:227) netbuf race: iter 3, pfn 5760. mfn ffffffff [2006-04-12 16:00:49 xend] ERROR (XendCheckpoint:227) 5% 17% 26% 31% 36% 41% 48% 54% 78% 3: sent 14415, skipped 22676, delta 31370ms, dom0 6%, target 23%, sent 15Mb/s, dirtied 38Mb/s 37102 pages [2006-04-12 16:00:49 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 4 0%netbuf race: iter 4, pfn 32b8. mfn ffffffff [2006-04-12 16:01:24 xend] ERROR (XendCheckpoint:227) 5% 16% 24% 33% 38% 44% 49% 54% 63% 96% 4: sent 16474, skipped 20627, delta 35812ms, dom0 6%, target 23%, sent 15Mb/s, dirtied 31Mb/s 34655 pages [2006-04-12 16:01:24 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 5 0%netbuf race: iter 5, pfn 36a0. mfn ffffffff [2006-04-12 16:01:27 xend] ERROR (XendCheckpoint:227) netbuf race: iter 5, pfn 5760. mfn ffffffff [2006-04-12 16:01:40 xend] ERROR (XendCheckpoint:227) 6% 16% 21%netbuf race: iter 5, pfn 3b52. mfn ffffffff [2006-04-12 16:01:44 xend] ERROR (XendCheckpoint:227) 29%netbuf race: iter 5, pfn 327f. mfn ffffffff [2006-04-12 16:01:51 xend] ERROR (XendCheckpoint:227) 34%netbuf race: iter 5, pfn 34cd. mfn ffffffff [2006-04-12 16:02:00 xend] ERROR (XendCheckpoint:227) 40% 46%netbuf race: iter 5, pfn 390a. mfn ffffffff [2006-04-12 16:02:33 xend] ERROR (XendCheckpoint:227) 52% 57% 62% 67% 75% 81% 87% 92%netbuf race: iter 5, pfn 2df0. mfn ffffffff [2006-04-12 16:02:33 xend] ERROR (XendCheckpoint:227) netbuf race: iter 5, pfn 2df1. mfn ffffffff[2006-04-12 16:02:38 xend] ERROR (XendCheckpoint:227) 99%5: sent 33877, skipped 770, delta 73629ms, dom0 5%, target 31%, sent 15Mb/s, dirtied 0Mb/s 816 pages [2006-04-12 16:02:40 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 6 0% 6: sent 816, skipped 0, delta 1783ms, dom0 6%, target 31%, sent 14Mb/s, dirtied 1Mb/s 94 pages [2006-04-12 16:02:40 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 7 0% 7: sent 94, skipped 0, delta 169ms, dom0 7%, target 28%, sent 18Mb/s, dirtied 11Mb/s 59 pages [2006-04-12 16:02:40 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 8 0% 8: sent 59, skipped 0, delta 128ms, dom0 7%, target 28%, sent 15Mb/s, dirtied 6Mb/s 25 pages [2006-04-12 16:02:40 xend] ERROR (XendCheckpoint:227) Saving memory pages: iter 9 0%9: sent 25, skipped 0, Start last iteration [2006-04-12 16:02:40 xend] DEBUG (XendCheckpoint:200) suspend[2006-04-12 16:02:40 xend] DEBUG (XendCheckpoint:83) In saveInputHandler suspend[2006-04-12 16:02:40 xend] DEBUG (XendCheckpoint:85) Suspending 31 ... --> After 7 minutes I kill the thing <---[2006-04-12 16:09:36 xend.XendDomainInfo] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroy: domid=31 [2006-04-12 16:09:36 xend.XendDomainInfo] DEBUG (XendDomainInfo:1280) XendDomainInfo.destroyDomain(31)[2006-04-12 16:09:36 xend] INFO (XendCheckpoint:88) Domain 31 suspended.[2006-04-12 16:09:36 xend] ERROR (XendCheckpoint:99) Save failed on domain xenbox (31).Traceback (most recent call last): File "/usr/lib/python/xen/xend/XendCheckpoint.py", line 93, in save forkHelper(cmd, fd, saveInputHandler, False)File "/usr/lib/python/xen/xend/XendCheckpoint.py", line 206, in forkHelper _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |