[Xen-devel] Live migration bug introduced in 2.6.32.16?
Hi All,

It looks like a live migration bug may have been introduced in 2.6.32.16. I've been experiencing issues where, upon live migration, the domU simply hangs once it gets resumed on the target dom0. I've been unable to get any crash information out of the domU, and nothing comes up in xm dmesg. There could be a kernel panic happening, but since I can't connect to the console during the migration I haven't been able to get anything useful. Comparing a successful migration to a failed one in xend.log and xen-debug.log, nothing stands out as being different.

Testing a wide variety of VMs to see why some worked and some didn't, I've narrowed it down to the domU kernel version, and to 2.6.32.16 specifically, by trying these versions:

2.6.32.8   good
2.6.32.15  good
2.6.32.16  bad
2.6.32.17  bad
2.6.32.20  bad
2.6.32.24  bad
2.6.32.28  bad
2.6.37     bad

All are the stock kernels off kernel.org.

Note that this isn't consistent at all. I've got 6 dom0's and this only happens when migrating in certain directions between certain dom0's:

xen1->xen2 crash
xen1->xen5 crash
xen1->xen6 crash
xen2->xen5 crash
xen2->xen1 works
xen3->xen1 works
xen5->xen2 works
xen5->xen6 works
xen6->xen1 works
xen6->xen5 works

Previously, xen6->xen5 worked but xen5->xen6 didn't. After a few reboots of the dom0s, however, the problem between them resolved itself and now I can go xen5->xen6 and back all day on 2.6.32.16 without issues. If I then migrate the domU to xen1 it's fine, but moving it back to xen5 locks it up on resume.

All 6 Xen dom0's are identical:

xen5 ~ # xm info
host                   : xen5
release                : 2.6.31.13
version                : #11 SMP Wed Jan 26 10:55:28 PST 2011
machine                : x86_64
nr_cpus                : 12
nr_nodes               : 2
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 2266
hw_caps                : bfebfbff:2c100800:00000000:00001f40:009ee3fd:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 40950
free_memory            : 38380
node_to_cpu            : node0:0-5 node1:6-11
node_to_memory         : node0:23388 node1:14991
node_to_dma32_mem      : node0:2994 node1:0
max_node_id            : 1
xen_major              : 4
xen_minor              : 0
xen_extra              : .1-rc6-pre
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : console=com1,com2,vga com1=115200,8n1 com2=115200,8n1 dom0_mem=1024M dom0_max_vcpus=1 dom0_vcpus_pin=true
cc_compiler            : gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5)
cc_compile_by          : root
cc_compile_domain      :
cc_compile_date        : Tue Jan 25 17:05:03 PST 2011
xend_config_format     : 4

I've tried updating to a newer dom0 release but ran into linking issues due to as-needed, so I haven't managed to get them up yet.

Looking at the changelog for 2.6.32.16 (http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32.16), there were two Xen patches, both involving resuming. Diffs for the two patches:

http://git.kernel.org/?p=linux/kernel/git/longterm/linux-2.6.32.y.git;a=commitdiff;h=0f58db21025d979e38db691861985ebc931551b1
http://git.kernel.org/?p=linux/kernel/git/longterm/linux-2.6.32.y.git;a=commitdiff;h=b6d1fd29840e29d1a87d0ab15ee1ccc90ea15ec4

I've tried reverting them together and one at a time, yet the problem still happens. I then took 2.6.32.15, applied those two patches, and it's completely stable. So whatever is causing this was apparently not a Xen-related patch?

I'm completely stumped at this point and don't want to just try applying every patch in 2.6.32.16 to see which one is doing it; compiling and testing all these kernels is time consuming =)

Anyone have any ideas on what might be going on here or how I can debug it further?
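One way to narrow it down without hand-applying every patch in 2.6.32.16 might be to let git bisect walk the 2.6.32 longterm tree between the v2.6.32.15 and v2.6.32.16 tags; that only needs roughly log2(number of patches) domU kernel builds instead of one build per patch. A rough sketch, assuming the git clone URL for the same longterm tree the commitdiff links above point at, and using whichever migration direction currently fails as the test:

  # clone the 2.6.32 longterm tree (assumed git URL for the gitweb tree linked above)
  git clone git://git.kernel.org/pub/scm/linux/kernel/git/longterm/linux-2.6.32.y.git
  cd linux-2.6.32.y

  git bisect start
  git bisect bad  v2.6.32.16     # first domU kernel that hangs on resume
  git bisect good v2.6.32.15     # last domU kernel that migrates cleanly

  # at each step git checks out a candidate revision: build it with the usual
  # domU .config, boot the test domU on it, then retry a failing direction,
  # e.g. from xen1:
  #     xm migrate --live <domU> xen5
  # and report the result back:
  git bisect good                # or: git bisect bad
  # repeat until git prints the first bad commit, then clean up:
  git bisect reset

Since the failing directions have shifted after dom0 reboots before, it's probably worth repeating the migration a few times per bisect step before marking a kernel good.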
Thanks,
Nathan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel