Re: [Xen-users] Xen 4.6 Live Migration and Hotplugging Issues
About the CPU hotplug issue: I am able to reproduce it as well. I think the lockup is due to the following code in xen_cpu_up() (arch/x86/xen/smp.c), which spins until the cpu_hotplug_state of the new vcpu becomes CPU_ONLINE:

    while (cpu_report_state(cpu) != CPU_ONLINE)
        HYPERVISOR_sched_op(SCHEDOP_yield, NULL);

cpu_hotplug_state is set to CPU_ONLINE with cpu_set_state_online().
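For reference, here is a sketch of both sides of that bring-up handshake, paraphrased from memory of a ~4.x arch/x86/xen/smp.c (function bodies abbreviated, so treat the exact placement of the calls as an assumption rather than verbatim source):

    /* Sketch only -- not verbatim kernel source. */

    /* Runs on an already-online CPU when the new vcpu is brought up. */
    static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
    {
        /* ... per-cpu setup, then VCPUOP_up hypercall to start the vcpu ... */

        /*
         * Wait until the new vcpu reports itself online. If the new vcpu
         * never reaches cpu_set_state_online() below, this loop spins
         * forever and the soft-lockup watchdog fires on the calling CPU,
         * which matches the backtrace in the report (xen_cpu_up ->
         * xen_hypercall_sched_op).
         */
        while (cpu_report_state(cpu) != CPU_ONLINE)
            HYPERVISOR_sched_op(SCHEDOP_yield, NULL);

        return 0;
    }

    /* Runs on the new vcpu, entered via cpu_bringup_and_idle(). */
    static void cpu_bringup(void)
    {
        int cpu = smp_processor_id();

        /* ... cpu_init(), sibling map, clockevents, notify_cpu_starting() ... */

        set_cpu_online(cpu, true);
        cpu_set_state_online(cpu);   /* releases the waiter in xen_cpu_up() */

        local_irq_enable();
    }

So the lockup suggests the new vcpu either never starts running after the migration or never gets as far as cpu_set_state_online().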
Have you tried the latest mainline Linux? As far as I remember, I tried the latest mainline kernel and got a warning related to block-mq when I onlined a vcpu. I am not sure whether the patch below would help:

commit ae039001054b34c4a624539b32a8b6ff3403aaf9
Author: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
Date:   Fri Jun 2 17:06:02 2017 -0700

    xen/vcpu: Handle xen_vcpu_setup() failure at boot

    On PVH, PVHVM, at failure in the VCPUOP_register_vcpu_info hypercall
    we limit the number of cpus to MAX_VIRT_CPUS. However, if this
    failure had occurred for a cpu beyond MAX_VIRT_CPUS, we continue
    to function with > MAX_VIRT_CPUS.

    This leads to problems at the next save/restore cycle when there
    are > MAX_VIRT_CPUS threads going into stop_machine() but coming
    back up there's valid state for only the first MAX_VIRT_CPUS.

    This patch pulls the excess CPUs down via cpu_down().

    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
    Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
    Signed-off-by: Juergen Gross <jgross@xxxxxxxx>

Dongli Zhang

On 10/31/2017 12:14 AM, Tim Evers wrote:
> Hi,
>
> I am trying to set up two Ubuntu 16.04 / Xen 4.6 machines to perform live
> migration and CPU / memory hotplug. So far I have encountered several
> catastrophic issues. They are so severe that I am thinking I might be on
> the wrong track altogether.
>
> Any input is highly appreciated!
>
> The setup:
>
> 2 Dell M630 with Ubuntu 16.04 and Xen 4.6, 64bit Dom0 (node1 + node2)
>
> 2 DomUs, Debian Jessie 64bit PV and Debian Jessie 64bit HVM
>
> Now create a PV DomU on node1 with 1 CPU core and 2 GB RAM and plenty of
> room for hot-add / hotplug:
>
> Config excerpt:
>
> kernel = "/home/xen/shared/boot/tests/vmlinuz-3.16.0-4-amd64"
> ramdisk = "/home/xen/shared/boot/tests/initrd.img-3.16.0-4-amd64"
> maxmem = 16384
> memory = 2048
> maxvcpus = 8
> vcpus = 1
> cpus = "18"
>
> xm list:
>
> root1823 97 2048 1 -b---- 15.1
>
> All is fine. Now migrate to node2. Immediately after the migration we see:
>
> xm list:
>
> root182 360 16384 1 -b---- 10.5
>
> So the DomU immediately ballooned to its maxmem after the migration, and
> even better, inside the DomU we see all CPUs are suddenly hotplugged (but
> not online due to missing udev rules):
>
> root@debian8:~# ls /sys/devices/system/cpu/ | grep cpu
> cpu0
> cpu1
> cpu2
> cpu3
> cpu4
> cpu5
> cpu6
> cpu7
>
> So this is already not how it is supposed to be (the DomU should look the
> same before and after migration).
>
> Now we take cpu1 online:
>
> echo 1 > /sys/devices/system/cpu/cpu1/online
>
> Result as seen through hvc on the Dom0:
>
> [ 373.360949] installing Xen timer for CPU 1
> [ 400.032003] BUG: soft lockup - CPU#0 stuck for 22s! [bash:733]
> [ 400.032003] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc evdev pcspkr x86_pkg_temp_thermal thermal_sys coretemp crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd autofs4 ext4 crc16 mbcache jbd2 crct10dif_pclmul crct10dif_common xen_netfront xen_blkfront crc32c_intel
> [ 400.032003] CPU: 0 PID: 733 Comm: bash Not tainted 3.16.0-4-amd64 #1 Debian 3.16.43-2+deb8u3
> [ 400.032003] task: ffff88000470e1d0 ti: ffff88006acec000 task.ti: ffff88006acec000
> [ 400.032003] RIP: e030:[<ffffffff810013aa>] [<ffffffff810013aa>] xen_hypercall_sched_op+0xa/0x20
> [ 400.032003] RSP: e02b:ffff88006acefdd0 EFLAGS: 00000246
> [ 400.032003] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff810013aa
> [ 400.032003] RDX: ffff88007d640000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 400.032003] RBP: ffff88006bcf6000 R08: ffff88007d03d5c8 R09: 0000000000000122
> [ 400.032003] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [ 400.032003] R13: 000000000000cd60 R14: ffff88006d1dca20 R15: 000000000007d649
> [ 400.032003] FS: 00007fe4b215e700(0000) GS:ffff88007d600000(0000) knlGS:0000000000000000
> [ 400.032003] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 400.032003] CR2: 00000000016de6d0 CR3: 0000000004a67000 CR4: 0000000000042660
> [ 400.032003] Stack:
> [ 400.032003]  ffff88006acefb3e 0000000000000000 ffffffff81010dc1 0000000001323d35
> [ 400.032003]  0000000000000000 0000000000000000 0000000000000001 0000000000000001
> [ 400.032003]  ffff88006d1dca20 0000000000000000 ffffffff81068cac 000000306aceff3c
> [ 400.032003] Call Trace:
> [ 400.032003]  [<ffffffff81010dc1>] ? xen_cpu_up+0x211/0x500
> [ 400.032003]  [<ffffffff81068cac>] ? _cpu_up+0x12c/0x160
> [ 400.032003]  [<ffffffff81068d59>] ? cpu_up+0x79/0xa0
> [ 400.032003]  [<ffffffff8150b615>] ? cpu_subsys_online+0x35/0x80
> [ 400.032003]  [<ffffffff813a608d>] ? device_online+0x5d/0xa0
> [ 400.032003]  [<ffffffff813a6145>] ? online_store+0x75/0x80
> [ 400.032003]  [<ffffffff8121b56a>] ? kernfs_fop_write+0xda/0x150
> [ 400.032003]  [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
> [ 400.032003]  [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
> [ 400.032003]  [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15
> [ 400.032003] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>
> The same happens on the HVM DomU, but always only _after_ live migration.
> Hotplugging works flawlessly if done on the Dom0 where the DomU was
> originally started.
>
> Any idea what might be happening here? Has anyone managed to migrate and
> afterwards hotplug a DomU?
>
> Thanks
>
> Tim
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxx
> https://lists.xen.org/xen-users

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users