Re: [Xen-devel] HVM bug: system crashes after offline online a vcpu
On Thu, Dec 13, 2012 at 03:12:17PM +0000, Wei Liu wrote:
> Hi Konrad
>
> I encountered a bug when trying to bring offline a cpu then online it
> again in HVM. As I'm not very familiar with HVM stuffs I cannot come up
> with a quick fix.

I took the two patches that you posted and they are in v3.8 now.

It seems that there are bugs in the offline/online code, though. I did this:

# echo 0 > /sys/devices/system/cpu/cpu3/online
# echo 1 > /sys/devices/system/cpu/cpu3/online

with a PV guest, and it blows up (with or without your patches). Have you
seen something similar to this:

[ 106.166795] BUG: scheduling while atomic: swapper/2/0/0x00000000
[ 106.167168] microcode: CPU2 sig=0x206a7, pf=0x2, revision=0x17
[ 106.167566] Modules linked in: sg sd_mod dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod libcrc32c crc32c radeon fbcon tileblit font bitblit softcursor ttm drm_kms_helper crc32c_intel xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs xen_privcmd [last unloaded: dump_dma]
[ 106.169286] Pid: 0, comm: swapper/2 Tainted: G O 3.5.0-rc3upstream-00139-gb1849b3-dirty #1
[ 106.170152] Call Trace:
[ 106.170598] [<ffffffff8109bcbd>] __schedule_bug+0x4d/0x60
[ 106.171042] [<ffffffff815be0fc>] __schedule+0x69c/0x760
[ 106.171469] [<ffffffff815be284>] schedule+0x24/0x70
[ 106.171890] [<ffffffff8103fbe9>] cpu_idle+0xc9/0xe0
[ 106.172309] [<ffffffff81033e79>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 106.172726] [<ffffffff815b1c5d>] cpu_bringup_and_idle+0xe/0x10
[ 106.174533] BUG: scheduling while atomic: swapper/2/0/0x00000000

?

>
> The HVM DomU is configured with 4 vcpus. After booting into command
> prompt, I do following operations.
>
>
> With Debian's default 2.6.32-5-amd64 kernel, the last log is:
>
> Booting processor 3 APIC 0x6 ip 0x6000
>
> With my own kernel which is of version 3.5, I'm able to get more logs:
>
> [ 44.047358] Booting Node 0 Processor 3 APIC 0x6
> [ 44.061201] ------------[ cut here ]------------
> [ 44.065186] kernel BUG at kernel/hrtimer.c:1259!
> [ 44.065186] invalid opcode: 0000 [#1] SMP
> [ 44.065186] CPU 3
> [ 44.065186] Modules linked in:
> [ 44.065186]
> [ 44.065186] Pid: 0, comm: swapper/3 Not tainted 3.5.0-xen-evtchn+ #50 Xen HVM domU
> [ 44.065186] RIP: 0010:[<ffffffff8105682e>] [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
> [ 44.065186] RSP: 0000:ffff88000f463de8 EFLAGS: 00010046
> [ 44.065186] RAX: ffffffff8105680a RBX: ffff88000f46e640 RCX: 00000000fffffffa
> [ 44.065186] RDX: 00000000fffffffa RSI: 0000000000000000 RDI: ffff88000f46bd80
> [ 44.065186] RBP: 0000000000000057 R08: ffff88000e000b40 R09: 0000000000000019
> [ 44.065186] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88000e6e8e00
> [ 44.065186] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [ 44.065186] FS: 0000000000000000(0000) GS:ffff88000f460000(0000) knlGS:0000000000000000
> [ 44.065186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 44.065186] CR2: 0000000000000000 CR3: 000000000181b000 CR4: 00000000000007e0
> [ 44.065186] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 44.065186] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 44.065186] Process swapper/3 (pid: 0, threadinfo ffff88000e62e000, task ffff88000e62aea0)
> [ 44.065186] Stack:
> [ 44.065186] 0000000000000001 ffff88000f46e680 ffffffff81013711 00000008cfba9b27
> [ 44.065186] 00000000fffffffa ffff88000e6e97c0 0000000000000057 ffff88000e6e8e00
> [ 44.065186] 0000000000000000 0000000000000001 0000000000000000 ffffffff81006954
> [ 44.065186] Call Trace:
> [ 44.065186] <IRQ>
> [ 44.065186] [<ffffffff81013711>] ? paravirt_sched_clock+0x5/0x8
> [ 44.065186] [<ffffffff81006954>] ? xen_timer_interrupt+0x26/0x162
> [ 44.065186] [<ffffffff8109a220>] ? check_for_new_grace_period.isra.32+0x90/0x9a
> [ 44.065186] [<ffffffff810956df>] ? handle_irq_event_percpu+0x32/0x1b0
> [ 44.065186] [<ffffffff8128f88b>] ? irq_get_handler_data+0x7/0x16
> [ 44.065186] [<ffffffff81097e39>] ? handle_percpu_irq+0x3a/0x4f
> [ 44.065186] [<ffffffff8128f9ec>] ? __xen_evtchn_do_upcall_l2+0x131/0x1c0
> [ 44.065186] [<ffffffff812913d3>] ? xen_evtchn_do_upcall+0x27/0x37
> [ 44.065186] [<ffffffff8140081a>] ? xen_hvm_callback_vector+0x6a/0x70
> [ 44.065186] <EOI>
> [ 44.065186] [<ffffffff81094b8f>] ? cpumask_next+0x17/0x19
> [ 44.065186] [<ffffffff813eb75b>] ? start_secondary+0x184/0x1e2
> [ 44.065186] [<ffffffff813eb757>] ? start_secondary+0x180/0x1e2
> [ 44.065186] [<ffffffff813eb5d7>] ? set_cpu_sibling_map+0x40e/0x40e
> [ 44.065186] Code: 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 c7 c3 40 e6 00 00 48 83 ec 28 65 48 03 1c 25 e8 db 00 00 83 7b 18 00 75 02 <0f> 0b 48 ff 43 20 48 bd ff ff ff ff ff ff ff 7f 41 be 03 00 00
> [ 44.065186] RIP [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
> [ 44.065186] RSP <ffff88000f463de8>
> [ 44.065186] ---[ end trace 9366352b116a03db ]---
> [ 44.065186] Kernel panic - not syncing: Fatal exception in interrupt
>
> And if I offline online cpu 2 in 2.6.32-5-amd64:
>
> [ 27.933928] Booting processor 2 APIC 0x4 ip 0x6000
> [ 25.708098] Initializing CPU#2
> [ 25.708098] CPU: L1 I cache: 32K, L1 D cache: 32K
> [ 25.708098] CPU: L2 cache: 6144K
> [ 25.708098] CPU 2/0x4 -> Node 0
> [ 25.708098] CPU: Physical Processor ID: 0
> [ 25.708098] CPU: Processor Core ID: 4
> [ 28.028234] CPU2: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz stepping 07
> [ 28.069320] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
> [ 25.708098] installing Xen timer for CPU 2
> [ 28.098101] CPU0 attaching NULL sched-domain.
> [ 28.098106] CPU1 attaching NULL sched-domain.
> [ 28.098110] CPU3 attaching NULL sched-domain.
> [ 28.098092] ------------[ cut here ]------------
> [ 28.098092] WARNING: at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_none/kernel/irq/chip.c:88 unbind_from_irq+0x147/0x159()
> [ 28.098092] Hardware name: HVM domU
> [ 28.144127] CPU0 attaching sched-domain:
> [ 28.144131] domain 0: span 0-3 level CPU
> [ 28.144133] groups: 0 1 2 3
> [ 28.144139] CPU1 attaching sched-domain:
> [ 28.144142] domain 0: span 0-3 level CPU
> [ 28.144145] groups: 1 2 3 0
> [ 28.144150] CPU2 attaching sched-domain:
> [ 28.144152] domain 0: span 0-3 level CPU
> [ 28.144155] groups: 2 3 0 1
> [ 28.144160] CPU3 attaching sched-domain:
> [ 28.144162] domain 0: span 0-3 level CPU
> [ 28.144165] groups: 3 0 1 2
> [ 28.209159] Destroying IRQ18 without calling free_irq
> [ 28.215985] Modules linked in: loop parport_pc parport psmouse evdev serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr i2c_piix4 i2c_core button processor ext3 jbd mbcache ata_generic ata_piix libata floppy thermal thermal_sys xen_blkfront scsi_mod [last unloaded: scsi_wait_scan]
> [ 28.224050] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
> [ 28.224050] Call Trace:
> [ 28.224050] [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [ 28.224050] [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [ 28.224050] [<ffffffff8104dd7c>] ? warn_slowpath_common+0x77/0xa3
> [ 28.224050] [<ffffffff8104de04>] ? warn_slowpath_fmt+0x51/0x59
> [ 28.224050] [<ffffffff810e4493>] ? get_partial_node+0x15/0x85
> [ 28.224050] [<ffffffff811966fd>] ? kvasprintf+0x41/0x68
> [ 28.224050] [<ffffffff8109639e>] ? dynamic_irq_cleanup_x+0x4b/0xc2
> [ 28.224050] [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [ 28.224050] [<ffffffff811ef5b7>] ? bind_virq_to_irqhandler+0x14c/0x15d
> [ 28.224050] [<ffffffff8100df77>] ? xen_timer_interrupt+0x0/0x18d
> [ 28.224050] [<ffffffff812f5121>] ? set_cpu_sibling_map+0x2f4/0x311
> [ 28.224050] [<ffffffff8100df0d>] ? xen_setup_timer+0x55/0xa2
> [ 28.224050] [<ffffffff8100df71>] ? xen_hvm_setup_cpu_clockevents+0x17/0x1d
> [ 28.224050] [<ffffffff812f52fc>] ? start_secondary+0x17c/0x185
> [ 28.224050] ---[ end trace db1493923b5e103d ]---
>
> The logs for cpu 2 in my 3.5 kernel is identical to those for cpu 3.
>
>
> Wei.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
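
For anyone trying to reproduce this, the offline/online sequence from the message above can be wrapped in a small helper. This is only a minimal sketch: the function name cycle_cpu and the SYSFS_CPU_ROOT override are hypothetical names introduced here for illustration (they are not part of the kernel or of Xen), and the only real interface used is the per-CPU sysfs "online" file already shown in the two echo commands.

```shell
#!/bin/sh
# Offline and then online one CPU through sysfs, mirroring the two echo
# commands in the message. SYSFS_CPU_ROOT is a hypothetical override so the
# function can be pointed at a scratch directory; by default it uses the
# real sysfs location.
cycle_cpu() {
    cpu="$1"
    ctl="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}/cpu${cpu}/online"

    # The control file is only writable by root and is absent for CPUs
    # that cannot be hot-unplugged.
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not root, or cpu${cpu} has no online file)" >&2
        return 1
    fi

    echo 0 > "$ctl" || return 1   # take the CPU offline
    echo 1 > "$ctl" || return 1   # bring it back online
    cat "$ctl"                    # print the resulting state
}
```

Calling `cycle_cpu 3` as root inside the guest and then looking at the tail of `dmesg` should show whether the bring-up path hits either of the traces quoted above.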