
Re: [Xen-devel] HVM bug: system crashes after offline online a vcpu



On Thu, Dec 13, 2012 at 03:12:17PM +0000, Wei Liu wrote:
> Hi Konrad
> 
> I encountered a bug when trying to take a cpu offline and then bring it
> online again in an HVM guest. As I'm not very familiar with the HVM code, I
> cannot come up with a quick fix.

I took the two patches you posted, and they are in v3.8 now.

It seems that there are bugs in the offline/online code, though.

I did this:
# echo 0 > /sys/devices/system/cpu/cpu3/online
# echo 1 > /sys/devices/system/cpu/cpu3/online

That was with a PV guest, and it blows up (with or without your patches).
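For repeated testing, the two commands above can be wrapped in a small shell function. This is only a sketch, assuming a Linux guest that exposes CPU hotplug through /sys/devices/system/cpu/cpuN/online and is run as root; the function name cycle_cpu and the CPU_ROOT override (useful for dry runs against a scratch directory) are hypothetical and not part of any Xen or kernel tooling.

```shell
# Hypothetical helper: offline and then online one (v)CPU via sysfs,
# optionally several times in a row.
# Usage: cycle_cpu CPU [ITERATIONS]
# CPU_ROOT is a hypothetical override; it defaults to the real sysfs path.
cycle_cpu() {
    if [ -z "$1" ]; then
        echo "usage: cycle_cpu CPU [ITERATIONS]" >&2
        return 2
    fi
    cpu_root="${CPU_ROOT:-/sys/devices/system/cpu}"
    cpu="$1"
    iterations="${2:-1}"
    ctl="$cpu_root/cpu$cpu/online"

    # cpu0 often has no 'online' file, and writing requires root.
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl" >&2
        return 1
    fi

    i=0
    while [ "$i" -lt "$iterations" ]; do
        echo 0 > "$ctl" || return 1   # take the CPU offline
        echo 1 > "$ctl" || return 1   # bring it back online
        i=$((i + 1))
    done
}
```

For example, `cycle_cpu 3 10` would offline and online cpu3 ten times in a row, which may help when a crash does not reproduce on the first cycle.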

Have you seen something similar to this:

[  106.166795] BUG: scheduling while atomic: swapper/2/0/0x00000000
[  106.167168] microcode: CPU2 sig=0x206a7, pf=0x2, revision=0x17
[  106.167566] Modules linked in: sg sd_mod dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod libcrc32c crc32c radeon fbcon tileblit font bitblit softcursor ttm drm_kms_helper crc32c_intel xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs xen_privcmd [last unloaded: dump_dma]
[  106.169286] Pid: 0, comm: swapper/2 Tainted: G           O 3.5.0-rc3upstream-00139-gb1849b3-dirty #1
[  106.170152] Call Trace:
[  106.170598]  [<ffffffff8109bcbd>] __schedule_bug+0x4d/0x60
[  106.171042]  [<ffffffff815be0fc>] __schedule+0x69c/0x760
[  106.171469]  [<ffffffff815be284>] schedule+0x24/0x70
[  106.171890]  [<ffffffff8103fbe9>] cpu_idle+0xc9/0xe0
[  106.172309]  [<ffffffff81033e79>] ? xen_irq_enable_direct_reloc+0x4/0x4
[  106.172726]  [<ffffffff815b1c5d>] cpu_bringup_and_idle+0xe/0x10
[  106.174533] BUG: scheduling while atomic: swapper/2/0/0x00000000
?

> 
> The HVM DomU is configured with 4 vcpus. After booting to a command
> prompt, I do the following operations:
> 
> 
> With Debian's default 2.6.32-5-amd64 kernel, the last log line is:
> 
>     Booting processor 3 APIC 0x6 ip 0x6000
> 
> With my own kernel, which is version 3.5, I get more logs:
> 
> [   44.047358] Booting Node 0 Processor 3 APIC 0x6
> [   44.061201] ------------[ cut here ]------------
> [   44.065186] kernel BUG at kernel/hrtimer.c:1259!
> [   44.065186] invalid opcode: 0000 [#1] SMP
> [   44.065186] CPU 3
> [   44.065186] Modules linked in:
> [   44.065186]
> [   44.065186] Pid: 0, comm: swapper/3 Not tainted 3.5.0-xen-evtchn+ #50 Xen HVM domU
> [   44.065186] RIP: 0010:[<ffffffff8105682e>]  [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
> [   44.065186] RSP: 0000:ffff88000f463de8  EFLAGS: 00010046
> [   44.065186] RAX: ffffffff8105680a RBX: ffff88000f46e640 RCX: 00000000fffffffa
> [   44.065186] RDX: 00000000fffffffa RSI: 0000000000000000 RDI: ffff88000f46bd80
> [   44.065186] RBP: 0000000000000057 R08: ffff88000e000b40 R09: 0000000000000019
> [   44.065186] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88000e6e8e00
> [   44.065186] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [   44.065186] FS:  0000000000000000(0000) GS:ffff88000f460000(0000) knlGS:0000000000000000
> [   44.065186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   44.065186] CR2: 0000000000000000 CR3: 000000000181b000 CR4: 00000000000007e0
> [   44.065186] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   44.065186] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   44.065186] Process swapper/3 (pid: 0, threadinfo ffff88000e62e000, task ffff88000e62aea0)
> [   44.065186] Stack:
> [   44.065186]  0000000000000001 ffff88000f46e680 ffffffff81013711 00000008cfba9b27
> [   44.065186]  00000000fffffffa ffff88000e6e97c0 0000000000000057 ffff88000e6e8e00
> [   44.065186]  0000000000000000 0000000000000001 0000000000000000 ffffffff81006954
> [   44.065186] Call Trace:
> [   44.065186]  <IRQ>
> [   44.065186]  [<ffffffff81013711>] ? paravirt_sched_clock+0x5/0x8
> [   44.065186]  [<ffffffff81006954>] ? xen_timer_interrupt+0x26/0x162
> [   44.065186]  [<ffffffff8109a220>] ? check_for_new_grace_period.isra.32+0x90/0x9a
> [   44.065186]  [<ffffffff810956df>] ? handle_irq_event_percpu+0x32/0x1b0
> [   44.065186]  [<ffffffff8128f88b>] ? irq_get_handler_data+0x7/0x16
> [   44.065186]  [<ffffffff81097e39>] ? handle_percpu_irq+0x3a/0x4f
> [   44.065186]  [<ffffffff8128f9ec>] ? __xen_evtchn_do_upcall_l2+0x131/0x1c0
> [   44.065186]  [<ffffffff812913d3>] ? xen_evtchn_do_upcall+0x27/0x37
> [   44.065186]  [<ffffffff8140081a>] ? xen_hvm_callback_vector+0x6a/0x70
> [   44.065186]  <EOI>
> [   44.065186]  [<ffffffff81094b8f>] ? cpumask_next+0x17/0x19
> [   44.065186]  [<ffffffff813eb75b>] ? start_secondary+0x184/0x1e2
> [   44.065186]  [<ffffffff813eb757>] ? start_secondary+0x180/0x1e2
> [   44.065186]  [<ffffffff813eb5d7>] ? set_cpu_sibling_map+0x40e/0x40e
> [   44.065186] Code: 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 c7 c3 40 e6 00 00 48 83 ec 28 65 48 03 1c 25 e8 db 00 00 83 7b 18 00 75 02 <0f> 0b 48 ff 43 20 48 bd ff ff ff ff ff ff ff 7f 41 be 03 00 00
> [   44.065186] RIP  [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
> [   44.065186]  RSP <ffff88000f463de8>
> [   44.065186] ---[ end trace 9366352b116a03db ]---
> [   44.065186] Kernel panic - not syncing: Fatal exception in interrupt
> 
> And if I offline and then online cpu 2 with 2.6.32-5-amd64:
> 
> [   27.933928] Booting processor 2 APIC 0x4 ip 0x6000
> [   25.708098] Initializing CPU#2
> [   25.708098] CPU: L1 I cache: 32K, L1 D cache: 32K
> [   25.708098] CPU: L2 cache: 6144K
> [   25.708098] CPU 2/0x4 -> Node 0
> [   25.708098] CPU: Physical Processor ID: 0
> [   25.708098] CPU: Processor Core ID: 4
> [   28.028234] CPU2: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz stepping 07
> [   28.069320] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
> [   25.708098] installing Xen timer for CPU 2
> [   28.098101] CPU0 attaching NULL sched-domain.
> [   28.098106] CPU1 attaching NULL sched-domain.
> [   28.098110] CPU3 attaching NULL sched-domain.
> [   28.098092] ------------[ cut here ]------------
> [   28.098092] WARNING: at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_none/kernel/irq/chip.c:88 unbind_from_irq+0x147/0x159()
> [   28.098092] Hardware name: HVM domU
> [   28.144127] CPU0 attaching sched-domain:
> [   28.144131]  domain 0: span 0-3 level CPU
> [   28.144133]   groups: 0 1 2 3
> [   28.144139] CPU1 attaching sched-domain:
> [   28.144142]  domain 0: span 0-3 level CPU
> [   28.144145]   groups: 1 2 3 0
> [   28.144150] CPU2 attaching sched-domain:
> [   28.144152]  domain 0: span 0-3 level CPU
> [   28.144155]   groups: 2 3 0 1
> [   28.144160] CPU3 attaching sched-domain:
> [   28.144162]  domain 0: span 0-3 level CPU
> [   28.144165]   groups: 3 0 1 2
> [   28.209159] Destroying IRQ18 without calling free_irq
> [   28.215985] Modules linked in: loop parport_pc parport psmouse evdev serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr i2c_piix4 i2c_core button processor ext3 jbd mbcache ata_generic ata_piix libata floppy thermal thermal_sys xen_blkfront scsi_mod [last unloaded: scsi_wait_scan]
> [   28.224050] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
> [   28.224050] Call Trace:
> [   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [   28.224050]  [<ffffffff8104dd7c>] ? warn_slowpath_common+0x77/0xa3
> [   28.224050]  [<ffffffff8104de04>] ? warn_slowpath_fmt+0x51/0x59
> [   28.224050]  [<ffffffff810e4493>] ? get_partial_node+0x15/0x85
> [   28.224050]  [<ffffffff811966fd>] ? kvasprintf+0x41/0x68
> [   28.224050]  [<ffffffff8109639e>] ? dynamic_irq_cleanup_x+0x4b/0xc2
> [   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [   28.224050]  [<ffffffff811ef5b7>] ? bind_virq_to_irqhandler+0x14c/0x15d
> [   28.224050]  [<ffffffff8100df77>] ? xen_timer_interrupt+0x0/0x18d
> [   28.224050]  [<ffffffff812f5121>] ? set_cpu_sibling_map+0x2f4/0x311
> [   28.224050]  [<ffffffff8100df0d>] ? xen_setup_timer+0x55/0xa2
> [   28.224050]  [<ffffffff8100df71>] ? xen_hvm_setup_cpu_clockevents+0x17/0x1d
> [   28.224050]  [<ffffffff812f52fc>] ? start_secondary+0x17c/0x185
> [   28.224050] ---[ end trace db1493923b5e103d ]---
> 
> The logs for cpu 2 with my 3.5 kernel are identical to those for cpu 3.
> 
> 
> Wei.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

