
RE: [Xen-devel] CPU hangs


  • To: Pasi Kärkkäinen <pasik@xxxxxx>
  • From: "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx>
  • Date: Thu, 9 Sep 2010 12:48:55 -0500
  • Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 09 Sep 2010 10:49:58 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: ActQOgRiggjN8bEwTF+1Ia6NPedfXAADFVjc
  • Thread-topic: [Xen-devel] CPU hangs

In multi-CPU mode, it takes what appears to be a random amount of time to hang the whole host, so I make it happen faster by cutting the number of CPUs down to 1. When I do this, I can usually get it to happen in under an hour. I believe a Windows HVM must be running, but I can't say that with 100% certainty at this time. I don't believe the serial port printing shown in the stack trace is what is hanging; I added the serial port only to be able to debug the problem. I think the issue is with the shadow page table code. Of interest may be the fact that these messages are being printed as well:
 
>    (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!

 
So my first inclination is to research the code that deals with VRAM tracking. It may be getting stuck in a loop, causing the crash. This is the grub entry I boot with:
 
 
menuentry "Boot Entry 3: debug cpu1" {
    saved_entry=2
    save_env saved_entry
    set root=(NxVG-NxDisk1)
    multiboot   /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle  crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
    module      /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug nmi_watchdog=1
    module      /initrd.img-2.6.32-orc
}
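
For a rough sense of what the serial console costs with the settings in the entry above (com1=115200,8n1 together with sync_console), here is a back-of-envelope sketch. It assumes one start bit per character and that every character is sent before the next one is queued; the function names are made up for illustration, and it says nothing about whether the serial path is actually at fault.

```python
# Rough estimate of serial transmit time per log line.
# Assumptions (not taken from the Xen source): one start bit per
# character, no flow-control stalls, output fully synchronous.

def bits_per_char(data_bits=8, parity=False, stop_bits=1):
    """Bits on the wire per character: start + data + parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def line_tx_seconds(line, baud=115200, **framing):
    """Seconds needed to push every character of `line` out of the UART."""
    return len(line) * bits_per_char(**framing) / baud


if __name__ == "__main__":
    # One of the messages from the log, with a CR/LF line ending assumed.
    sample = "(XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte\r\n"
    per_line = line_tx_seconds(sample)
    print(f"{per_line * 1000:.2f} ms per line")
    print(f"{1 / per_line:.0f} lines per second at most")
```

Multiplying the per-line figure by the number of "cleared vram pte" messages printed in one burst gives an idea of how long a single resync could spend in printk.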


From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
Sent: Thu 9/9/2010 12:13 PM
To: Roger Cruz
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] CPU hangs

On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
>    I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
>    finally got something useful to start tracking.  Before I do, I always
>    like to make sure that this is not something that has already been
>    reported and fixed.  Anyone know of any such CPU deadlocks and a fix?
>
>    Thanks
>

Please paste your grub.conf entry.
When does this hang happen? During startup, or during operation? After how much uptime?

ns16550 sounds like a serial port to me..

-- Pasi

>    (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
>    (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
>    (XEN) ----[ Xen-3.4.2  x86_64  debug=n  Tainted:    C ]----
>    (XEN) CPU:    0
>    (XEN) RIP:    e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
>    (XEN) RFLAGS: 0000000000000006   CONTEXT: hypervisor
>    (XEN) rax: 0000000000000000   rbx: ffff828c801ef260   rcx: 0000000000000001
>    (XEN) rdx: 0000000000002005   rsi: 0000000000000020   rdi: ffff828c801ef260
>    (XEN) rbp: 0000000000000020   rsp: ffff828c8024faa0   r8:  0000000000004000
>    (XEN) r9:  0000000000003fff   r10: ffff828c80268360   r11: 0000000000000400
>    (XEN) r12: ffff828c801ef2dc   r13: 0000000000000020   r14: ffff828c80267ecc
>    (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
>    (XEN) cr3: 00000000a17ea000   cr2: 0000000097a20000
>    (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>    (XEN) Xen stack trace from rsp=ffff828c8024faa0:
>    (XEN)    ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
>    (XEN)    ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
>    (XEN)    ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
>    (XEN)    0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
>    (XEN)    ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
>    (XEN)    ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
>    (XEN)    0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
>    (XEN)    ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
>    (XEN)    0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
>    (XEN)    ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
>    (XEN)    ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
>    (XEN)    0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
>    (XEN)    ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
>    (XEN)    00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
>    (XEN)    ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
>    (XEN)    ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
>    (XEN)    ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
>    (XEN)    ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
>    (XEN)    0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
>    (XEN)    ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
>    (XEN) Xen call trace:
>    (XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
>    (XEN)    [<ffff828c80127776>] __serial_putc+0x86/0x180
>    (XEN)    [<ffff828c80127e00>] serial_puts+0x90/0x120
>    (XEN)    [<ffff828c80126019>] __putstr+0x9/0xa0
>    (XEN)    [<ffff828c8012662e>] printk+0xee/0x1d0
>    (XEN)    [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
>    (XEN)    [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
>    (XEN)    [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
>    (XEN)    [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
>    (XEN)    [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
>    (XEN)    [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
>    (XEN)    [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
>    (XEN)    [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
>    (XEN)    [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
>    (XEN)    [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
>    (XEN)    [<ffff828c80127500>] ns16550_poll+0x0/0xa0
>    (XEN)    [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
>    (XEN)    [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
>    (XEN)    [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
>    (XEN)    [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
>    (XEN)    [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
>    (XEN)
>    (XEN)
>    (XEN) ****************************************
>    (XEN) Panic on CPU 0:
>    (XEN) FATAL TRAP: vector = 2 (nmi)
>    (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>    (XEN) ****************************************
>    (XEN)
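
To make a call trace like the one above easier to scan or diff between hangs, a small sketch that pulls the address, symbol, offset and size out of each frame line. The regular expression and the function name are my own for illustration, based only on the line format shown in this log.

```python
import re

# Matches frame lines such as:
#   (XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_call_trace(text):
    """Return a list of (address, symbol, offset, size) tuples, in log order."""
    frames = []
    for match in FRAME_RE.finditer(text):
        addr, symbol, offset, size = match.groups()
        frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


if __name__ == "__main__":
    sample = """\
(XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
(XEN)    [<ffff828c8012662e>] printk+0xee/0x1d0
(XEN)    [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
(XEN)    [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
"""
    for addr, symbol, offset, size in parse_call_trace(sample):
        print(f"{symbol:<24} +{offset:#x} of {size:#x}  @ {addr:#x}")
```

Leading "> " quote markers do not need to be stripped first, since the pattern only looks for the bracketed address and what follows it.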

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel




 

