
RE: [Xen-devel] CPU hangs


  • To: Pasi Kärkkäinen <pasik@xxxxxx>
  • From: "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx>
  • Date: Thu, 9 Sep 2010 12:57:29 -0500
  • Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 09 Sep 2010 11:02:24 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: ActQR7lc1rufP0VATpq+zLPcFpgQowAAGe3w
  • Thread-topic: [Xen-devel] CPU hangs

Hi Pasi,

Thank you for answering so quickly.

> Have you tried changing the cpufreq/cpuidle settings?

No.  We have had this working before.  I can't recall exactly why we needed
those settings.

> How about the watchdog?

The watchdog is new here; I added it in order to get the stack trace.
Otherwise, the host just hangs and you can't tell what is going on.
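
For anyone who wants to compare traces across several hangs, here is a minimal
sketch of how the frames could be pulled out of a captured serial log.  It
assumes the frame format shown in the trace quoted below, i.e.
"(XEN)    [<address>] symbol+0xoff/0xsize".  The function name
extract_call_trace and the regular expression are my own illustration, not
part of any Xen tooling.

```python
import re

# Matches call-trace frames as they appear in this thread, e.g.
#   (XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
# Assumption: every frame follows this layout; other Xen builds may differ.
FRAME_RE = re.compile(
    r"\(XEN\)\s+\[<([0-9a-f]+)>\]\s+(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def extract_call_trace(log_text):
    """Return (address, symbol, offset, size) for frames after 'Xen call trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Xen call trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match:
            addr, symbol, offset, size = match.groups()
            frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
        elif frames:
            # The first non-frame line after at least one frame ends the trace.
            break
    return frames
```

Feeding it the saved console output and printing the symbol of each tuple
gives the call chain innermost frame first, which makes it easy to see
whether every hang ends up under shadow_set_l1e or only some of them.
Quoted lines with leading "> " markers are handled too, because the pattern
is searched for anywhere in the line.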

> Also if you're using Xen 3.4.2 I believe you'll lose the dom0_mem=1024M
> parameter due to the grub2 bug.. so make sure to add dummy=dummy parameter
> before the dom0_mem.

I fixed this bug in the GRUB2 version we are using, so the parameter is 
correctly passed to Xen now.

R.


-----Original Message-----
From: Pasi Kärkkäinen [mailto:pasik@xxxxxx] 
Sent: Thursday, September 09, 2010 1:52 PM
To: Roger Cruz
Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] CPU hangs

On Thu, Sep 09, 2010 at 12:48:55PM -0500, Roger Cruz wrote:
>    In multi-CPU mode, it takes what appears to be a random amount of time to
>    hang the whole host, so I make it happen faster by cutting the number of
>    CPUs down to 1.  When I do this, I can usually get it to happen in < 1 hr.
>    I believe a Windows HVM guest must be running, but I can't say that with
>    100% certainty at this time.  I don't believe the serial port printing in
>    the stack trace is what is hanging; I added the serial port only to be
>    able to debug the problem.  I think the issue is with the shadow page
>    tables.  Of interest may be the fact that these messages are being printed
>    as well:
> 
>    >    (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects
>    >    that CPU0 is stuck!
> 
>    So my first inclination is to go research the area dealing with VRAM
>    tracking.  It may be getting into a loop, causing the crash.
> 
> 
>    menuentry "Boot Entry 3: debug cpu1" {
>        saved_entry=2
>        save_env saved_entry
>        set root=(NxVG-NxDisk1)
>        multiboot   /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle
>    crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog
>    com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
>        module      /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro
>    console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug
>    nmi_watchdog=1
>        module      /initrd.img-2.6.32-orc
>    }
> 

Have you tried changing the cpufreq/cpuidle settings? 
How about the watchdog? 

Also if you're using Xen 3.4.2 I believe you'll lose the dom0_mem=1024M
parameter due to the grub2 bug.. so make sure to add dummy=dummy parameter
before the dom0_mem.

-- Pasi

>    --------------------------------------------------------------------------
> 
>    From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
>    Sent: Thu 9/9/2010 12:13 PM
>    To: Roger Cruz
>    Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
>    Subject: Re: [Xen-devel] CPU hangs
> 
>    On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
>    >    I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
>    >    finally got something useful to start tracking.  Before I do, I always
>    >    like to make sure that this is not something that has already been
>    >    reported and fixed.  Anyone know of any such CPU deadlocks and a fix?
>    >
>    >    Thanks
>    >
> 
>    Please paste your grub.conf entry.
>    When does this hang happen? During startup, or during operation? After how
>    much uptime?
> 
>    ns16550 sounds like a serial port to me..
> 
>    -- Pasi
> 
>    >    (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
>    >    (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects
>    >    that CPU0 is stuck!
>    >    (XEN) ----[ Xen-3.4.2  x86_64  debug=n  Tainted:    C ]----
>    >    (XEN) CPU:    0
>    >    (XEN) RIP:    e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
>    >    (XEN) RFLAGS: 0000000000000006   CONTEXT: hypervisor
>    >    (XEN) rax: 0000000000000000   rbx: ffff828c801ef260   rcx: 0000000000000001
>    >    (XEN) rdx: 0000000000002005   rsi: 0000000000000020   rdi: ffff828c801ef260
>    >    (XEN) rbp: 0000000000000020   rsp: ffff828c8024faa0   r8:  0000000000004000
>    >    (XEN) r9:  0000000000003fff   r10: ffff828c80268360   r11: 0000000000000400
>    >    (XEN) r12: ffff828c801ef2dc   r13: 0000000000000020   r14: ffff828c80267ecc
>    >    (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
>    >    (XEN) cr3: 00000000a17ea000   cr2: 0000000097a20000
>    >    (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>    >    (XEN) Xen stack trace from rsp=ffff828c8024faa0:
>    >    (XEN)    ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
>    >    (XEN)    ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
>    >    (XEN)    ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
>    >    (XEN)    0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
>    >    (XEN)    ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
>    >    (XEN)    ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
>    >    (XEN)    0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
>    >    (XEN)    ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
>    >    (XEN)    0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
>    >    (XEN)    ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
>    >    (XEN)    ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
>    >    (XEN)    0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
>    >    (XEN)    ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
>    >    (XEN)    00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
>    >    (XEN)    ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
>    >    (XEN)    ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
>    >    (XEN)    ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
>    >    (XEN)    ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
>    >    (XEN)    0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
>    >    (XEN)    ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
>    >    (XEN) Xen call trace:
>    >    (XEN)    [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
>    >    (XEN)    [<ffff828c80127776>] __serial_putc+0x86/0x180
>    >    (XEN)    [<ffff828c80127e00>] serial_puts+0x90/0x120
>    >    (XEN)    [<ffff828c80126019>] __putstr+0x9/0xa0
>    >    (XEN)    [<ffff828c8012662e>] printk+0xee/0x1d0
>    >    (XEN)    [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
>    >    (XEN)    [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
>    >    (XEN)    [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
>    >    (XEN)    [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
>    >    (XEN)    [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
>    >    (XEN)    [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
>    >    (XEN)    [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
>    >    (XEN)    [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
>    >    (XEN)    [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
>    >    (XEN)    [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
>    >    (XEN)    [<ffff828c80127500>] ns16550_poll+0x0/0xa0
>    >    (XEN)    [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
>    >    (XEN)    [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
>    >    (XEN)    [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
>    >    (XEN)    [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
>    >    (XEN)    [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
>    >    (XEN)
>    >    (XEN)
>    >    (XEN) ****************************************
>    >    (XEN) Panic on CPU 0:
>    >    (XEN) FATAL TRAP: vector = 2 (nmi)
>    >    (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>    >    (XEN) ****************************************
>    >    (XEN)
> 
>    > _______________________________________________
>    > Xen-devel mailing list
>    > Xen-devel@xxxxxxxxxxxxxxxxxxx
>    > http://lists.xensource.com/xen-devel
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

