Re: [Xen-devel] CPU hangs
On Thu, Sep 09, 2010 at 12:48:55PM -0500, Roger Cruz wrote:
> In multicpu mode, it takes what appears to be a random amount of time to
> hang the whole host. So I make it happen faster by cutting down the # of
> CPUs to 1. When I do this, I usually can get it to happen in < 1hr. I
> believe a Windows HVM must be running but can't say that with 100%
> certainty at this time. I dont believe the serial port prints in the
> stack trace is what is hanging. I added a serial port to be able to
> debug the problem. I think the issue is with the shadow page table. Of
> interest may be the fact that these messages are being printed as well
>
> > (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
>
> So my first inclination is to go research the area dealing with VRAM
> tracking. It may be getting in a loop causing the crash
>
>
> menuentry "Boot Entry 3: debug cpu1" {
> saved_entry=2
> save_env saved_entry
> set root=(NxVG-NxDisk1)
> multiboot /xen.gz dom0_mem=1024MB cpufreq=xen cpuidle
> [1]crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog
> com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
> module /vmlinuz-2.6.32-orc root=/dev/mapper/NxVG-NxDisk5 ro
> console=ttyS0,115200,8n1 xencons=ttyS earlyprintk=xen initcall_debug debug
> nmi_watchdog=1
> module /initrd.img-2.6.32-orc
> }
>

Have you tried changing the cpufreq/cpuidle settings?
How about the watchdog?

Also, if you're using Xen 3.4.2 I believe you'll lose the dom0_mem=1024M
parameter due to the grub2 bug, so make sure to add a dummy=dummy parameter
before dom0_mem.
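
For illustration, a sketch of the multiboot line from the quoted entry with the
placeholder added. It assumes, as suggested above, that grub2 loses the first
argument after /xen.gz; only dummy=dummy is new here, the remaining options are
copied unchanged from the quoted entry (minus the "[1]" link marker the mail
client inserted), and the name "dummy" itself is arbitrary.

```
# Hypothetical edit: dummy=dummy is a throwaway first argument, so that
# dom0_mem is no longer the one that would be dropped.
multiboot /xen.gz dummy=dummy dom0_mem=1024MB cpufreq=xen cpuidle crashkernel=128M@16M vga=text-80x60,keep sync_console noreboot watchdog com1=115200,8n1,magic console=com1 loglvl=all guest_loglvl=all maxcpus=1
```

After booting, the memory reported for Domain-0 by "xm list" should show whether
the dom0_mem setting actually took effect.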

-- Pasi

> --------------------------------------------------------------------------
>
> From: Pasi Kärkkäinen [mailto:pasik@xxxxxx]
> Sent: Thu 9/9/2010 12:13 PM
> To: Roger Cruz
> Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] CPU hangs
>
> On Thu, Sep 09, 2010 at 10:53:20AM -0500, Roger Cruz wrote:
> > I am experiencing host hangs with 3.4.2 so I turned on the watchdog and
> > finally got something useful to start tracking. Before I do, I always
> > like to make sure that this is not something that has already been
> > reported and fixed. Anyone know of any such CPU deadlocks and a fix?
> >
> > Thanks
> >
>
> Please paste your grub.conf entry.
> When does this hang happen? During startup, or during operation? After how
> much uptime?
>
> ns16550 sounds like a serial port to me..
>
> -- Pasi
>
> > (XEN) multi.c:1077:d2 gfn f1159 (mfn 60192) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115a (mfn 60191) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115b (mfn 60190) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115c (mfn 6018f) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115d (mfn 6018e) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115e (mfn 6018d) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f115f (mfn 6018c) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1160 (mfn 6018b) cleared vram pte
> > (XEN) multi.c:1077:d2 gfn f1161 (mfn 6018a)(XEN) Watchdog timer detects that CPU0 is stuck!
> > (XEN) ----[ Xen-3.4.2 x86_64 debug=n Tainted: C ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> > (XEN) RFLAGS: 0000000000000006 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff828c801ef260 rcx: 0000000000000001
> > (XEN) rdx: 0000000000002005 rsi: 0000000000000020 rdi: ffff828c801ef260
> > (XEN) rbp: 0000000000000020 rsp: ffff828c8024faa0 r8: 0000000000004000
> > (XEN) r9: 0000000000003fff r10: ffff828c80268360 r11: 0000000000000400
> > (XEN) r12: ffff828c801ef2dc r13: 0000000000000020 r14: ffff828c80267ecc
> > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 00000000a17ea000 cr2: 0000000097a20000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen stack trace from rsp=ffff828c8024faa0:
> > (XEN) ffff828c80127776 ffff828c801ef260 0000000000000000 ffff828c801ef2dc
> > (XEN) ffff828c80127e00 0000000800000000 0000000000000086 0000000000000400
> > (XEN) ffff828c80267ea6 ffff828c80267edc ffff828c8024fb40 00000000000f1161
> > (XEN) 0000000000000000 ffff8300b781c000 ffff828c80126019 0000000000000286
> > (XEN) ffff828c8012662e 0000003000000030 ffff828c8024fc18 ffff828c8024fb48
> > (XEN) ffff828c80267ea6 0000000000000000 ffff828c801e3b9c 0000000000000435
> > (XEN) 0000000000000002 00000000000f1161 000000000006018a ffff8300b781c000
> > (XEN) ffff8300b75da000 0000000400000000 ffff8180006022b0 0000000078e31023
> > (XEN) 0000000078e31021 0000000000078e31 ffff8180006022b0 ffff8180006022b0
> > (XEN) ffff828c801b4870 0000000000000000 ffff828400c03160 0000000000000000
> > (XEN) ffff8300a08a4b08 0000000000000000 000000006018a023 ffff8300a08a4b08
> > (XEN) 0000000000000000 000000006018a023 ffff828c801b4839 ffff8300b75da000
> > (XEN) ffff828c00000001 ffffffffffffffff 000000000006018a 0000000000000000
> > (XEN) 00000001801b7221 00000000a08a4b08 00000000000a08a4 0000000078e32061
> > (XEN) ffff830078e32b08 ffff8300a08a4b10 ffff830078e32ff8 ffff8300b7801b08
> > (XEN) ffff828c8024fcd8 ffff8300b75da000 ffff828c801b6306 ffff828c80228740
> > (XEN) ffff8300b7801000 00000000000a08a4 ffff828c8024fcc8 ffff828c8024ff28
> > (XEN) ffff828c8024fce4 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000100000100 ffff828400f1c640 0000000000f1c640 0000000000078e32
> > (XEN) ffff8300b75da000 ffff828400f1c640 00000000000b7801 0000000000000000
> > (XEN) Xen call trace:
> > (XEN) [<ffff828c80126f98>] ns16550_tx_empty+0x28/0x30
> > (XEN) [<ffff828c80127776>] __serial_putc+0x86/0x180
> > (XEN) [<ffff828c80127e00>] serial_puts+0x90/0x120
> > (XEN) [<ffff828c80126019>] __putstr+0x9/0xa0
> > (XEN) [<ffff828c8012662e>] printk+0xee/0x1d0
> > (XEN) [<ffff828c801b4870>] shadow_set_l1e+0x490/0x4e0
> > (XEN) [<ffff828c801b4839>] shadow_set_l1e+0x459/0x4e0
> > (XEN) [<ffff828c801b6306>] sh_resync_l1__guest_3+0x156/0x1c0
> > (XEN) [<ffff828c801aacee>] _sh_resync+0x1be/0x1d0
> > (XEN) [<ffff828c801ac03c>] sh_resync_all+0x3bc/0x450
> > (XEN) [<ffff828c8019d254>] vmx_msr_write_intercept+0x134/0x550
> > (XEN) [<ffff828c801ad8a7>] sh_update_paging_modes+0xd7/0x390
> > (XEN) [<ffff828c801ae624>] shadow_update_paging_modes+0x74/0xd0
> > (XEN) [<ffff828c80182726>] hvm_set_cr4+0xa6/0xb0
> > (XEN) [<ffff828c8019f272>] vmx_vmexit_handler+0x11f2/0x18d0
> > (XEN) [<ffff828c80127500>] ns16550_poll+0x0/0xa0
> > (XEN) [<ffff828c80138f62>] reprogram_timer+0x62/0xa0
> > (XEN) [<ffff828c8018eedb>] pt_update_irq+0x7b/0x110
> > (XEN) [<ffff828c8018a507>] hvm_vcpu_has_pending_irq+0x37/0x60
> > (XEN) [<ffff828c80198715>] vmx_intr_assist+0x55/0x190
> > (XEN) [<ffff828c801984e3>] vmx_asm_do_vmentry+0x0/0xdd
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) FATAL TRAP: vector = 2 (nmi)
> > (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> > (XEN) ****************************************
> > (XEN)
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > [2]http://lists.xensource.com/xen-devel
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 9.0.851 / Virus Database: 271.1.1/3119 - Release Date: 09/09/10 02:34:00
>
> References
>
> Visible links
> 1. mailto:crashkernel=128M@16m
> 2. http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel