
Re: [Xen-devel] Xen host crash






On Thu, Aug 29, 2013 at 9:55 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
On 29/08/13 17:22, Rushikesh Jadhav wrote:
Hi People,

A Xen 3.4.2 host crashed on me today, and the crash log was dumped in /var/crash/.

While analyzing the crash log and call traces on this 24-PCPU host, I found that some PCPUs were in the idle state and many had the same call trace as:

PCPU7
Call Trace:
 [ffff828c8010e310] dump_domains+0x4d0
  ffff828c80175b7c  crash_nmi_callback+0x2c
  ffff828c8015f2f9  do_nmi+0x39
  ffff828c801d6877  handle_ist_exception+0x52
  ffff828c801787b2  acpi_safe_halt+0x2


Only one PCPU has a different call trace:

PCPU6
Call Trace:
 [ffff828c8010e310] dump_domains+0x4d0
  ffff828c8010eeb7  kexec_crash+0x57
  ffff828c80127b36  panic+0x136
  ffff828c8011b7da  __print_symbol+0x8a
  ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
  ffff828c80100000  __per_cpu_shift+0x800ffff4
  ffff828c8015eb75  show_stack+0x155
  ffff828c8015eeba  fatal_trap+0x6a
  ffff828c801567a1  nmi_watchdog_tick+0x131
  ffff828c8015f37f  do_nmi+0xbf
  ffff828c801d6877  handle_ist_exception+0x52
  ffff828c8011ab02  _spin_lock+0x12

Can anyone please help me understand this and find the cause of the crash?
There are no errors in the messages or kernel logs from the time of the crash.

I checked the C-states setting, and it is set to 2.

Thanks.

This is a spinlock deadlock, which caused the NMI watchdog to time out and kill the host. Do you have the stack and register dump for PCPU6?

~Andrew
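
For readers unfamiliar with the mechanism, the diagnosis above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Xen's code: it assumes the watchdog works by checking, on every NMI, a per-CPU counter that ordinary timer interrupts advance, and it declares the CPU stuck once that counter has not moved for a number of consecutive NMI ticks. The class name, method names, and threshold are hypothetical.

```python
class WatchdogSim:
    """Toy model of an NMI watchdog (hypothetical names, not Xen's code)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks  # assumed threshold, in NMI ticks
        self.heartbeat = 0      # advanced by the CPU's ordinary timer interrupt
        self.last_seen = 0      # heartbeat value observed at the previous NMI
        self.stalled_ticks = 0  # consecutive NMIs without heartbeat progress

    def timer_interrupt(self):
        # Only runs while the CPU is still servicing normal interrupts.
        self.heartbeat += 1

    def nmi_tick(self):
        """Return True once the CPU is considered stuck."""
        if self.heartbeat != self.last_seen:
            self.last_seen = self.heartbeat
            self.stalled_ticks = 0
            return False
        self.stalled_ticks += 1
        return self.stalled_ticks >= self.timeout_ticks


def ticks_until_fatal(healthy_ticks, timeout_ticks):
    """Run a healthy phase, then a CPU that spins without taking interrupts."""
    wd = WatchdogSim(timeout_ticks)
    for _ in range(healthy_ticks):
        wd.timer_interrupt()
        assert not wd.nmi_tick()
    ticks = 0
    while True:
        # The CPU is spinning on a lock: NMIs still arrive, but the
        # heartbeat no longer advances.
        ticks += 1
        if wd.nmi_tick():
            return ticks


print(ticks_until_fatal(healthy_ticks=3, timeout_ticks=5))
```

In this model the NMI is the only thing that can still run on a CPU spinning with interrupts off, which matches the shape of the PCPU6 trace: _spin_lock was interrupted by an NMI, and nmi_watchdog_tick then called fatal_trap.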

Hi Andrew, here are the stack traces and register dumps for PCPU6 and PCPU7:

PCPU6 host state:
RIP:    e008:[<ffff828c8010e310>]
RFLAGS: 0000000000000002
rax: 0000000000000004   rbx: 0000000000000001   rcx: ffff828c803629cc
rdx: ffff828c8036286c   rsi: ffff828c803628dc   rdi: 00000000ffffffff
rbp: 0000000000000082   rsp: ffff83247fd88e10   r8:  0000000000000001
r9:  0000000000000001   r10: 00000000fffffffc   r11: 0000000000000001
r12: 0000000000000001   r13: ffff832270af39a0   r14: 0000000000000002
r15: 0000000000000009
cr0: 0000000080050033   cr4: 00000000000026f0
cr3: 000000205a12e000   cr2: fffff880005c5000
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

current: DOM185 VCPU6 (ffff830047ee8000)
stack context: DOM185 VCPU6 (ffff830047ee8000)
idle VCPU: ffff83007ea5e000

Stack at 0xffff83247fd88e10: 
 ffff83247fd88e00:                                     8010eeb7 ffff828c 801fb1d0 ffff828c
 ffff83247fd88e20: 80127b36 ffff828c 00000028 00000030 7fd88f18 ffff8324 7fd88e48 ffff8324
 ffff83247fd88e40: 00000001 00000000 00000002 00000000 00000002 00000000 801f423d ffff828c
 ffff83247fd88e60: 00000000 00000000 801f45b3 ffff828c 00000000 00000000 00000096 00000000
 ffff83247fd88e80: 8011b7da ffff828c 000000e5 00000000 00000000 00000000 8019b4ab ffff828c
 ffff83247fd88ea0: 7fd8ff20 ffff8324 80100000 ffff828c 8015eb75 ffff828c 7fd88f58 ffff8324
 ffff83247fd88ec0: 7fd8ff28 ffff8324 7fd88f58 ffff8324 00000002 00000000 7fd8ff28 ffff8324
 ffff83247fd88ee0: 7fd88f58 ffff8324 00000002 00000000 8015eeba ffff828c 7fd8ff28 ffff8324
 ffff83247fd88f00: 7fd8ff28 ffff8324 7fd88f58 ffff8324 801567a1 ffff828c 00000000 00000000
 ffff83247fd88f20: 00000006 00000000 7fd88f58 ffff8324 8015f37f ffff828c 00000000 00000000
 ffff83247fd88f40: 339e8000 ffff8322 7fd8ff28 ffff8324 801d6877 ffff828c 00000009 00000000
 ffff83247fd88f60: 00000002 00000000 70af39a0 ffff8322 00000001 00000000 7fd8ff28 ffff8324
 ffff83247fd88f80: 339e8000 ffff8322 00ff00ff 00ff00ff 0000ffff 0000ffff 339e8018 ffff8322
 ffff83247fd88fa0: 00000002 00000000 00000000 00000000 0000000f 00000000 47ee8000 ffff8300
 ffff83247fd88fc0: 00366807 00000000 60e72e90 ffff8319 00000000 00000002 8011ab02 ffff828c
 ffff83247fd88fe0: 0000e008 00000000 00000246 00000000 7fd8fd70 ffff8324 00000000 00000000

Code:
 da e8 df 91 01 00 e9 19 fd ff ff 90 90 90 90 90 90 90 90 90 90 <4c> 89 3f 4c 89 77 08 4c 89 6f 10 

Call Trace:
 [ffff828c8010e310] dump_domains+0x4d0
  ffff828c8010eeb7  kexec_crash+0x57
  ffff828c80127b36  panic+0x136
  ffff828c8011b7da  __print_symbol+0x8a
  ffff828c8019b4ab  vmx_asm_vmexit_handler+0x6b
  ffff828c80100000  __per_cpu_shift+0x800ffff4
  ffff828c8015eb75  show_stack+0x155
  ffff828c8015eeba  fatal_trap+0x6a
  ffff828c801567a1  nmi_watchdog_tick+0x131
  ffff828c8015f37f  do_nmi+0xbf
  ffff828c801d6877  handle_ist_exception+0x52
  ffff828c8011ab02  _spin_lock+0x12
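
The call trace above can be cross-checked against the raw stack dump. The following is a minimal sketch, assuming the dump prints the stack as 32-bit words with the low half of each 64-bit value first. That assumption is consistent with the first row, where "8010eeb7 ffff828c" corresponds to the kexec_crash+0x57 entry at ffff828c8010eeb7. The helper name and the prefix filter are illustrative, not part of any Xen tool.

```python
def decode_stack_words(line):
    """Join a row of 32-bit words (low half first) into 64-bit values."""
    _label, _, rest = line.partition(":")
    words = rest.split()
    return [int(hi + lo, 16) for lo, hi in zip(words[0::2], words[1::2])]


# Upper 32 bits shared by every address in the call trace above.
XEN_PREFIX = 0xffff828c

# Two rows copied from the PCPU6 stack dump.
rows = [
    " ffff83247fd88e20: 80127b36 ffff828c 00000028 00000030 7fd88f18 ffff8324 7fd88e48 ffff8324",
    " ffff83247fd88fc0: 00366807 00000000 60e72e90 ffff8319 00000000 00000002 8011ab02 ffff828c",
]

# Keep only values that fall in the same address range as the trace entries.
candidates = [
    value
    for row in rows
    for value in decode_stack_words(row)
    if value >> 32 == XEN_PREFIX
]

for value in candidates:
    print(f"{value:016x}")
```

The two values this keeps are the panic+0x136 and _spin_lock+0x12 entries from the trace. The prefix filter is coarse: a value in that range is not necessarily a return address. For example, the first stack row also holds ffff828c801fb1d0, which does not appear in the call trace.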

  PCPU6 guest state:
DOMAIN185 VCPU3
RIP:    0000:[<fffff800016caee0>]
RFLAGS: 0000000000010206
rax: 0000000000000000   rbx: ffffffffffffffff   rcx: fffffa6002998000
rdx: 0000000000000100   rsi: fffff6fd30014d00   rdi: 0000000000000010
rbp: 0000000000000010   rsp: fffffa6001bc6c48   r8:  0000000000000000
r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
r15: 0000000000000000
cr0: 0000000080050033   cr4: 00000000000026f0
cr3: 000000205a12e000   cr2: fffff880005c5000
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: 0000

VCPU pause flags: 0 arch flags 0x1

current on PCPU6
struct vcpu at ffff830047ee8000

Stack unavailable.


  PCPU7 host state:
RIP:    e008:[<ffff828c8010e310>]
RFLAGS: 0000000000000002
rax: 0000000000000004   rbx: 0000000000000007   rcx: ffff828c80362bc0
rdx: ffff828c80362a60   rsi: ffff828c80362ad0   rdi: ffff83247fd78f58
rbp: ffff83247fd78f58   rsp: ffff83247fd78f20   r8:  000000000000234c
r9:  0000000000000002   r10: 0000000000000000   r11: 0000000000000000
r12: ffff8320f16a6230   r13: ffff83247fdefea8   r14: 0017ca4b87bb89a4
r15: ffff828c8024b100
cr0: 0000000080050033   cr4: 00000000000026f0
cr3: 000000200aab0000   cr2: 000000001588fff0
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

current: idle (ffff83007ea5c000)
stack context: DOM183 VCPU7 (ffff830052966000)
idle VCPU: ffff83007ea5c000

Stack at 0xffff83247fd78f20: 
 ffff83247fd78f20: 80175b7c ffff828c 7fd78f58 ffff8324 8015f2f9 ffff828c 00000000 00000000
 ffff83247fd78f40: 00ec0bc5 00000000 f16a61d0 ffff8320 801d6877 ffff828c 8024b100 ffff828c
 ffff83247fd78f60: 87bb89a4 0017ca4b 7fdefea8 ffff8324 f16a6230 ffff8320 f16a61d0 ffff8320
 ffff83247fd78f80: 00ec0bc5 00000000 00000000 00000000 00000000 00000000 00000002 00000000
 ffff83247fd78fa0: 0000234c 00000000 00000003 00000000 00000001 00000000 00000808 00000000
 ffff83247fd78fc0: 644f7bdc 00000000 f16a6230 ffff8320 00000000 00000002 801787b2 ffff828c
 ffff83247fd78fe0: 0000e008 00000000 00000246 00000000 7fd7fec0 ffff8324 00000000 00000000

Code:
 da e8 df 91 01 00 e9 19 fd ff ff 90 90 90 90 90 90 90 90 90 90 <4c> 89 3f 4c 89 77 08 4c 89 6f 10 

Call Trace:
 [ffff828c8010e310] dump_domains+0x4d0
  ffff828c80175b7c  crash_nmi_callback+0x2c
  ffff828c8015f2f9  do_nmi+0x39
  ffff828c801d6877  handle_ist_exception+0x52
  ffff828c801787b2  acpi_safe_halt+0x2

  PCPU7 guest state:
None (idle)
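
To compare the two PCPUs quickly, the call traces can be reduced to the code each CPU was running when the NMI arrived. The sketch below assumes the trace is printed most recent frame first, so the last line is the interrupted code. The regular expression and function names are hypothetical helpers written for this log format only.

```python
import re

# Matches lines such as "  ffff828c8015f2f9  do_nmi+0x39",
# with or without brackets around the address.
FRAME = re.compile(r"\[?([0-9a-f]{16})\]?\s+(\w+)\+0x([0-9a-f]+)")


def parse_trace(text):
    """Return (address, symbol, offset) tuples for each frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match:
            addr, symbol, offset = match.groups()
            frames.append((int(addr, 16), symbol, int(offset, 16)))
    return frames


def interrupted_symbol(text):
    """Symbol of the last printed frame, i.e. the code the NMI interrupted."""
    return parse_trace(text)[-1][1]


pcpu6_trace = """
 [ffff828c8010e310] dump_domains+0x4d0
  ffff828c801567a1  nmi_watchdog_tick+0x131
  ffff828c8015f37f  do_nmi+0xbf
  ffff828c801d6877  handle_ist_exception+0x52
  ffff828c8011ab02  _spin_lock+0x12
"""

pcpu7_trace = """
 [ffff828c8010e310] dump_domains+0x4d0
  ffff828c80175b7c  crash_nmi_callback+0x2c
  ffff828c8015f2f9  do_nmi+0x39
  ffff828c801d6877  handle_ist_exception+0x52
  ffff828c801787b2  acpi_safe_halt+0x2
"""

print("PCPU6 interrupted in", interrupted_symbol(pcpu6_trace))
print("PCPU7 interrupted in", interrupted_symbol(pcpu7_trace))
```

The PCPU6 trace is abbreviated here to its last frames. Read this way, PCPU7 was halted in the idle loop when the crash NMI reached it, while PCPU6 was inside _spin_lock when the watchdog NMI fired.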




 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

