[Xen-devel] Xen-4.3 - curious crash
Hello,

Last night, XenRT discovered an interesting host crash. The crash itself is somewhat concerning, but the lack of information does highlight an area which could do with easier debuggability.

Here are the results from the serial console. The server in question is a Supermicro Xeon X5376 system which has not exhibited stability issues in the past, and has seemed fine in tests today. I have linearised the stack and added notes beside each word.

----[ Xen-4.3.1-xs82408-d  x86_64  debug=y  Not tainted ]----
CPU:    4
RIP:    e008:[<ffff82c4c0235a92>] compat_create_bounce_frame+0x8/0xec
RFLAGS: 0000000000010046   CONTEXT: hypervisor
rax: 0000000000000061   rbx: ffff8300cfafa000   rcx: ffff82c4c02ffd80
rdx: ffff8300cfafa570   rsi: ffff83022eacfd00   rdi: ffff8300cfafa000
rbp: ffff83022eacfd60   rsp: ffff83022eacff08   r8:  0000000000000000
r9:  0000000000000000   r10: ffff83022ead32e8   r11: 00001ac42042804f
r12: ffff8300cfafa000   r13: 0000000000000004   r14: ffff8300cfd3f000
r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
cr3: 0000000228dde000   cr2: 00000000b74e4f10
ds: 007b   es: 007b   fs: 00d8   gs: 00e0   ss: 0000   cs: e008
Xen stack trace from rsp=ffff83022eacff08:
   0000000000000093 | rflags from pushfq in ASSERT_INTERRUPTS_ENABLED
   ffff82c4c02358d8 | RA? compat/entry.S:123 in compat_test_all_events()
   0000000000000001 | r15
   ffff8300cfd3f000 | r14
   0000000000000004 | r13
   ffff8300cfafa000 | r12
   00000000c1695ec0 | ebp
   00000000deadbeef | ebx
   0000000000000000 | r11
   00000000deadbeef | r10
   ffff8300cfafa060 | r9
   0000000000000000 | r8
   0000000000000000 | eax
   00000000deadbeef | ecx
   00000000ee8507a0 | edx
   00000000c23a7000 | esi
   0000000000000000 | edi
   0002010000000000 | TRAP_syscall | TRAP_regs_dirty
   00000000c10013a7 + (hypercall page) __HYPERCALL_sched_op
   0000000000000061 |
   0000000000000246 | Exception frame from ring1 kernel
   00000000c1695eb0 |
   0000000000000069 +
   0000000000000000 | es
   0000000000000000 | ds
   0000000000000000 | fs
   0000000000000000 | gs
   0000000000000004 | cpu_info.processor_id
   ffff8300cfafa000 | cpu_info.current_vcpu
   0000003d6e797180 | cpu_info.per_cpu_offset
   0000000000000000 +
Xen call trace:
   [<ffff82c4c0235a92>] compat_create_bounce_frame+0x8/0xec

Xen has failed the ASSERT_INTERRUPTS_ENABLED check at the very top of compat_create_bounce_frame, which itself lacks a bugframe, which is why it is not automatically recognised as an assertion.

Following the code back, using what I presume to be a return address as the penultimate word on the stack, the codeflow looks like:

compat_test_all_events:
    ...
    sti
    leaq ...
    5x mov ...
    call compat_create_bounce_frame
    jmp compat_test_all_events

compat_create_bounce_frame:
    pushfq
    testb
    jnz
    ud2

What I presume has happened is that after 'sti', Xen has taken an interrupt, which has caused some form of corruption. Judging from the top word on the stack, rflags looks quite corrupt.

Unfortunately, this is all the available information. (The crash kernel failed to boot, which is another issue I am looking into.)
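
To make the rflags observation concrete, the three flag words in the dump can be decoded by hand. The following is a minimal sketch, not part of Xen: the bit positions are the architectural x86 ones, while the helper names (decode_rflags, interrupts_enabled) and the table are purely illustrative. It assumes the failing check tests the IF bit (bit 9) of the word pushed by pushfq.

```python
# Illustrative helpers only; none of these names exist in Xen.
# Bit positions follow the architectural x86 (R)FLAGS layout.
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 14: "NT", 16: "RF",
    17: "VM", 18: "AC", 19: "VIF", 20: "VIP", 21: "ID",
}
X86_EFLAGS_IF = 1 << 9  # interrupt-enable flag


def decode_rflags(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if value & (1 << bit)]


def interrupts_enabled(value):
    """True if the IF bit is set (assumed to be what the assertion checks)."""
    return bool(value & X86_EFLAGS_IF)


if __name__ == "__main__":
    # Values copied from the crash dump above.
    samples = [
        ("top of stack (pushfq)", 0x93),
        ("RFLAGS in register dump", 0x10046),
        ("guest exception frame", 0x246),
    ]
    for label, value in samples:
        state = "IF set" if interrupts_enabled(value) else "IF clear"
        print(f"{label}: {value:#x} -> {decode_rflags(value)} ({state})")
```

Decoded this way, the pushed word 0x93 has CF, AF and SF set with IF clear, which is consistent with the assertion firing, whereas the guest's saved 0x246 has IF set.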

For crashes like this, particularly when attempting to leave Xen context and return to a guest, the information provided by the stack trace is quite lacking. The interesting information is what has just been popped off the stack (which I am hoping would have been another exception frame).

Would it be sensible to have some indication that we are on the way out of Xen, so errors in situations like this can take a chance to print some of the recently popped stack values? I know it won't be terribly heavily used for debugging, but I think it is probably worth the effort for situations like this where there is simply not enough information to diagnose the issue.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel