Re: [Xen-devel] Xen-4.3 - curious crash
>>> On 28.01.14 at 21:25, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> 0000000000000093 | rflags from pushfq in ASSERT_INTERRUPTS_ENABLED
> ffff82c4c02358d8 | RA? compat/entry.S:123 in compat_test_all_events()
> 0000000000000001 | r15
> ffff8300cfd3f000 | r14
> 0000000000000004 | r13
> ffff8300cfafa000 | r12
> 00000000c1695ec0 | ebp
> 00000000deadbeef | ebx
> 0000000000000000 | r11
> 00000000deadbeef | r10
> ffff8300cfafa060 | r9
> 0000000000000000 | r8
> 0000000000000000 | eax
> 00000000deadbeef | ecx
> 00000000ee8507a0 | edx
> 00000000c23a7000 | esi
> 0000000000000000 | edi
> 0002010000000000 | TRAP_syscall | TRAP_regs_dirty
> 00000000c10013a7 + (hypercall page) __HYPERCALL_sched_op
> 0000000000000061 |
> 0000000000000246 | Exception frame from ring1 kernel
> 00000000c1695eb0 |
> 0000000000000069 +
> 0000000000000000 | es
> 0000000000000000 | ds
> 0000000000000000 | fs
> 0000000000000000 | gs
> 0000000000000004 | cpu_info.processor_id
> ffff8300cfafa000 | cpu_info.current_vcpu
> 0000003d6e797180 | cpu_info.per_cpu_offset
> 0000000000000000 +
>
> Xen call trace:
>    [<ffff82c4c0235a92>] compat_create_bounce_frame+0x8/0xec
>
>
> Xen has failed the ASSERT_INTERRUPTS_ENABLED check at the very top of
> compat_create_bounce_frame, which itself lacks a bugframe which is why
> it is not automatically recognised as an assertion.
>
> Following the code back using what I presume to be a return address as
> the penultimate word on the stack, the codeflow looks like:
>
> compat_test_all_events:
>     ...
>     sti
>     leaq ...
>     5x mov ...
>     call compat_create_bounce_frame
>     jmp compat_test_all_events
>
> compat_create_bounce_frame:
>     pushfq
>     testb
>     jnz
>     ud2
>
>
> What I presume has happened is that after 'sti', Xen has taken an
> interrupt, which has caused some form of corruption. Judging from the
> top word on the stack, rflags looks quite corrupt.

Other than IF being clear, I see no other obvious corruption: CF, AF,
and SF (and the reserved bit 1) are set, and all other flags are clear.
That seems quite a reasonable state after the "cmpl $0xfe,%eax" (the
most recent instruction that affected the flags).

An interrupt not properly restoring EFLAGS.IF (or actually one not
properly restoring all of EFLAGS) would be very odd. About as odd as a
cosmic radiation induced bit flip resulting in some other misbehavior.

This hasn't been seen more than once, I suppose?

> For crashes like this, particularly when attempting to leave Xen context
> and return back to a guest, the information provided by the stack trace
> is quite lacking; the interesting information is what has just
> been popped off the stack (which I am hoping would have been another
> exception frame)
>
> Would it be sensible to have some indication that we are on the way out
> of Xen, so errors in situations like this can take a chance to print
> some of the recently popped stack values? I know it won't be terribly
> heavily used debugging, but think it is probably worth the effort for
> situations like this where there is simply not enough information to
> diagnose the issue.

While I realize that in a case like this seeing stack contents below
the stack pointer may be useful (but there's no guarantee it would be),
I don't think it is reasonable to prepare the code so that all kinds of
extremely unlikely scenarios become debuggable. If the issue here is
reproducible, I'm sure you'll be able to instrument the code such that
you can get further information out of the system. That's not
necessarily just stack contents: presumably you'd want to track other
state or state changes in some kind of static buffer, which you'd then
also want to dump out at the point of the crash.

Jan
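
For reference, the flag reading above can be checked mechanically. The
following is only a sketch, not anything taken from the Xen sources: it
decodes the pushed rflags word 0x93 and models the status flags a 32-bit
"cmpl" leaves behind. The helper names (decode_flags, flags_after_cmpl)
are made up for illustration, and the value 0 for %eax at the time of
the compare is an assumption; the thread does not state it.

```python
# Bit positions of the x86 flags relevant to the discussion.
FLAG_BITS = {
    "CF": 0,   # carry
    "PF": 2,   # parity
    "AF": 4,   # auxiliary carry
    "ZF": 6,   # zero
    "SF": 7,   # sign
    "IF": 9,   # interrupt enable
    "OF": 11,  # overflow
}
RESERVED_BIT1 = 1 << 1  # reserved bit 1, always reads as 1


def decode_flags(value):
    """Return the names of the known flags set in an EFLAGS/RFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if value & (1 << bit)]


def flags_after_cmpl(dest, src):
    """Status flags left by a 32-bit `cmpl $src, %dest` (computes dest - src).

    Only the status flags and reserved bit 1 are modelled; IF and the
    other control flags are left clear, since cmp does not touch them.
    """
    dest &= 0xFFFFFFFF
    src &= 0xFFFFFFFF
    result = (dest - src) & 0xFFFFFFFF
    flags = RESERVED_BIT1
    if dest < src:                                    # unsigned borrow
        flags |= 1 << FLAG_BITS["CF"]
    if bin(result & 0xFF).count("1") % 2 == 0:        # even parity, low byte
        flags |= 1 << FLAG_BITS["PF"]
    if (dest ^ src ^ result) & 0x10:                  # borrow into bit 4
        flags |= 1 << FLAG_BITS["AF"]
    if result == 0:
        flags |= 1 << FLAG_BITS["ZF"]
    if result & 0x80000000:                           # sign bit of result
        flags |= 1 << FLAG_BITS["SF"]
    if (dest ^ src) & (dest ^ result) & 0x80000000:   # signed overflow
        flags |= 1 << FLAG_BITS["OF"]
    return flags


if __name__ == "__main__":
    # Top word on the stack, as pushed by pushfq.
    print(decode_flags(0x93))
    # ASSUMPTION: %eax was 0 when "cmpl $0xfe,%eax" executed.
    print(hex(flags_after_cmpl(0, 0xFE)))
    # Guest flags word from the ring1 exception frame, for comparison.
    print(decode_flags(0x246))
```

Under that assumption the modelled compare yields the same status bits
as the word on the stack, with IF as the only bit one would have
expected to be set but is not. The guest's own flags word (0x246) does
have IF set.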