Re: [Xen-devel] 3.1/2 live migration panic

Well, that's a lot saner, even though it isn't a debug build. Possibly Tim has
some insight into how this can happen... I expect the 'recursive shadow
fault' is simply a result of the fault in shadow_set_l1e() causing an
unexpected re-entry into shadow code. It'd be interesting to know which
invocation of shadow_set_l1e() is on the backtrace. That might be easier to
work out if you can repro the crash with a debug build of Xen.
Alternatively, since there is obviously a very bogus nearly-NULL pointer
involved, perhaps you could add some tracing to pick up on that? Possibly
the sl1e argument to shadow_set_l1e() is the thing that is bogus here.
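
As a rough sketch of the kind of check I mean (standalone C, not Xen code):
the helper names and the 4 KiB limit below are assumptions for illustration,
and inside Xen you would log with printk() rather than fprintf(). In the dump
below both cr2 and rsi are 0xb1, which is the sort of value this would catch.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, so any address below 0x1000 is in the NULL page. */
#define NEAR_NULL_LIMIT 0x1000UL

/* Hypothetical helper: returns 1 if p is NULL or points into the first page. */
static int is_near_null(const void *p)
{
    return (uintptr_t)p < NEAR_NULL_LIMIT;
}

/*
 * Hypothetical trace hook: logs and returns 1 when the pointer looks bogus,
 * otherwise returns 0 and stays silent. 'what' names the argument checked.
 */
static int trace_if_near_null(const char *what, const void *p)
{
    if (!is_near_null(p))
        return 0;
    fprintf(stderr, "bogus %s pointer: %p\n", what, p);
    return 1;
}
```

A call such as trace_if_near_null("sl1e", sl1e) at the top of the function
under suspicion, followed by a stack dump when it returns 1, should show which
caller passed the bad value.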

 -- Keir

On 16/1/08 22:37, "John Levon" <levon@xxxxxxxxxxxxxxxxx> wrote:

> On Wed, Jan 16, 2008 at 09:43:41PM +0000, Keir Fraser wrote:
>> If you have a debug build of Xen then the backtrace should be trustworthy.
>> Are there addresses in the backtrace that don't look to be within Xen text?
> Here's what I got without the panic patch (sigh):
> (XEN) sh error: sh_page_fault__shadow_4_guest_4(): Recursive shadow fault:
> lock was taken by sh_page_fault__shadow_4_guest_4
> (XEN) ----[ Xen-3.1.2  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff828c80168822>] shadow_set_l1e+0x32/0x1b0
> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 800000012521c067   rcx: 0000000000132b04
> (XEN) rdx: 800000012521c067   rsi: 00000000000000b1   rdi: ffff8300e2ed2080
> (XEN) rbp: ffff8300e2e0fc08   rsp: ffff8300e2e0fbc8   r8:  0000000000000006
> (XEN) r9:  0000000000000006   r10: 0000000132b05118   r11: 0000000132b07ff0
> (XEN) r12: ffff8300e2ed2080   r13: 00000000000000b1   r14: 0000000000132b04
> (XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000006f0
> (XEN) cr3: 0000000132b07000   cr2: 00000000000000b1
> (XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff8300e2e0fbc8:
> (XEN)    ffff8300e2e0fc08 0000000080167b02 800000012521c067
> (XEN)    ffff8300e2e0ff28 ffff8300e2ed2080 ffff8300e2fb6080
> (XEN)    ffff828c80221b20 0000000000000001 ffff8300e2e0fe18
> (XEN)    ffff828c8016af61 0000000000000000 ffffff0003fd3ac0
> (XEN)    0000000000000000 ffff828c80221b20 ffff8300e2e0fd98
> (XEN)    ffff828c8010d757 ffff830184aeb000 0000000000000008
> (XEN)    ffff8300e2fb6080 ffff828c801c52b8 0000000000132b07
> (XEN)    ffff81c0ffc00118 00000000000000b1 ffff8300e2e0fcf0
> (XEN)    0000000000000008 000000000012521c 00000006e2e0fd68
> (XEN)    ffff8300e2e0fe78 ffffff000475d848 ffff8300e2e06080
> (XEN)    ffff8300e2e0fcf8 800000012521c067 0000000132b04067
> (XEN)    0000000132b05067 0000000132b06067 0000000000132b06
> (XEN)    0000000000132b05 0000000000132b04 ffff8300e2e0fd18
> (XEN)    0000000000000082 0000000000003000 ffff8300e2e06080
> (XEN)    ffff8300e2e0fd28 ffff828c801355b2 ffff8300e2e0fe78
> (XEN)    ffff828c801288db ffff828c801c8100 0000005878a902d7
> (XEN)    ffff8300e2ed2080 ffff8300e2e06080 0000000000000086
> (XEN)    0000000000003000 ffff8300e2e0ff28 ffff8300e2e06080
> (XEN)    ffff8300e2ed2080 ffff8300e2e0fdc0 ffff828c801252d7
> (XEN)    820000000000efff ffffff000475d848 ffff8140a0502ff0
> (XEN) Xen call trace:
> (XEN)    [<ffff828c80168822>] shadow_set_l1e+0x32/0x1b0
> (XEN)    [<ffff828c8016af61>] sh_page_fault__shadow_4_guest_4+0xb61/0x10b0
> (XEN)    [<ffff828c8013b1c2>] do_page_fault+0x1f2/0x500
> (XEN)    [<ffff828c8017a495>] handle_exception_saved+0x2d/0x6b
> (XEN)    
> (XEN) Pagetable walk from 00000000000000b1:
> (XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 00000000000000b1
> (XEN) ****************************************
> (XEN) 
> (XEN) Reboot in five seconds...

