
Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)



Jan, Andrew, Ian,

Pulling in Jeff, who raised the question. Snippets from various replies are included below. Jeff, please look through these (in particular Jan's answer) and answer any further questions on this thread.

On 05/11/2013 09:53, Ian Campbell wrote:
> TBH I think for this kind of thing (i.e. a bug, not a user question) the most appropriate thing to
> do would be to redirect them to xen-devel themselves (with a reminder that they do not need
> to subscribe to post).
Agreed. Another option is for me to start the thread and pull the original reporter into it, if it is a bug. I was not sure this was a real bug at first, but it seems it is.

On 04/11/2013 20:00, Andrew Cooper wrote:
> Which version of Xen were these images saved on?
[Jeff] We were careful to regenerate all the images after upgrading to 4.3.1. We also saw the same problem on 4.3.0.

> Are you expecting to be using nested-virt? (It is still very definitely experimental)
[Jeff] Not using nested-virt.

On 05/11/2013 10:04, Jan Beulich wrote:
On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@xxxxxxxxx> wrote:
See
http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
---
I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
Dom0 is CentOS 6.3 based with Linux kernel 3.10.16.
In my configuration, all of the Windows HVMs are running, having been
restored from xl save images.
VMs are destroyed or restored on demand. After some time, Xen
experiences a fatal page fault while restoring one of the Windows HVM
guests. This does not happen very often, perhaps once in a 16 to 48 hour
period.
The stack trace from Xen follows. Thanks in advance for any help.

(XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
(XEN) CPU: 52
(XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
Zapping addresses (here and below in the stack trace) is never
helpful when someone asks for help with a crash. Also, in order
to not just guess, the matching xen-syms or xen.efi should be
made available or pointed to.

(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
(XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
(XEN) cr3: 000000211bee5000 cr2: ffff810000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8310333e7cd8:
(XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
(XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
(XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
(XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
(XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
(XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
(XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
(XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
(XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
(XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
(XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
(XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
(XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
(XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [] domain_page_map_to_mfn+0x86/0xc0
(XEN) [] nvmx_handle_vmlaunch+0x49/0x160
(XEN) [] __update_vcpu_system_time+0x240/0x310
(XEN) [] vmx_vmexit_handler+0xb58/0x18c0
(XEN) [] pt_restore_timer+0xa8/0xc0
(XEN) [] hvm_io_assist+0xef/0x120
(XEN) [] hvm_do_resume+0x195/0x1c0
(XEN) [] vmx_do_resume+0x148/0x210
(XEN) [] context_switch+0x1bc/0xfc0
(XEN) [] schedule+0x254/0x5f0
(XEN) [] pt_update_irq+0x256/0x2b0
(XEN) [] timer_softirq_action+0x168/0x210
(XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
(XEN) [] nvmx_switch_guest+0x54/0x1560
(XEN) [] vmx_intr_assist+0x6c/0x490
(XEN) [] vmx_vmenter_helper+0x88/0x160
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] vmx_asm_do_vmentry+0/0xed
(XEN)
(XEN) Pagetable walk from ffff810000000000:
(XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
(XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
This makes me suspect that domain_page_map_to_mfn() gets a
NULL pointer passed here. As said above, this is only guesswork
at this point, and as Ian already pointed out, directing the
reporter to xen-devel would seem to be the right thing to do
here anyway.

Jan
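
For what it is worth, Jan's guess lines up with the fault address. Below is a rough sketch of domain_page_map_to_mfn() as it looks in the 4.3-era tree; it is paraphrased from memory, so treat the exact details as approximate rather than a verbatim quote of the source.

    /* Sketch of domain_page_map_to_mfn() from xen/arch/x86/domain_page.c
     * (4.3-era) -- paraphrased from memory, not a verbatim quote. */
    unsigned long domain_page_map_to_mfn(const void *ptr)
    {
        unsigned long va = (unsigned long)ptr;
        const l1_pgentry_t *pl1e;

        if ( va >= DIRECTMAP_VIRT_START )
            return virt_to_mfn(ptr);  /* direct map: pure arithmetic, no dereference */

        /* Anything else is assumed to be a mapcache mapping. The range
         * ASSERT is compiled out in a debug=n build (as in the log above),
         * so a NULL ptr falls straight through to the lookup below. */
        ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);

        /* l1_linear_offset(0) == 0, so for ptr == NULL this reads the very
         * base of the linear page table area, i.e. L4 slot 0x102 ==
         * 0xffff810000000000 -- exactly the cr2 value in the register dump. */
        pl1e = &__linear_l1_table[l1_linear_offset(va)];

        return l1e_get_pfn(*pl1e);  /* this dereference is what faults */
    }

If that reading is right, a NULL pointer explains both the cr2 value and the pagetable walk: L4[0x102] is the self-referencing linear mapping (note it points back at the cr3 page in the walk above), while L3[0x000] is empty, so the walk stops there. It would also be consistent with nvmx_handle_vmlaunch() showing up as the immediate caller in the (address-zapped) trace.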
