Re: [Xen-devel] [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
On 30/08/13 12:02, George Dunlap wrote:
On 30/08/13 01:25, Mukesh Rathor wrote:
On Thu, 29 Aug 2013 17:28:57 +0100 George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:
On 28/08/13 01:37, Mukesh Rathor wrote:
On Tue, 27 Aug 2013 18:05:00 +0100 George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
On Fri, 23 Aug 2013 13:05:08 +0100 "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
On 23.08.13 at 13:15, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
On 23.08.13 at 03:18, Mukesh Rathor.......

Fine with me, but perhaps Mukesh won't be that happy...

It's OK. I'd like this to be merged in asap so I and others can start working on the FIXME's right away...

I'm still waiting on the toolstack changes that are needed to actually put it in PVH mode before I can test it.

Also, for V11 you'd need the following patch for linux:

OK, so I've tried this with your Xen and Linux branches (i.e., without any of my changes). Dom0 boots, and the kernel boots as PV, but crashes as PVH:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) grant_table.c:577:d0 remote grant table not yet set up
[95984.867796] device vif19.0 entered promiscuous mode
[95984.882699] ADDRCONF(NETDEV_UP): vif19.0: link is not ready
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom19 callback via changed to Direct Vector 0xf3
(XEN) PVH: Unhandled trap:0x2 RIP:0xffffffff8101a503
(XEN) PVH: [15] exit_reas:0 0 qual:0 0 cr0:0x00000080000039
(XEN) PVH: RIP:0xffffffff8101a503 RSP:0xffff88003e1b5dd8 EFLGS:0x12 CR3:0x1c0c000
(XEN) domain_crash called from pvh.c:487
(XEN) Domain 19 (vcpu#0) crashed on cpu#15:
(XEN) ----[ Xen-4.4-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 15
(XEN) RIP: 0000:[<ffffffff8101a503>]
(XEN) RFLAGS: 0000000000000012 CONTEXT: hvm guest
(XEN) rax: ffffffffff493c7c rbx: ffffffff81dc0d24 rcx: 00000000000000f0
(XEN) rdx: 0000000000000001 rsi: 0000000000000000 rdi: 0000000000000200
(XEN) rbp: ffff88003e1b5e18 rsp: ffff88003e1b5dd8 r8: 0000000000000000
(XEN) r9: 0000000000000063 r10: 0720072007200720 r11: 0720072007200720
(XEN) r12: ffffffff81dc5000 r13: ffff88003e005240 r14: ffffffff817d2b69
(XEN) r15: ffffffff81000000 cr0: 0000000080000039 cr4: 0000000000002660
(XEN) cr3: 0000000001c0c000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: 0000
(XEN) Guest stack trace from rsp=ffff88003e1b5dd8:
(XEN) Fault while accessing guest memory.
[95985.368360] device vif19.0 left promiscuous mode

You prob have nmi watchdog running... you can just disable it for now. The NMI is handled in the caller, so pvh handler needs to just ignore it. I'll make a note of that.

Now with multiple vcpus, the guest crashes without any error message:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) grant_table.c:577:d0 remote grant table not yet set up
[ 158.203543] device vif2.0 entered promiscuous mode
[ 158.222642] ADDRCONF(NETDEV_UP): vif2.0: link is not ready
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom2 callback via changed to Direct Vector 0xf3
[ 158.620609] device vif2.0 left promiscuous mode

And if I set it to only one vcpu, it gets stuck in an EPT violation loop:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) grant_table.c:577:d0 remote grant table not yet set up
[ 283.823609] device vif3.0 entered promiscuous mode
[ 283.843691] ADDRCONF(NETDEV_UP): vif3.0: link is not ready
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom3 callback via changed to Direct Vector 0xf3
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df90, mfn 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3 epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3 epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3 epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3 epte 0
(XEN) --- GLA 0xffff88003e22df90
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3 epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3 epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3 epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3 epte 0
(XEN) --- GLA 0xffff88003e22df88
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3 epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3 epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3 epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3 epte 0
(XEN) --- GLA 0xffff88003e22df88

I took a xentrace of this, and it looks like what happens is this:

] 9.403782967 --------x------- d3v0 vmexit exit_reason VMCALL eip ffffffff81001405
] 9.403784176 --------x------- d3v0 vmentry cycles 2903
] 9.403792751 --------x------- d3v0 vmexit exit_reason VMCALL eip ffffffff81001305
] 9.403794945 --------x------- d3v0 vmentry cycles 5263
] 9.404782907 --------x------- d3v0 vmexit exit_reason EXTERNAL_INTERRUPT eip ffffffff817c6ff0
  9.404782907 --------x------- d3v0 intr vec THERMAL_APIC(fa)
  9.404782907 --------x------- d3v0 intr_window vec 243 src 5(vector) intr #
] 9.404785283 --------x------- d3v0 vmentry cycles 5703
] 9.406630481 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI eip ffffffff817ca5a5
  9.406630481 --------x------- inj_exc trap Invalid Op ec ffffffff
  9.406630481 --------x------- d3v0 intr_window vec 243 src 5(vector) intr 6
] 9.406634957 --------x------- d3v0 vmentry cycles 10741
! hvm_generic_postprocess: Strange, exit 0(EXCEPTION_NMI) missing a handler
] 9.406636249 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI eip ffffffff817ca655
  9.406636249 --------x------- inj_exc trap Invalid Op ec ffffffff
  9.406636249 --------x------- d3v0 intr_window vec 243 src 5(vector) intr 6
] 9.406637659 --------x------- d3v0 vmentry cycles 3382
] 9.406638483 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI eip ffffffff817ca655
  9.406638483 --------x------- inj_exc trap Invalid Op ec ffffffff
  9.406638483 --------x------- d3v0 intr_window vec 243 src 5(vector) intr 6
] 9.406639793 --------x------- d3v0 vmentry cycles 3143

Note the "Invalid Op" that's being delivered, at address ffffffff817ca5a5.
Here is a disassembly of that region:

ffffffff817ca5a0 <do_page_fault>:
ffffffff817ca5a0:  55                    push   %rbp
ffffffff817ca5a1:  48 89 e5              mov    %rsp,%rbp
ffffffff817ca5a4:  e8 47 fb ff ff        callq  ffffffff817ca0f0 <__do_page_fault>
ffffffff817ca5a9:  5d                    pop    %rbp
ffffffff817ca5aa:  c3                    retq
ffffffff817ca5ab:  90                    nop

If you'll notice, ffffffff817ca5a5 is actually in the middle of an instruction; it's no surprise that it's an invalid one.

The next two eips for illegal instructions are at ffffffff817ca655:

ffffffff817ca650 <notify_die>:
ffffffff817ca650:  55                    push   %rbp
ffffffff817ca651:  48 89 e5              mov    %rsp,%rbp
ffffffff817ca654:  48 83 ec 20           sub    $0x20,%rsp
ffffffff817ca658:  48 89 55 e0           mov    %rdx,-0x20(%rbp)
ffffffff817ca65c:  48 8d 55 e0           lea    -0x20(%rbp),%rdx
ffffffff817ca660:  48 89 75 e8           mov    %rsi,-0x18(%rbp)
ffffffff817ca664:  89 fe                 mov    %edi,%esi
ffffffff817ca666:  48 c7 c7 10 55 e4 81  mov    $0xffffffff81e45510,%rdi
ffffffff817ca66d:  48 89 4d f0           mov    %rcx,-0x10(%rbp)
ffffffff817ca671:  44 89 45 f8           mov    %r8d,-0x8(%rbp)
ffffffff817ca675:  44 89 4d fc           mov    %r9d,-0x4(%rbp)
ffffffff817ca679:  e8 b2 ff ff ff        callq  ffffffff817ca630 <atomic_notifier_call_chain>
ffffffff817ca67e:  c9                    leaveq
ffffffff817ca67f:  c3                    retq

Again, in the middle of an instruction; and again 5 bytes after the beginning of a function.

It looks, from the rest of it, like it keeps looping on illegal op exits in the fault handlers until it runs out of stack space and hits an EPT fault.

The first question to ask, of course, is whether the disassembly is valid; I think it is, because I looked up the RIP of 5-6 vmexits before this one, and they seem to match (e.g., CPUID exits are at an RIP that the disassembly says is a cpuid instruction).

Any ideas what might be causing it to end up in the middle of instructions while handling exits?

I should repeat, this is your tree + the tools patch, without any changes. (My port actually does the same thing, which is reassuring I guess...)
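For anyone who wants to repeat the mid-instruction check mechanically, here is a minimal Python sketch. It assumes the instruction lengths are read off the byte columns of the two listings above; the `LISTING` table and the `classify` helper are hypothetical names made up for illustration, not part of Xen, xentrace, or objdump, and the mnemonics are shortened labels rather than exact objdump text.

```python
import bisect

# (start address, length in bytes, short label) per instruction.
# Lengths are counted from the byte columns of the listings above;
# only the first few instructions of notify_die are included.
LISTING = {
    "do_page_fault": [
        (0xffffffff817ca5a0, 1, "push %rbp"),
        (0xffffffff817ca5a1, 3, "mov %rsp,%rbp"),
        (0xffffffff817ca5a4, 5, "callq __do_page_fault"),
        (0xffffffff817ca5a9, 1, "pop %rbp"),
        (0xffffffff817ca5aa, 1, "retq"),
    ],
    "notify_die": [
        (0xffffffff817ca650, 1, "push %rbp"),
        (0xffffffff817ca651, 3, "mov %rsp,%rbp"),
        (0xffffffff817ca654, 4, "sub $0x20,%rsp"),
        (0xffffffff817ca658, 4, "mov %rdx,-0x20(%rbp)"),
    ],
}


def classify(rip):
    """Locate rip in LISTING.

    Returns (function, offset from function start, on_boundary,
    label of the containing instruction), or None if rip is not covered.
    """
    for name, insns in LISTING.items():
        starts = [addr for addr, _, _ in insns]
        # Index of the last instruction starting at or before rip.
        i = bisect.bisect_right(starts, rip) - 1
        if i < 0:
            continue
        addr, length, label = insns[i]
        if addr <= rip < addr + length:
            return name, rip - starts[0], rip == addr, label
    return None


if __name__ == "__main__":
    for rip in (0xffffffff817ca5a5, 0xffffffff817ca655):
        print(hex(rip), classify(rip))
```

For both faulting RIPs from the trace, `classify` reports an offset of 5 from the function start and `on_boundary` as False: the first lands one byte into the `callq`, the second one byte into the `sub`.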
-George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel