
Re: [Xen-devel] [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...



On 30/08/13 22:22, Mukesh Rathor wrote:
On Fri, 30 Aug 2013 18:21:52 +0100
George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:

On 30/08/13 12:02, George Dunlap wrote:
On 30/08/13 01:25, Mukesh Rathor wrote:
On Thu, 29 Aug 2013 17:28:57 +0100
George Dunlap <george.dunlap@xxxxxxxxxxxxx> wrote:

On 28/08/13 01:37, Mukesh Rathor wrote:
On Tue, 27 Aug 2013 18:05:00 +0100
George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:

On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
<mukesh.rathor@xxxxxxxxxx> wrote:
On Fri, 23 Aug 2013 13:05:08 +0100
......
And if I set it to only one vcpu, it gets stuck in an EPT violation
loop:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = native
(XEN) grant_table.c:577:d0 remote grant table not yet set up
[  283.823609] device vif3.0 entered promiscuous mode
[  283.843691] ADDRCONF(NETDEV_UP): vif3.0: link is not ready
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom3 callback via changed to Direct Vector 0xf3
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df90, mfn 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3  epte 0
(XEN)  --- GLA 0xffff88003e22df90
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3  epte 0
(XEN)  --- GLA 0xffff88003e22df88
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3  epte 0
(XEN)  --- GLA 0xffff88003e22df88

I took a xentrace of this, and it looks like what happens is this:

]  9.403782967 --------x------- d3v0 vmexit exit_reason VMCALL eip ffffffff81001405
]  9.403784176 --------x------- d3v0 vmentry cycles 2903
]  9.403792751 --------x------- d3v0 vmexit exit_reason VMCALL eip ffffffff81001305
]  9.403794945 --------x------- d3v0 vmentry cycles 5263
]  9.404782907 --------x------- d3v0 vmexit exit_reason EXTERNAL_INTERRUPT eip ffffffff817c6ff0
     9.404782907 --------x------- d3v0 intr vec THERMAL_APIC(fa)
     9.404782907 --------x------- d3v0 intr_window vec 243 src 5(vector) intr #
]  9.404785283 --------x------- d3v0 vmentry cycles 5703
]  9.406630481 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI eip ffffffff817ca5a5
     9.406630481 --------x------- inj_exc trap Invalid Op ec ffffffff
     9.406630481 --------x------- d3v0 intr_window vec 243 src 5(vector) intr 6
]  9.406634957 --------x------- d3v0 vmentry cycles 10741 !
hvm_generic_postprocess: Strange, exit 0(EXCEPTION_NMI) missing a handler
]  9.406636249 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI eip ffffffff817ca655
     9.406636249 --------x------- inj_exc trap Invalid Op ec ffffffff
     9.406636249 --------x------- d3v0 intr_window vec 243 src 5(vector) intr 6
]  9.406637659 --------x------- d3v0 vmentry cycles 3382
]  9.406638483 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI eip ffffffff817ca655
     9.406638483 --------x------- inj_exc trap Invalid Op ec ffffffff
     9.406638483 --------x------- d3v0 intr_window vec 243 src 5(vector) intr 6
]  9.406639793 --------x------- d3v0 vmentry cycles 3143


Note the "Invalid Op" that's being delivered, at address
ffffffff817ca5a5.  Here is a disassembly of that region:

ffffffff817ca5a0 <do_page_fault>:
ffffffff817ca5a0:       55                      push   %rbp
ffffffff817ca5a1:       48 89 e5                mov    %rsp,%rbp
ffffffff817ca5a4:       e8 47 fb ff ff          callq  ffffffff817ca0f0 <__do_page_fault>
ffffffff817ca5a9:       5d                      pop    %rbp
ffffffff817ca5aa:       c3                      retq
ffffffff817ca5ab:       90                      nop

If you'll notice, ffffffff817ca5a5 is actually in the middle of an
instruction; it's no surprise that it's an invalid one.  The next two
eips for illegal instructions are at ffffffff817ca655:

ffffffff817ca650 <notify_die>:
ffffffff817ca650:       55                      push   %rbp
ffffffff817ca651:       48 89 e5                mov    %rsp,%rbp
ffffffff817ca654:       48 83 ec 20             sub    $0x20,%rsp
ffffffff817ca658:       48 89 55 e0             mov %rdx,-0x20(%rbp)
ffffffff817ca65c:       48 8d 55 e0             lea -0x20(%rbp),%rdx
ffffffff817ca660:       48 89 75 e8             mov %rsi,-0x18(%rbp)
ffffffff817ca664:       89 fe                   mov    %edi,%esi
ffffffff817ca666:       48 c7 c7 10 55 e4 81    mov    $0xffffffff81e45510,%rdi
ffffffff817ca66d:       48 89 4d f0             mov    %rcx,-0x10(%rbp)
ffffffff817ca671:       44 89 45 f8             mov    %r8d,-0x8(%rbp)
ffffffff817ca675:       44 89 4d fc             mov    %r9d,-0x4(%rbp)
ffffffff817ca679:       e8 b2 ff ff ff          callq  ffffffff817ca630 <atomic_notifier_call_chain>
ffffffff817ca67e:       c9                      leaveq
ffffffff817ca67f:       c3                      retq

Again, in the middle of an instruction; and again 5 bytes after the
beginning of a function.
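
The boundary check can be reproduced mechanically. Below is a minimal Python sketch, assuming the instruction start addresses and lengths are copied by hand from the disassembly above; the names `locate`, `DO_PAGE_FAULT` and `NOTIFY_DIE_PROLOGUE` are made up for illustration and are not part of any Xen or Linux tooling.

```python
import bisect

# (start address, length in bytes), transcribed from the objdump output above.
DO_PAGE_FAULT = [
    (0xffffffff817ca5a0, 1),  # push %rbp
    (0xffffffff817ca5a1, 3),  # mov  %rsp,%rbp
    (0xffffffff817ca5a4, 5),  # callq __do_page_fault
    (0xffffffff817ca5a9, 1),  # pop  %rbp
    (0xffffffff817ca5aa, 1),  # retq
]
NOTIFY_DIE_PROLOGUE = [
    (0xffffffff817ca650, 1),  # push %rbp
    (0xffffffff817ca651, 3),  # mov  %rsp,%rbp
    (0xffffffff817ca654, 4),  # sub  $0x20,%rsp
    (0xffffffff817ca658, 4),  # mov  %rdx,-0x20(%rbp)
]

def locate(eip, insns):
    """Return (instruction start, offset into it) for eip, or None if
    eip is outside the listed instructions."""
    starts = [start for start, _ in insns]
    i = bisect.bisect_right(starts, eip) - 1
    if i < 0:
        return None
    start, length = insns[i]
    if eip >= start + length:
        return None
    return start, eip - start

for name, eip, insns in (
    ("do_page_fault", 0xffffffff817ca5a5, DO_PAGE_FAULT),
    ("notify_die", 0xffffffff817ca655, NOTIFY_DIE_PROLOGUE),
):
    start, offset = locate(eip, insns)
    func_start = insns[0][0]
    state = "on a boundary" if offset == 0 else "mid-instruction"
    print(f"{name}: eip {eip:#x} is {eip - func_start} bytes into the function, "
          f"{offset} byte(s) into the instruction at {start:#x} ({state})")
```

An offset of 0 would mean the eip sits on an instruction boundary; anything else means the guest resumed inside an instruction.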

It looks, from the rest of it, like it keeps looping on illegal op
exits in the fault handlers until it runs out of stack space and hits
an EPT fault.
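
As a rough cross-check of the stack theory, the numbers in the first EPT violation line can be decoded with a few lines of Python. This is only a sketch: the exit-qualification bit layout (bits 0-2 for the access type, bits 3-5 for the permissions of the EPT entry) is my reading of the Intel SDM, 4 KiB pages are assumed, and the helper names are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

def gfn_of(gpa):
    """Guest frame number for a guest physical address."""
    return gpa >> PAGE_SHIFT

def decode_qual(qual):
    """Render an EPT-violation exit qualification as 'access/permissions',
    assuming bits 0-2 = read/write/fetch access and bits 3-5 = r/w/x
    permissions of the EPT entry."""
    access = "".join(c if qual & (1 << i) else "-" for i, c in enumerate("rwx"))
    perms = "".join(c if qual & (1 << (3 + i)) else "-" for i, c in enumerate("rwx"))
    return f"{access}/{perms}"

# Values copied from the first "EPT violation" line in the log.
qual = 0x182
gpa = 0x0000003e22df90
gla = 0xffff88003e22df90
rsp = 0xffff88003e22df98

print(decode_qual(qual))      # access type / EPT entry permissions
print(hex(gfn_of(gpa)))       # frame being walked in the p2m
print(rsp - gla)              # distance of the faulting address below RSP
```

Under those assumptions the fault is a write to a frame with no EPT mapping, at an address 8 bytes below the reported RSP, which fits the picture of a push onto a stack page that is not populated.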

The first question to ask, of course, is whether the disassembly is
valid; I think it is, because I looked up the RIP of 5-6 vmexits
before this one, and they seem to match (e.g., CPUID exits are at an
RIP that the disassembly says is a cpuid instruction).

Any ideas what might be causing it to end up in the middle of
instructions while handling exits?

I should repeat, this is your tree + the tools patch, without any
changes.  (My port actually does the same thing, which is reassuring
I guess...)
The RIP totally doesn't make sense. 90% of the time, I've found that
running make mrproper to clean everything up and starting again gives
you better info.

Just for good measure, I did a "git clean -ffdx", which gets rid of every file in the repo that git doesn't recognize, and re-built. Same thing: Invalid instruction traps, the first one being delivered in the middle of do_page_fault().

One thing I did forget to mention: this is with only one vcpu. With 4 vcpus, it crashes much sooner, but with no useful output.

I think it might be better to have one tree. Konrad has refreshed the
pvh.v9 tree; I'm taking that, adding whatever patches are needed to make
it work, and then publishing it externally. That way you and I will both
be looking at the exact same Linux. Monday is a holiday here, so the
external tree will most likely be up Tues/Wed; I have to go through some
admin hoops here to set it up.

Sounds good -- it might be helpful to have your kernel config as well.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

