
Re: [Xen-devel] rdmsr_safe in Linux PV (under Xen) gets a #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?



On 17/09/2015 21:23, Andy Lutomirski wrote:
> On Thu, Sep 17, 2015 at 1:10 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
>> On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
>>> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
>>>>> On 09/16/2015 04:07 PM, M A Young wrote:
>>>>>> On Wed, 16 Sep 2015, Cole Robinson wrote:
>>>>>>
>>>>>>> Unfortunately I couldn't get anything else extra out of xen using any
>>>>>>> of these options or the ones Major recommended... in fact I couldn't
>>>>>>> get anything to the serial console at all. console=con1 would seem to
>>>>>>> redirect messages since they wouldn't show up on the graphical display,
>>>>>>> but nothing went to the serial log. Maybe I'm missing something...
>>>>>> That should be console=com1 so you have a typo either in this message or
>>>>>> in your tests.
>>>>>>
>>>>> Yeah that was it :/ So here's the crash output using -cpu host:
>>>>>
>>>>> - Cole
>>>>>
>>> <snip>
>>>
>>>>> about to get started...
>>>>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
>>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3 create_bounce_frame+0x12b/0x13a
>>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>>>> (XEN) ----[ Xen-4.5.1  x86_64  debug=n  Not tainted ]----
>>>>> (XEN) CPU:    0
>>>>> (XEN) RIP:    e033:[<ffffffff810032b0>]
>>>> That is the Linux kernel RIP. Can you figure out what is at ffffffff810032b0?
>>>>
>>>> gdb vmlinux and then
>>>> x/20i 0xffffffff810032b0
>>>>
>>>> can help with that.
>>>>
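[As a shortcut, addr2line can resolve such an address in one step,
assuming vmlinux was built with debug info:

    addr2line -f -e vmlinux 0xffffffff810032b0

which prints the containing function plus file:line.]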
>>> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
>>>
>>> about to get started...
>>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
> What exactly does this mean?

This means that there was a #GP fault originating from dom0 context, but
dom0 has not yet registered a #GP handler with Xen.  (I already have a
patch pending to correct the wording of that error message.)

It would be a double fault on native.
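
For reference, "registering a #GP handler" for a PV guest means issuing
a set_trap_table hypercall.  A minimal sketch - the handler symbol is a
placeholder; the real interface is in xen/include/public/xen.h:

    /* Zero-terminated table of guest trap handlers. */
    static struct trap_info traps[] = {
        /* vector 13 = #GP; flags 0 = a ring-0 handler */
        { 13, 0, __KERNEL_CS, (unsigned long)my_gp_handler },
        { 0, 0, 0, 0 },  /* terminator */
    };

    HYPERVISOR_set_trap_table(traps);

Until something along these lines has happened, Xen has nowhere to
bounce the fault to, which is why create_bounce_frame gives up and
crashes the domain.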

>
>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3 create_bounce_frame+0x12b/0x13a
>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-4.5.1  x86_64  debug=n  Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e033:[<ffffffff810031f0>]
>>> (XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
>>> (XEN) rax: 0000000000000015   rbx: ffffffff81c03e1c   rcx: 00000000c0010112
>>> (XEN) rdx: 0000000000000001   rsi: ffffffff81c03e1c   rdi: 00000000c0010112
>>> (XEN) rbp: ffffffff81c03df8   rsp: ffffffff81c03da0   r8:  ffffffff81c03e28
>>> (XEN) r9:  ffffffff81c03e2c   r10: 0000000000000000   r11: 00000000ffffffff
>>> (XEN) r12: ffffffff81d25a60   r13: 0000000004000000   r14: 0000000000000000
>>> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000000406f0
>>> (XEN) cr3: 0000000075c0b000   cr2: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
>>> (XEN) Guest stack trace from rsp=ffffffff81c03da0:
>>> (XEN)    00000000c0010112 00000000ffffffff 0000000000000000 ffffffff810031f0
>>> (XEN)    000000010000e030 0000000000010082 ffffffff81c03de0 000000000000e02b
>>> (XEN)    0000000000000000 000000000000000c ffffffff81c03e1c ffffffff81c03e48
>>> (XEN)    ffffffff8102a7a4 ffffffff81c03e48 ffffffff8102aa3b ffffffff81c03e48
>>> (XEN)    cf1fa5f5e026f464 0000000001000000 ffffffff81c03ef8 0000000004000000
>>> (XEN)    0000000000000000 ffffffff81c03e58 ffffffff81d5d142 ffffffff81c03ee8
>>> (XEN)    ffffffff81d58b56 0000000000000000 0000000000000000 ffffffff81c03e88
>>> (XEN)    ffffffff810f8a39 ffffffff81c03ee8 ffffffff81798b13 ffffffff00000010
>>> (XEN)    ffffffff81c03ef8 ffffffff81c03eb8 cf1fa5f5e026f464 ffffffff81f1de9c
>>> (XEN)    ffffffffffffffff 0000000000000000 ffffffff81df7920 0000000000000000
>>> (XEN)    0000000000000000 ffffffff81c03f28 ffffffff81d51c74 cf1fa5f5e026f464
>>> (XEN)    0000000000000000 ffffffff81c03f60 ffffffff81c03f5c 0000000000000000
>>> (XEN)    0000000000000000 ffffffff81c03f38 ffffffff81d51339 ffffffff81c03ff8
>>> (XEN)    ffffffff81d548b1 0000000000000000 00600f1200000000 0000000100000800
>>> (XEN)    0300000100000032 0000000000000005 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>
>>>
>>> gdb output:
>>>
>>> (gdb) x/20i 0xffffffff810031f0
>>>    0xffffffff810031f0 <xen_read_msr_safe+16>: rdmsr
>> Fantastic! So we have some rdmsr that makes KVM inject a #GP.
> What's the scenario?  Is this Xen on KVM?

I believe from the thread that this is a Xen/dom0 combo running as a KVM
guest.
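
[For anyone wanting to reproduce: QEMU/KVM can boot a xen.gz directly
via its multiboot support, along these lines - file names are
illustrative:

    qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -serial stdio \
        -kernel /path/to/xen.gz -append "console=com1" \
        -initrd "/path/to/vmlinuz console=hvc0 earlyprintk=xen,/path/to/initramfs.img"

where the -initrd argument is a comma-separated list of multiboot
modules, each with its own command line.]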

>
> Why didn't the guest print anything?

Lack of earlyprintk=xen on the dom0 command line.  (IMO this really
should be the default when a PVOPs kernel detects that it is running
under Xen.)
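
i.e. something like this in grub.cfg (paths abbreviated):

    multiboot /xen.gz console=com1
    # usual root=/dev/... options omitted:
    module /vmlinuz earlyprintk=xen console=hvc0
    module /initramfs.img

earlyprintk=xen writes via the console_io hypercall, so early dom0
output shows up interleaved with the hypervisor's own messages.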

>
> Is the issue here that the guest died due to failure to handle an
> RDMSR failure or did the *hypervisor* die?

The guest suffered a GP fault which it couldn't handle.  Therefore Xen
crashed the domain.

When dom0 crashes, Xen goes down too.

>
> It looks like null_trap_bounce is returning true, which suggests that
> the failure is happening before the guest sets up exception handling.

I concur.

>
>> Looking at the stack you have some other values:
>> ffffffff81c03de0, ffffffff81c03e1c ... they should correspond
>> to other functions calling this one. If you do
>> 'nm --defined-only vmlinux | grep ffffffff81c03e1'
>> that should give an idea where they are. Or use 'gdb'.
>>
>> That will give us a stack - and we can find what type of MSR
>> this is. Oh wait, it is in the registers: 00000000c0010112
>>
>> Ok, so where in the code is that MSR... ah, that looks to be:
>>  #define MSR_K8_TSEG_ADDR                0xc0010112
>>
>> which is read in bsp_init_amd().
>>
>> I think the problem here is that we are calling the
>> 'safe' variant of the MSR read, but we still get an injected #GP
>> and don't expect that.
>>
>> I am not really sure what the expected outcome should be here.
>>
>> CC-ing xen-devel, KVM folks, and Andy who has been
>> mucking around in the _safe* pvops.
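
[For context, the pattern in question is roughly the following - a
paraphrase, not the exact 4.1 source:

    u64 tseg;

    /* rdmsrl_safe() returns 0 on success; a #GP from a missing MSR
     * is supposed to be swallowed by the exception-table fixup and
     * turned into a non-zero return instead of an oops. */
    if (rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg) == 0) {
        /* MSR exists - carve the TSEG range out of the mapping */
    }

That fixup can only run once a #GP handler is installed, which is the
crux of the problem here.]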
> It's too early of a failure, I think.
>
> Cc: Borislav.  Is TSEG guaranteed to exist?  Can we defer that until
> we have exception handling working?  Do we need to rig up exception
> handling so that it works earlier (e.g. in early_trap_init, which is
> presumably early enough)?  Or is this just a KVM and/or Xen bug?

It would certainly help to move the exception setup as early as possible.

From a Xen PV guest's point of view, the kernel is already executing on
working pagetables and a flat GDT when it starts.  A set_trap_table
hypercall (the equivalent of `lidt`) ought to be the second action,
following the stack switch.

This appears not to be the case: load_idt() is deferred until the
native cpu_init().
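
For the PV case the relevant plumbing is roughly this (paraphrasing
the 4.1-era arch/x86/xen/enlighten.c):

    static const struct pv_cpu_ops xen_cpu_ops __initconst = {
        /* ... other ops elided ... */
        /* Converts the kernel's IDT entries into trap_info records
         * and issues HYPERVISOR_set_trap_table. */
        .load_idt = xen_load_idt,
    };

so the hypercall is only made when load_idt() is first called, and
today that does not happen until cpu_init().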

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

