
Re: [Xen-devel] EFI Xen unstable crashes on Dell E6410 when calling efi_get_time.

On 22/10/14 11:22, Jan Beulich wrote:
>>>> On 22.10.14 at 11:45, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 22/10/14 01:29, Marcos E. Matsunaga wrote:
>>> I went out and got the serial cable. Attached is the full output.
>>> On 10/21/2014 05:06 PM, Marcos E. Matsunaga wrote:
>>>> Folks,
>>>> I am trying to boot Xen using efibootmgr on a Dell E6410 laptop with
>>>> 4GB RAM, running an Intel I5 dual core with VT and all the
>>>> virtualization options enabled.
>>>> It crashes almost immediately. I am working on getting the serial
>>>> console up so that I can get a more detailed stack.
>>>> A screenshot of the console is attached.
>>>> The xen.cfg file is:
>>>> [global]
>>>> default=xen
>>>> [xen]
>>>> options=console=vga,com1 com1=115200,8n1 dom0_max_vcpus=2 vga="qxl"
>>>> kernel=vmlinuz-3.8.13-48.el7uek.Other_EFI_v1.x86_64 root=UUID=917bfc7f-8d9c-4acf-a98a-a9f558daccf2 ro console=hvc0 enforcing=0 biosdevname=0 earlyprintk=xen nomodeset
>>>> ramdisk=initramfs-3.8.13-48.el7uek.Other_EFI_v1.x86_64.img
>>>> The codepath is:
>>>> (gdb) x/20i get_cmos_time
>>>>    0xffff82d080188825 <get_cmos_time>:      push   %rbp
>>>>    0xffff82d080188826 <get_cmos_time+1>:    mov    %rsp,%rbp
>>>>    0xffff82d080188829 <get_cmos_time+4>:    push   %r12
>>>>    0xffff82d08018882b <get_cmos_time+6>:    push   %rbx
>>>>    0xffff82d08018882c <get_cmos_time+7>:    cmpb   $0x0,0xb620d(%rip)        # 0xffff82d08023ea40 <efi_enabled>
>>>>    0xffff82d080188833 <get_cmos_time+14>:   je     0xffff82d080188843 <get_cmos_time+30>
>>>>    0xffff82d080188835 <get_cmos_time+16>:   callq  0xffff82d080100069 <efi_get_time>
>> Ok - there are two separate bugs here.
>> The first is that we call into the EFI runtime via efi_rs->GetTime, and
>> a pagefault happens for the instruction at 0x00000000db25a33d for the
>> virtual address 0x00000000fed1f410.
>> The memory map looks quite weird, but the faulting address is covered in
>> this range.
>> (XEN)  00000fed1c000-00000fed1ffff type=11 attr=8000000000000000
>> So I would expect it to be mapped into the EFI pagetables.
> Then you must have missed
> (XEN) Unknown cachability for MFNs 0xfed1c-0xfed1f
> which means no mapping got established (as we don't know what
> cachability attributes to give to it).
> This is a firmware bug.

I had indeed missed the secondary meaning of that message.
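
For anyone following along, the arithmetic behind the two log lines can be
checked with a minimal sketch like the one below. The attribute constants are
the ones from the UEFI specification; the helper names (addr_to_mfn,
range_covers, has_cacheability) are made up for illustration and are not Xen
functions. It only shows that the faulting address sits in the reported range,
that it lands in MFN 0xfed1f, and that attr=8000000000000000 carries the
runtime bit but none of the UC/WC/WT/WB cacheability bits, which is consistent
with the "Unknown cachability" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC      0x0000000000000001ULL
#define EFI_MEMORY_WC      0x0000000000000002ULL
#define EFI_MEMORY_WT      0x0000000000000004ULL
#define EFI_MEMORY_WB      0x0000000000000008ULL
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

#define PAGE_SHIFT 12 /* 4 KiB pages */

/* Hypothetical helper: frame number that contains a physical address. */
static uint64_t addr_to_mfn(uint64_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Hypothetical helper: is addr inside the inclusive range [start, end]? */
static bool range_covers(uint64_t start, uint64_t end, uint64_t addr)
{
    assert(start <= end);
    return addr >= start && addr <= end;
}

/* Hypothetical helper: does the descriptor name any cacheability type? */
static bool has_cacheability(uint64_t attr)
{
    return (attr & (EFI_MEMORY_UC | EFI_MEMORY_WC |
                    EFI_MEMORY_WT | EFI_MEMORY_WB)) != 0;
}
```

With the values from the log, range_covers(0xfed1c000, 0xfed1ffff, 0xfed1f410)
holds and has_cacheability(0x8000000000000000) does not, so the region is
flagged as needed at runtime while giving no hint as to how it should be mapped.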

>> The second is that once the pagefault has happened, we trap back into
>> Xen and attempt to do a pagetable walk, falling over an assertion in
>> map_domain_page().
>> For EFI calls, we run on the EFI pagetables, not the idle pagetables, so
>> I am not surprised that the assertion has failed.  I suspect that the
>> pagefault handler for hypervisor faults needs to become wise to the fact
>> that we may receive a fault when calling into the firmware.  As all the
>> EFI pagetables are xenheap pages, there is nothing conceptually wrong
>> with using map_domain_page() to do the walk.
> I'm not sure it's worth taking care of this special case. But yes, if
> we really want to, extending the condition to also consider
> efi_l4_pgtable would seem the right thing to do.

I think being able to do a pagetable walk from an EFI fault would be
useful, even if only to aid debugging.  In this case, a non-debug build
would successfully perform the walk.

I have had a quick go, but it is rather hard to make the efi_l4_pgtable
symbol available in domain_page.c without some gross extern'ing.
It would be a nice fix if anyone has sufficient tuits.

