[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v11 07/13] x86: add multiboot2 protocol support for EFI platforms



On 1/12/17 6:50 AM, Daniel Kiper wrote:
> On Wed, Jan 11, 2017 at 02:20:15PM -0600, Doug Goldstein wrote:
>> On 1/11/17 1:47 PM, Daniel Kiper wrote:
>>> On Tue, Jan 10, 2017 at 02:51:27PM -0600, Doug Goldstein wrote:
>>>> On 1/9/17 7:37 PM, Doug Goldstein wrote:
>>>>> On 12/5/16 4:25 PM, Daniel Kiper wrote:
>>>>
>>>>>> diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
>>>>>> index 62c010e..dc857d8 100644
>>>>>> --- a/xen/arch/x86/efi/efi-boot.h
>>>>>> +++ b/xen/arch/x86/efi/efi-boot.h
>>>>>> @@ -146,6 +146,8 @@ static void __init 
>>>>>> efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
>>>>>>  {
>>>>>>      struct e820entry *e;
>>>>>>      unsigned int i;
>>>>>> +    /* Check for extra mem for mbi data if Xen is loaded via multiboot2 
>>>>>> protocol. */
>>>>>> +    UINTN extra_mem = efi_enabled(EFI_LOADER) ? 0 : (64 << 10);
>>>>>
>>>>> Just wondering where the constant came from? And if there should be a
>>>>> little bit of information about it. To me its just weird to shift 64.
>>>>
>>>> Its the size of the stack used in the assembly code.
>>>
>>> No, it is trampoline region size.
>>
>> trampoline + stack in head.S We take the address where we're going to
>> copy the trampoline and set the stack to 0x10000 past it.
> 
> I suppose that you think about this:
> 
>         /* Switch to low-memory stack.  */
>         mov     sym_fs(trampoline_phys),%edi
>         lea     0x10000(%edi),%esp
> 
> However, trampoline region size is (should be) 64 KiB. No way. Please
> look below for more details.

The trampoline + stack are 64kb together. The stack grows down and the
trampoline grows up. The stack starts at 64kb past the start of the
trampoline. %edi is the start of the trampoline.

> 
>>>>>>      /* Populate E820 table and check trampoline area availability. */
>>>>>>      e = e820map - 1;
>>>>>> @@ -168,7 +170,8 @@ static void __init 
>>>>>> efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
>>>>>>              /* fall through */
>>>>>>          case EfiConventionalMemory:
>>>>>>              if ( !trampoline_phys && desc->PhysicalStart + len <= 
>>>>>> 0x100000 &&
>>>>>> -                 len >= cfg.size && desc->PhysicalStart + len > 
>>>>>> cfg.addr )
>>>>>> +                 len >= cfg.size + extra_mem &&
>>>>>> +                 desc->PhysicalStart + len > cfg.addr )
>>>>>>                  cfg.addr = (desc->PhysicalStart + len - cfg.size) & 
>>>>>> PAGE_MASK;
>>>>>
>>>>> So this is where the current series blows up and fails on real hardware.
>>>>
>>>> Honestly this was my misunderstanding and this shouldn't ever be used to
>>>> get memory for the trampoline. This also has the bug in it that it needs
>>>> to be:
>>>>
>>>> ASSERT(cfg.size > 0);
>>>> cfg.addr = (desc->PhysicalStart + len - (cfg.size + extra_mem) & PAGE_MASK;
>>>
>>> As I said earlier. This extra_mem stuff is (maybe) wrong and should be fixed
>>> in one way or another. Hmmm... It looks OK. I will double check it because
>>> I do not looked at this code long time and maybe I am missing something.
>>
>> cfg.size needs to be the size of the trampolines + stack.
> 
> It looks that during some code rearrangement I moved one instruction too
> much to trampoline_bios_setup. So, I can agree that right now cfg.size
> should be properly initialized. Though it should be cfg.size = 64 << 10.
> Then extra_mem should be dropped.

That's fine as long as its clear that 64kb is for the trampoline + the
stack.

> 
>>>>> No where in the EFI + MB2 code path is cfg.size ever initialized. Its
>>>>> only initialized in the straight EFI case. The result is that cfg.addr
>>>>> is set to the section immediately following this. Took a bit to
>>>>> trackdown because I checked for memory overlaps with where Xen was
>>>>> loaded and where it was relocated to but forgot to check for overlaps
>>>>> with the trampoline code. This is the address where the trampoline jumps
>>>>> are copied.
>>>>>
>>>>> Personally I'd like to see an ASSERT added or the code swizzled around
>>>>> in such a way that its not possible to get into a bad state. But this is
>>>>> probably another patch series.
>>>>>
>>>>>>              /* fall through */
>>>>>>          case EfiLoaderCode:
>>>>>> @@ -210,12 +213,14 @@ static void *__init 
>>>>>> efi_arch_allocate_mmap_buffer(UINTN map_size)
>>>>>>
>>>>>>  static void __init efi_arch_pre_exit_boot(void)
>>>>>>  {
>>>>>> -    if ( !trampoline_phys )
>>>>>> -    {
>>>>>> -        if ( !cfg.addr )
>>>>>> -            blexit(L"No memory for trampoline");
>>>>>> +    if ( trampoline_phys )
>>>>>> +        return;
>>>>>> +
>>>>>> +    if ( !cfg.addr )
>>>>>> +        blexit(L"No memory for trampoline");
>>>>>> +
>>>>>> +    if ( efi_enabled(EFI_LOADER) )
>>>>>>          relocate_trampoline(cfg.addr);
>>>>
>>>> Why is this call even here anymore? Its called in
>>>> efi_arch_memory_setup() already. If it was unable to allocate memory
>>>> under the 1mb region its just going to trample over ANY conventional
>>>> memory region that might be in use.
>>>
>>> Trampoline is relocated in xen/arch/x86/boot/head.S.
>>
>> For the MB2/MB case. But for the straight EFI case its called in
>> efi_arch_memory_setup() and then you've added the wrapper of
> 
> That is true.
> 
>> efi_enabled(EFI_LOADER) which in theory would have it called again in
>> the straight EFI case if trampoline_phys isn't set and cfg.addr is set.
> 
> Why "in theory"?

ok drop the words "in theory" and the statement is fine.

> 
>>>>>> -    }
>>>>>>  }
>>>>>>
>>>>>>  static void __init noreturn efi_arch_post_exit_boot(void)
>>>>>> @@ -653,6 +658,43 @@ static bool_t __init 
>>>>>> efi_arch_use_config_file(EFI_SYSTEM_TABLE *SystemTable)
>>>>>>
>>>>>>  static void efi_arch_flush_dcache_area(const void *vaddr, UINTN size) { 
>>>>>> }
>>>>>>
>>>>>> +paddr_t __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
>>>>>> *SystemTable)
>>>>>> +{
>>>>>> +    EFI_GRAPHICS_OUTPUT_PROTOCOL *gop;
>>>>>> +    UINTN cols, gop_mode = ~0, rows;
>>>>>> +
>>>>>> +    __set_bit(EFI_BOOT, &efi_flags);
>>>>>> +    __set_bit(EFI_RS, &efi_flags);
>>>>>> +
>>>>>> +    efi_init(ImageHandle, SystemTable);
>>>>>> +
>>>>>> +    efi_console_set_mode();
>>>>>> +
>>>>>> +    if ( StdOut->QueryMode(StdOut, StdOut->Mode->Mode,
>>>>>> +                           &cols, &rows) == EFI_SUCCESS )
>>>>>> +        efi_arch_console_init(cols, rows);
>>>>>> +
>>>>>> +    gop = efi_get_gop();
>>>>>> +
>>>>>> +    if ( gop )
>>>>>> +        gop_mode = efi_find_gop_mode(gop, 0, 0, 0);
>>>>>> +
>>>>>> +    efi_arch_edd();
>>>>>> +    efi_arch_cpu();
>>>>>> +
>>>>>> +    efi_tables();
>>>>>> +    setup_efi_pci();
>>>>>> +    efi_variables();
>>>>>
>>>>> This is probably where you missed the call to "efi_arch_memory_setup();"
>>>>> that gave me hell above.
>>>>
>>>> Well it turns out that calling "efi_arch_memory_setup()" isn't correct
>>>> because it also messes with the page tables AND also does the trampoline
>>>
>>> Yep.
>>>
>>>> relocation. Which after this call finishes then it does the trampoline
>>>> relocations in assembly. The code currently makes the assumption it can
>>>
>>> I am not sure what do you mean here.
>>>
>>>> use any conventional memory range below 1mb (and unfortunately does the
>>>> math incorrectly and instead uses the region following the conventional
>>>
>>> Which code? Which math?
>>
>> The code where cfg.size = 0 and the extra_mem was missing.
> 
> This should be fixed as I stated above.
> 
>>>> memory region). You need to use AllocatePages() otherwise you are
>>>> trampling memory that might have been allocated by the bootloader or any
>>>
>>> Bootloader code/data should be dead here.
>>
>> Correct. Unfortunately on my Lenovo laptop and my Intel NUCs I can't
>> currently call ExitBootServices and a timer that iPXE has wired up has
> 
> If you disable an important wheel in a machine you should not expect
> that the machine will work. Sorry! No way!

Speak to your co-workers Konrad and Boris. We've had long email threads
about how certain hardware does not work with the way Xen calls
ExitBootServices.

> 
>> some memory reserved down there and it was getting trampled. The real
> 
> I still do not know why remnants of iPXE should run at this Xen boot stage.
> It looks like an iPXE bug and IMO it should be fixed first.

Like I said above, its because on this machine I am unable to call Xen's
EBS.

> 
>> answer is that we need to fix up stock Xen to be able to always call EBS.
> 
> It looks that ExitBootServices() is always called. So, I do not think that
> anything have to be fixed.

It is commented out of this board using the patchset that Konrad
submitted to the ML years ago.

-- 
Doug Goldstein

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.