[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/efi: Fix Grub2 boot on arm64


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Luca Fancellu <luca.fancellu@xxxxxxx>
  • Date: Wed, 3 Nov 2021 16:07:15 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mMzRqRCXP00W+MueiHojl3OajnY7XnqQcfJg8A5FYXQ=; b=aDkCINZuHCv+1rNVnKkYhD0as/1QDnDoTJPCMWiDAXXHgJEUxX1wTjiOvc09oEV+H/tdLVOwM2aBD57EzZa5/mFYHgD2psmWG+xbk0UZBP6yjUPKmtxR1NFuYdWrEleewiCz3fZSbvlZtIGs1yiatueYdKIR8mxk7ayvPLIuIsG6TJjHM2w5OjbKw/7QmDBvLIw/DfH8lL9Lm/qhWGA+O6ef5W0u1AMFfl1dsYHjdhPlriPFEeKbNIhS5exfMPJIyT8hvRs7LgJ9URmsIyh75DxPbmnV8V6hQYQtCOD1dFonYW6pKWY6IGEf0KVJBwUmX9NM1+GP81wZq2mmRCABoA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=XzERY7cscE8y5RpwmGOzfIpeRAn0h/8/0ZEfpIH3IUfGQMlAr50V5tB9RWBYrxFTIUMhwA/KbEE5e7emi4F7ckwHCEyhGOu8iwRMDNyyWuIWaSUVc4l2HlzUSAvhM8QVCS4vHl6R1BNs9r586WvdMdsH0MrOl8ZDrxUW9jg/6nlLcn4daIkA+CdIN4nNx0Jrp6O5MuuqcDrgUaOoKg3oBu/SBwsD6eOwTD51SybR8iNLYWQz6mAOFZ6mnKaDyLzP1NDwPPQ8znlUGfNE1RDOUFtXudzOBY5e9P3XncC9QhExrncQUadH9Z7gWJJZAHD4/neelsFELuey1jiw0VdMbQ==
  • Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Wed, 03 Nov 2021 16:07:56 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;


> On 3 Nov 2021, at 15:51, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> 
> On 03.11.2021 16:41, Bertrand Marquis wrote:
>> Hi Jan,
>> 
>>> On 3 Nov 2021, at 15:30, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>> 
>>> On 03.11.2021 16:16, Luca Fancellu wrote:
>>>> 
>>>> 
>>>>> On 3 Nov 2021, at 14:30, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>> 
>>>>> On 03.11.2021 15:09, Luca Fancellu wrote:
>>>>>>> On 3 Nov 2021, at 11:28, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>>>> On 03.11.2021 11:20, Luca Fancellu wrote:
>>>>>>>>> On 3 Nov 2021, at 08:20, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>>>>>> On 02.11.2021 18:12, Luca Fancellu wrote:
>>>>>>>>>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>>>>>>>> On 02.11.2021 15:05, Luca Fancellu wrote:
>>>>>>>>>>>> The code introduced by commit 
>>>>>>>>>>>> a1743fc3a9fe9b68c265c45264dddf214fd9b882
>>>>>>>>>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is
>>>>>>>>>>>> introducing a problem to boot Xen using Grub2 on ARM machine using 
>>>>>>>>>>>> EDK2.
>>>>>>>>>>>> 
>>>>>>>>>>>> The problem comes from the function get_parent_handle(...) that 
>>>>>>>>>>>> inside
>>>>>>>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>>>>>>>>>> is NULL, making Xen stop the UEFI boot.
>>>>>>>>>>> 
>>>>>>>>>>> According to my reading the UEFI spec doesn't (explicitly) allow for
>>>>>>>>>>> this to be NULL. Could you clarify why this is the case? What other
>>>>>>>>>>> information may end up being invalid / absent? Is e.g. 
>>>>>>>>>>> read_section()
>>>>>>>>>>> safe to use?
>>>>>>>>>> 
>>>>>>>>>> My test on an arm machine running Grub2 on top of EDK2 showed that
>>>>>>>>>> when Xen is started, the get_parent_handle(…) call was failing and 
>>>>>>>>>> stopping
>>>>>>>>>> the boot because the efi_bs->HandleProtocol(…) was called with the
>>>>>>>>>> loaded_image->DeviceHandle argument NULL and the call was returning
>>>>>>>>>> a EFI_INVALID_PARAMETER.
>>>>>>>>>> So the parent handle can’t be requested and the filesystem can’t be 
>>>>>>>>>> used,
>>>>>>>>>> but any other code that doesn’t use the handle provided by 
>>>>>>>>>> get_parent_handle(…)
>>>>>>>>>> can be used without problem like read_section(...).
>>>>>>>>> 
>>>>>>>>> I understand this. My question was for the reason of ->DeviceHandle
>>>>>>>>> being NULL. IOW I'm wondering whether we're actually talking about a
>>>>>>>>> firmware or GrUB bug, in which case your change is a workaround for
>>>>>>>>> that rather than (primarily) a fix for the earlier Xen change.
>>>>>>>> 
>>>>>>>> The issue was found only when using EDK2+Grub2, no issue when booting
>>>>>>>> directly from EDK2.
>>>>>>>> This is a fix for the regression, because without the EFI changes, 
>>>>>>>> Grub2 was
>>>>>>>> booting successfully Xen. Using grub2 to boot Xen on arm is a very 
>>>>>>>> common
>>>>>>>> solution so not supporting this anymore could lead to lots of people 
>>>>>>>> having
>>>>>>>> issues if they update to Xen 4.16.
>>>>>>> 
>>>>>>> I'm not objecting to addressing the issue. But the description needs
>>>>>>> to make clear where the origin of the problem lies, and afaict that's
>>>>>>> not the earlier Xen change. That one merely uncovered what, according
>>>>>>> to your reply, might then be a GrUB bug. Unless, as said earlier, I
>>>>>>> merely haven't been able to spot provisions in the spec for the field
>>>>>>> in question to be NULL.
>>>>>> 
>>>>>> Maybe I can rephrase to be more specific from:
>>>>>> 
>>>>>> The problem comes from the function get_parent_handle(...) that inside
>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>>>> is NULL, making Xen stop the UEFI boot.
>>>>>> 
>>>>>> To:
>>>>>> 
>>>>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle,
>>>>>> that is used by efi_bs->HandleProtocol(…) inside get_parent_handle(…),
>>>>>> causing Xen to stop the boot getting an EFI_INVALID_PARAMETER error.
>>>>>> 
>>>>>> Do you think it can be ok like this?
>>>>> 
>>>>> Much better, yes, but I wonder what "returning" refers to. You want to
>>>>> describe the origin of the NULL handle as precisely as possible. And
>>>>> considering this turns out as a workaround, in a suitable place you
>>>>> will also want to add a code comment, such that a later reader won't
>>>>> decide this is all dead code and can be done in a simpler way.
>>>> 
>>>> Ok I can write the issue from the beginning to be sure:
>>>> 
>>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle
>>>> inside the interface given by the LOADED_IMAGE_PROTOCOL service, this
>>>> handle is used later by efi_bs->HandleProtocol(…) inside 
>>>> get_parent_handle(…)
>>>> when requesting the SIMPLE_FILE_SYSTEM_PROTOCOL interface,
>>>> causing Xen to stop the boot because of an EFI_INVALID_PARAMETER error.
>>>> 
>>>> Regarding the comment, I can rephrase this comment:
>>>> /*
>>>> * If DeviceHandle is NULL, we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
>>>> * to have access to the filesystem.
>>>> */
>>>> 
>>>> To be:
>>>> 
>>>> /*
>>>> * If DeviceHandle is NULL, the firmware offering the UEFI services might 
>>>> not be
>>>> * compliant to the standard and we can't use the 
>>>> SIMPLE_FILE_SYSTEM_PROTOCOL
>>>> * to have access to the filesystem. However the system can boot if and 
>>>> only if it doesn’t
>>>> * require access to the filesystem. (e.g. Xen image has everything built 
>>>> in or the
>>>> * bootloader did previously load every needed binary in memory)
>>>> */
>>>> 
>>>> What do you think?
>>> 
>>> Largely okay, albeit you don't mention GrUB at all (which isn't really part
>>> of the firmware, but which looks to be the culprit) and it gets a little
>>> too verbose. Provided the facts have been verified, how about
>>> 
>>>   /*
>>>    * GrUB has been observed to supply a NULL DeviceHandle. We can't use
>>>    * that to gain access to the filesystem. However the system can still
>>>    * boot if it doesn’t require access to the filesystem.
>>>    */
>>> 
>>> (and it's up to you whether you include your further "e.g. ..." then, but
>>> if you do I think the parenthesized part belong before the final full
>>> stop, and the last "in" would want to be "into")? It's still dubious to me
>>> how they can get away with such a NULL handle (and why that happens only
>>> on Arm), but I guess it would go too far to try to understand the
>>> background.
>> 
>> This might not be a problem in Grub but actually in EDK2 or due to the fact 
>> that
>> EDK2 is starting Grub which is starting Xen. Grub is not modifying this 
>> explicitly
>> from what we found during our investigations.
> 
> Otoh Luca said that there's no problem without GrUB. So maybe "GrUB
> in combination with EDK2 ..."?

Yes as Bertrand suggested and following your wording is more complete to say:

    /*
     * Grub2 running on top of EDK2 has been observed to supply a NULL
     * DeviceHandle. We can't use that to gain access to the filesystem.
     * However the system can still boot if it doesn’t require access to the
     * filesystem.
     */

> 
> Thinking more about it, this may also be partly related to our
> limitation to loading files from file systems (and not e.g. networks
> or RAM). Yet even then I couldn't see how a NULL device handle could
> be used for anything.
> 
> Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.