[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/efi: Fix Grub2 boot on arm64


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Luca Fancellu <luca.fancellu@xxxxxxx>
  • Date: Wed, 3 Nov 2021 15:16:25 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hkaGPG38pcNYLgdAHpkY2CIwtUuOV0u+KZq/w0kKhI0=; b=c9qE9s7YpeqxbJFxl7RbJGK7QLG77ImfFRBsV4DRicYcVxaWbJ38FX7QZo/au8vl9WEDObzSyqTprdvJNO6RUYSG2xGPav7hKlLrjXPmYHC0rLP9SQ4+cFhsZ/QMKi7ZdxTt/MQSucT3730eo5dGeP45b3Uj2c3C0qvURvFJaoHcXxrvIdqYSKy2n0hqWdwOn2N4j96KX1FUv7EE75pm5Aeur4ZyxynByZ9Uskb7l1sNAEm/45cHyZTWkW3L48lWmn3TQp5i4Mm+QQ0xlpFngXqFJArotDo3s5aV6FIfXqBgCKj+UAMeR6aAWjszXOgGSZDqVspCPkngFpQgh76F/A==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=K7CR2z7lW6tpqvlxYI4SdLRVQE2EQXeLb2HJfuGnJPxYp6d6qSvsMjPZ/2lbjGh0AvcTSdpZXMqrzQV/prWK2Z3mR0POLM4byS3IyLMO/ieHA6wapPT8ShjwLmu6LwG8KIpgWDh3t1hrraZI4hVo3swaNpF68ZCDh68ncpmatzmcMAVQThvNiR7tnOlS1oLi0VoxIuGJqL2T/Pi8JDHNixrmooXhXWZWIGlo+KEhj+TcXQDH+NnilDQ0BM+f/auSkm0Lka7QMCA0G+tBPhM8j4wNCCwUvfIc3bME7QZf8S13tzMebOhu7RcIImKjkQV33BjxQjWGoidcAJD1LUqWKA==
  • Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Cc: Bertrand Marquis <bertrand.marquis@xxxxxxx>, wei.chen@xxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Wed, 03 Nov 2021 15:16:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;


> On 3 Nov 2021, at 14:30, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> 
> On 03.11.2021 15:09, Luca Fancellu wrote:
>>> On 3 Nov 2021, at 11:28, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>> On 03.11.2021 11:20, Luca Fancellu wrote:
>>>>> On 3 Nov 2021, at 08:20, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>> On 02.11.2021 18:12, Luca Fancellu wrote:
>>>>>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>>>> On 02.11.2021 15:05, Luca Fancellu wrote:
>>>>>>>> The code introduced by commit a1743fc3a9fe9b68c265c45264dddf214fd9b882
>>>>>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is
>>>>>>>> introducing a problem to boot Xen using Grub2 on ARM machine using 
>>>>>>>> EDK2.
>>>>>>>> 
>>>>>>>> The problem comes from the function get_parent_handle(...) that inside
>>>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>>>>>> is NULL, making Xen stop the UEFI boot.
>>>>>>> 
>>>>>>> According to my reading the UEFI spec doesn't (explicitly) allow for
>>>>>>> this to be NULL. Could you clarify why this is the case? What other
>>>>>>> information may end up being invalid / absent? Is e.g. read_section()
>>>>>>> safe to use?
>>>>>> 
>>>>>> My test on an arm machine running Grub2 on top of EDK2 showed that
>>>>>> when Xen is started, the get_parent_handle(…) call was failing and 
>>>>>> stopping
>>>>>> the boot because the efi_bs->HandleProtocol(…) was called with the
>>>>>> loaded_image->DeviceHandle argument NULL and the call was returning
>>>>>> a EFI_INVALID_PARAMETER.
>>>>>> So the parent handle can’t be requested and the filesystem can’t be used,
>>>>>> but any other code that doesn’t use the handle provided by 
>>>>>> get_parent_handle(…)
>>>>>> can be used without problem like read_section(...).
>>>>> 
>>>>> I understand this. My question was for the reason of ->DeviceHandle
>>>>> being NULL. IOW I'm wondering whether we're actually talking about a
>>>>> firmware or GrUB bug, in which case your change is a workaround for
>>>>> that rather than (primarily) a fix for the earlier Xen change.
>>>> 
>>>> The issue was found only when using EDK2+Grub2, no issue when booting
>>>> directly from EDK2.
>>>> This is a fix for the regression, because without the EFI changes, Grub2 
>>>> was
>>>> booting successfully Xen. Using grub2 to boot Xen on arm is a very common
>>>> solution so not supporting this anymore could lead to lots of people having
>>>> issues if they update to Xen 4.16.
>>> 
>>> I'm not objecting to addressing the issue. But the description needs
>>> to make clear where the origin of the problem lies, and afaict that's
>>> not the earlier Xen change. That one merely uncovered what, according
>>> to your reply, might then be a GrUB bug. Unless, as said earlier, I
>>> merely haven't been able to spot provisions in the spec for the field
>>> in question to be NULL.
>> 
>> Maybe I can rephrase to be more specific from:
>> 
>> The problem comes from the function get_parent_handle(...) that inside
>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>> is NULL, making Xen stop the UEFI boot.
>> 
>> To:
>> 
>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle,
>> that is used by efi_bs->HandleProtocol(…) inside get_parent_handle(…),
>> causing Xen to stop the boot getting an EFI_INVALID_PARAMETER error.
>> 
>> Do you think it can be ok like this?
> 
> Much better, yes, but I wonder what "returning" refers to. You want to
> describe the origin of the NULL handle as precisely as possible. And
> considering this turns out as a workaround, in a suitable place you
> will also want to add a code comment, such that a later reader won't
> decide this is all dead code and can be done in a simpler way.

Ok I can write the issue from the beginning to be sure:

Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle
inside the interface given by the LOADED_IMAGE_PROTOCOL service, this
handle is used later by efi_bs->HandleProtocol(…) inside get_parent_handle(…)
when requesting the SIMPLE_FILE_SYSTEM_PROTOCOL interface,
causing Xen to stop the boot because of an EFI_INVALID_PARAMETER error.

Regarding the comment, I can rephrase this comment:
/*
 * If DeviceHandle is NULL, we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
 * to have access to the filesystem.
 */

To be:

/*
 * If DeviceHandle is NULL, the firmware offering the UEFI services might not be
 * compliant to the standard and we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
 * to have access to the filesystem. However the system can boot if and only if 
it doesn’t
 * require access to the filesystem. (e.g. Xen image has everything built in or 
the
 * bootloader did previously load every needed binary in memory)
 */

What do you think?

Cheers,
Luca

> 
> Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.