[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/efi: Fix Grub2 boot on arm64
> On 3 Nov 2021, at 15:30, Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 03.11.2021 16:16, Luca Fancellu wrote: >> >> >>> On 3 Nov 2021, at 14:30, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>> >>> On 03.11.2021 15:09, Luca Fancellu wrote: >>>>> On 3 Nov 2021, at 11:28, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>> On 03.11.2021 11:20, Luca Fancellu wrote: >>>>>>> On 3 Nov 2021, at 08:20, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>>>> On 02.11.2021 18:12, Luca Fancellu wrote: >>>>>>>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>>>>>> On 02.11.2021 15:05, Luca Fancellu wrote: >>>>>>>>>> The code introduced by commit >>>>>>>>>> a1743fc3a9fe9b68c265c45264dddf214fd9b882 >>>>>>>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is >>>>>>>>>> introducing a problem to boot Xen using Grub2 on ARM machine using >>>>>>>>>> EDK2. >>>>>>>>>> >>>>>>>>>> The problem comes from the function get_parent_handle(...) that >>>>>>>>>> inside >>>>>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last >>>>>>>>>> is NULL, making Xen stop the UEFI boot. >>>>>>>>> >>>>>>>>> According to my reading the UEFI spec doesn't (explicitly) allow for >>>>>>>>> this to be NULL. Could you clarify why this is the case? What other >>>>>>>>> information may end up being invalid / absent? Is e.g. read_section() >>>>>>>>> safe to use? >>>>>>>> >>>>>>>> My test on an arm machine running Grub2 on top of EDK2 showed that >>>>>>>> when Xen is started, the get_parent_handle(…) call was failing and >>>>>>>> stopping >>>>>>>> the boot because the efi_bs->HandleProtocol(…) was called with the >>>>>>>> loaded_image->DeviceHandle argument NULL and the call was returning >>>>>>>> a EFI_INVALID_PARAMETER. >>>>>>>> So the parent handle can’t be requested and the filesystem can’t be >>>>>>>> used, >>>>>>>> but any other code that doesn’t use the handle provided by >>>>>>>> get_parent_handle(…) >>>>>>>> can be used without problem like read_section(...). >>>>>>> >>>>>>> I understand this. My question was for the reason of ->DeviceHandle >>>>>>> being NULL. IOW I'm wondering whether we're actually talking about a >>>>>>> firmware or GrUB bug, in which case your change is a workaround for >>>>>>> that rather than (primarily) a fix for the earlier Xen change. >>>>>> >>>>>> The issue was found only when using EDK2+Grub2, no issue when booting >>>>>> directly from EDK2. >>>>>> This is a fix for the regression, because without the EFI changes, Grub2 >>>>>> was >>>>>> booting successfully Xen. Using grub2 to boot Xen on arm is a very common >>>>>> solution so not supporting this anymore could lead to lots of people >>>>>> having >>>>>> issues if they update to Xen 4.16. >>>>> >>>>> I'm not objecting to addressing the issue. But the description needs >>>>> to make clear where the origin of the problem lies, and afaict that's >>>>> not the earlier Xen change. That one merely uncovered what, according >>>>> to your reply, might then be a GrUB bug. Unless, as said earlier, I >>>>> merely haven't been able to spot provisions in the spec for the field >>>>> in question to be NULL. >>>> >>>> Maybe I can rephrase to be more specific from: >>>> >>>> The problem comes from the function get_parent_handle(...) that inside >>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last >>>> is NULL, making Xen stop the UEFI boot. >>>> >>>> To: >>>> >>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle, >>>> that is used by efi_bs->HandleProtocol(…) inside get_parent_handle(…), >>>> causing Xen to stop the boot getting an EFI_INVALID_PARAMETER error. >>>> >>>> Do you think it can be ok like this? >>> >>> Much better, yes, but I wonder what "returning" refers to. You want to >>> describe the origin of the NULL handle as precisely as possible. And >>> considering this turns out as a workaround, in a suitable place you >>> will also want to add a code comment, such that a later reader won't >>> decide this is all dead code and can be done in a simpler way. >> >> Ok I can write the issue from the beginning to be sure: >> >> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle >> inside the interface given by the LOADED_IMAGE_PROTOCOL service, this >> handle is used later by efi_bs->HandleProtocol(…) inside get_parent_handle(…) >> when requesting the SIMPLE_FILE_SYSTEM_PROTOCOL interface, >> causing Xen to stop the boot because of an EFI_INVALID_PARAMETER error. >> >> Regarding the comment, I can rephrase this comment: >> /* >> * If DeviceHandle is NULL, we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL >> * to have access to the filesystem. >> */ >> >> To be: >> >> /* >> * If DeviceHandle is NULL, the firmware offering the UEFI services might not >> be >> * compliant to the standard and we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL >> * to have access to the filesystem. However the system can boot if and only >> if it doesn’t >> * require access to the filesystem. (e.g. Xen image has everything built in >> or the >> * bootloader did previously load every needed binary in memory) >> */ >> >> What do you think? > > Largely okay, albeit you don't mention GrUB at all (which isn't really part > of the firmware, but which looks to be the culprit) and it gets a little > too verbose. Provided the facts have been verified, how about > > /* > * GrUB has been observed to supply a NULL DeviceHandle. We can't use > * that to gain access to the filesystem. However the system can still > * boot if it doesn’t require access to the filesystem. > */ Ok yes this is better, I will do the changes to the commit message and the comment for the v2, I’ll wait Stefano reply to see if I have to include also his suggestion. Thank you. > > (and it's up to you whether you include your further "e.g. ..." then, but > if you do I think the parenthesized part belong before the final full > stop, and the last "in" would want to be "into")? It's still dubious to me > how they can get away with such a NULL handle (and why that happens only > on Arm), but I guess it would go too far to try to understand the > background. > > Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |