[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/efi: Fix Grub2 boot on arm64
> On 3 Nov 2021, at 15:51, Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 03.11.2021 16:41, Bertrand Marquis wrote: >> Hi Jan, >> >>> On 3 Nov 2021, at 15:30, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>> >>> On 03.11.2021 16:16, Luca Fancellu wrote: >>>> >>>> >>>>> On 3 Nov 2021, at 14:30, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>> >>>>> On 03.11.2021 15:09, Luca Fancellu wrote: >>>>>>> On 3 Nov 2021, at 11:28, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>>>> On 03.11.2021 11:20, Luca Fancellu wrote: >>>>>>>>> On 3 Nov 2021, at 08:20, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>>>>>> On 02.11.2021 18:12, Luca Fancellu wrote: >>>>>>>>>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeulich@xxxxxxxx> wrote: >>>>>>>>>>> On 02.11.2021 15:05, Luca Fancellu wrote: >>>>>>>>>>>> The code introduced by commit >>>>>>>>>>>> a1743fc3a9fe9b68c265c45264dddf214fd9b882 >>>>>>>>>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is >>>>>>>>>>>> introducing a problem to boot Xen using Grub2 on ARM machine using >>>>>>>>>>>> EDK2. >>>>>>>>>>>> >>>>>>>>>>>> The problem comes from the function get_parent_handle(...) that >>>>>>>>>>>> inside >>>>>>>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last >>>>>>>>>>>> is NULL, making Xen stop the UEFI boot. >>>>>>>>>>> >>>>>>>>>>> According to my reading the UEFI spec doesn't (explicitly) allow for >>>>>>>>>>> this to be NULL. Could you clarify why this is the case? What other >>>>>>>>>>> information may end up being invalid / absent? Is e.g. >>>>>>>>>>> read_section() >>>>>>>>>>> safe to use? >>>>>>>>>> >>>>>>>>>> My test on an arm machine running Grub2 on top of EDK2 showed that >>>>>>>>>> when Xen is started, the get_parent_handle(…) call was failing and >>>>>>>>>> stopping >>>>>>>>>> the boot because the efi_bs->HandleProtocol(…) was called with the >>>>>>>>>> loaded_image->DeviceHandle argument NULL and the call was returning >>>>>>>>>> a EFI_INVALID_PARAMETER. >>>>>>>>>> So the parent handle can’t be requested and the filesystem can’t be >>>>>>>>>> used, >>>>>>>>>> but any other code that doesn’t use the handle provided by >>>>>>>>>> get_parent_handle(…) >>>>>>>>>> can be used without problem like read_section(...). >>>>>>>>> >>>>>>>>> I understand this. My question was for the reason of ->DeviceHandle >>>>>>>>> being NULL. IOW I'm wondering whether we're actually talking about a >>>>>>>>> firmware or GrUB bug, in which case your change is a workaround for >>>>>>>>> that rather than (primarily) a fix for the earlier Xen change. >>>>>>>> >>>>>>>> The issue was found only when using EDK2+Grub2, no issue when booting >>>>>>>> directly from EDK2. >>>>>>>> This is a fix for the regression, because without the EFI changes, >>>>>>>> Grub2 was >>>>>>>> booting successfully Xen. Using grub2 to boot Xen on arm is a very >>>>>>>> common >>>>>>>> solution so not supporting this anymore could lead to lots of people >>>>>>>> having >>>>>>>> issues if they update to Xen 4.16. >>>>>>> >>>>>>> I'm not objecting to addressing the issue. But the description needs >>>>>>> to make clear where the origin of the problem lies, and afaict that's >>>>>>> not the earlier Xen change. That one merely uncovered what, according >>>>>>> to your reply, might then be a GrUB bug. Unless, as said earlier, I >>>>>>> merely haven't been able to spot provisions in the spec for the field >>>>>>> in question to be NULL. >>>>>> >>>>>> Maybe I can rephrase to be more specific from: >>>>>> >>>>>> The problem comes from the function get_parent_handle(...) that inside >>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last >>>>>> is NULL, making Xen stop the UEFI boot. >>>>>> >>>>>> To: >>>>>> >>>>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle, >>>>>> that is used by efi_bs->HandleProtocol(…) inside get_parent_handle(…), >>>>>> causing Xen to stop the boot getting an EFI_INVALID_PARAMETER error. >>>>>> >>>>>> Do you think it can be ok like this? >>>>> >>>>> Much better, yes, but I wonder what "returning" refers to. You want to >>>>> describe the origin of the NULL handle as precisely as possible. And >>>>> considering this turns out as a workaround, in a suitable place you >>>>> will also want to add a code comment, such that a later reader won't >>>>> decide this is all dead code and can be done in a simpler way. >>>> >>>> Ok I can write the issue from the beginning to be sure: >>>> >>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle >>>> inside the interface given by the LOADED_IMAGE_PROTOCOL service, this >>>> handle is used later by efi_bs->HandleProtocol(…) inside >>>> get_parent_handle(…) >>>> when requesting the SIMPLE_FILE_SYSTEM_PROTOCOL interface, >>>> causing Xen to stop the boot because of an EFI_INVALID_PARAMETER error. >>>> >>>> Regarding the comment, I can rephrase this comment: >>>> /* >>>> * If DeviceHandle is NULL, we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL >>>> * to have access to the filesystem. >>>> */ >>>> >>>> To be: >>>> >>>> /* >>>> * If DeviceHandle is NULL, the firmware offering the UEFI services might >>>> not be >>>> * compliant to the standard and we can't use the >>>> SIMPLE_FILE_SYSTEM_PROTOCOL >>>> * to have access to the filesystem. However the system can boot if and >>>> only if it doesn’t >>>> * require access to the filesystem. (e.g. Xen image has everything built >>>> in or the >>>> * bootloader did previously load every needed binary in memory) >>>> */ >>>> >>>> What do you think? >>> >>> Largely okay, albeit you don't mention GrUB at all (which isn't really part >>> of the firmware, but which looks to be the culprit) and it gets a little >>> too verbose. Provided the facts have been verified, how about >>> >>> /* >>> * GrUB has been observed to supply a NULL DeviceHandle. We can't use >>> * that to gain access to the filesystem. However the system can still >>> * boot if it doesn’t require access to the filesystem. >>> */ >>> >>> (and it's up to you whether you include your further "e.g. ..." then, but >>> if you do I think the parenthesized part belong before the final full >>> stop, and the last "in" would want to be "into")? It's still dubious to me >>> how they can get away with such a NULL handle (and why that happens only >>> on Arm), but I guess it would go too far to try to understand the >>> background. >> >> This might not be a problem in Grub but actually in EDK2 or due to the fact >> that >> EDK2 is starting Grub which is starting Xen. Grub is not modifying this >> explicitly >> from what we found during our investigations. > > Otoh Luca said that there's no problem without GrUB. So maybe "GrUB > in combination with EDK2 ..."? Yes as Bertrand suggested and following your wording is more complete to say: /* * Grub2 running on top of EDK2 has been observed to supply a NULL * DeviceHandle. We can't use that to gain access to the filesystem. * However the system can still boot if it doesn’t require access to the * filesystem. */ > > Thinking more about it, this may also be partly related to our > limitation to loading files from file systems (and not e.g. networks > or RAM). Yet even then I couldn't see how a NULL device handle could > be used for anything. > > Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |