[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/efi: Fix Grub2 boot on arm64


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Wed, 3 Nov 2021 15:41:14 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=0NhXzTXQIkd8ALmhQUukWAfxre7vJ6gUJxyOU463NjY=; b=ZXzfLDdXz2ZRCdyiMMW2oLqGmbWvPSufc7vmL3wBB8ntG/GKmK3iVONKDceOUC5Wdm0M0Kk9VXpfOJMF9IcOl3CqMS+jstbeXmsZl7TWieK8WoXqYlTIw2Epi3L8cwvOnbjArYsuHOhgip/SoS2gGIzVFHBwfLmHOIHKW8dOdyW2xUIe3ZX5twnov3TfSvENUF33aLyvDufGvNHIivICAx7yCJWfsYd1/+SqD2odeXxenqIbCXH3nTpuXgLBNx66HaclmyDxNYwuJNMXS25+eXHY3k6n6Yo4QNmmw0lNWNJh/NoWiOElMnFewkPYW9DG93mj8NxB8G+IFVjgdSY1rg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=U8lhQJSkOi0ni5z9COlsHsiStD6XTEkUGNwc6fQR+hOtdnhyUMJooQYAQH7mdWxheIcIfWmysGUFaliZroDsSiqnhCA4GoRsTI39sR11awORzT8JGPbLRrlGkdNxjKwUN4ja9sIxU0EqE8F1EyWs8DHWVjA2qOyJ6R7/GzFAeUEAAgGr2OtgPcG4XR84CEg25zB2y2Zh7seOuWk5TF4GV5h07o9fnBSPsZKBoMCNBNvQUzYsI79IAv6y1fCG+LrkL/XXlRjDth2fheJh2kNsJ52MDBBEvgUmBCSrvyuKRIUO/nP8s/vOhqPfzviEskDMY/f8kzbWGxE5rW3PuKmiGw==
  • Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Cc: Luca Fancellu <Luca.Fancellu@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Wed, 03 Nov 2021 15:41:38 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHXz/K6mNGHgi2TXkyW35dcCbo6tKvwULEAgAApEoCAAP3KAIAAIV4AgAAS9ACAAC0JgIAABgMAgAAMuICAAAPMgIAAAyMA
  • Thread-topic: [PATCH] xen/efi: Fix Grub2 boot on arm64

Hi Jan,

> On 3 Nov 2021, at 15:30, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> 
> On 03.11.2021 16:16, Luca Fancellu wrote:
>> 
>> 
>>> On 3 Nov 2021, at 14:30, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>> 
>>> On 03.11.2021 15:09, Luca Fancellu wrote:
>>>>> On 3 Nov 2021, at 11:28, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>> On 03.11.2021 11:20, Luca Fancellu wrote:
>>>>>>> On 3 Nov 2021, at 08:20, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>>>> On 02.11.2021 18:12, Luca Fancellu wrote:
>>>>>>>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>>>>>> On 02.11.2021 15:05, Luca Fancellu wrote:
>>>>>>>>>> The code introduced by commit 
>>>>>>>>>> a1743fc3a9fe9b68c265c45264dddf214fd9b882
>>>>>>>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is
>>>>>>>>>> introducing a problem to boot Xen using Grub2 on ARM machine using 
>>>>>>>>>> EDK2.
>>>>>>>>>> 
>>>>>>>>>> The problem comes from the function get_parent_handle(...) that 
>>>>>>>>>> inside
>>>>>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>>>>>>>> is NULL, making Xen stop the UEFI boot.
>>>>>>>>> 
>>>>>>>>> According to my reading the UEFI spec doesn't (explicitly) allow for
>>>>>>>>> this to be NULL. Could you clarify why this is the case? What other
>>>>>>>>> information may end up being invalid / absent? Is e.g. read_section()
>>>>>>>>> safe to use?
>>>>>>>> 
>>>>>>>> My test on an arm machine running Grub2 on top of EDK2 showed that
>>>>>>>> when Xen is started, the get_parent_handle(…) call was failing and 
>>>>>>>> stopping
>>>>>>>> the boot because the efi_bs->HandleProtocol(…) was called with the
>>>>>>>> loaded_image->DeviceHandle argument NULL and the call was returning
>>>>>>>> a EFI_INVALID_PARAMETER.
>>>>>>>> So the parent handle can’t be requested and the filesystem can’t be 
>>>>>>>> used,
>>>>>>>> but any other code that doesn’t use the handle provided by 
>>>>>>>> get_parent_handle(…)
>>>>>>>> can be used without problem like read_section(...).
>>>>>>> 
>>>>>>> I understand this. My question was for the reason of ->DeviceHandle
>>>>>>> being NULL. IOW I'm wondering whether we're actually talking about a
>>>>>>> firmware or GrUB bug, in which case your change is a workaround for
>>>>>>> that rather than (primarily) a fix for the earlier Xen change.
>>>>>> 
>>>>>> The issue was found only when using EDK2+Grub2, no issue when booting
>>>>>> directly from EDK2.
>>>>>> This is a fix for the regression, because without the EFI changes, Grub2 
>>>>>> was
>>>>>> booting successfully Xen. Using grub2 to boot Xen on arm is a very common
>>>>>> solution so not supporting this anymore could lead to lots of people 
>>>>>> having
>>>>>> issues if they update to Xen 4.16.
>>>>> 
>>>>> I'm not objecting to addressing the issue. But the description needs
>>>>> to make clear where the origin of the problem lies, and afaict that's
>>>>> not the earlier Xen change. That one merely uncovered what, according
>>>>> to your reply, might then be a GrUB bug. Unless, as said earlier, I
>>>>> merely haven't been able to spot provisions in the spec for the field
>>>>> in question to be NULL.
>>>> 
>>>> Maybe I can rephrase to be more specific from:
>>>> 
>>>> The problem comes from the function get_parent_handle(...) that inside
>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>> is NULL, making Xen stop the UEFI boot.
>>>> 
>>>> To:
>>>> 
>>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle,
>>>> that is used by efi_bs->HandleProtocol(…) inside get_parent_handle(…),
>>>> causing Xen to stop the boot getting an EFI_INVALID_PARAMETER error.
>>>> 
>>>> Do you think it can be ok like this?
>>> 
>>> Much better, yes, but I wonder what "returning" refers to. You want to
>>> describe the origin of the NULL handle as precisely as possible. And
>>> considering this turns out as a workaround, in a suitable place you
>>> will also want to add a code comment, such that a later reader won't
>>> decide this is all dead code and can be done in a simpler way.
>> 
>> Ok I can write the issue from the beginning to be sure:
>> 
>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle
>> inside the interface given by the LOADED_IMAGE_PROTOCOL service, this
>> handle is used later by efi_bs->HandleProtocol(…) inside get_parent_handle(…)
>> when requesting the SIMPLE_FILE_SYSTEM_PROTOCOL interface,
>> causing Xen to stop the boot because of an EFI_INVALID_PARAMETER error.
>> 
>> Regarding the comment, I can rephrase this comment:
>> /*
>> * If DeviceHandle is NULL, we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
>> * to have access to the filesystem.
>> */
>> 
>> To be:
>> 
>> /*
>> * If DeviceHandle is NULL, the firmware offering the UEFI services might not 
>> be
>> * compliant to the standard and we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
>> * to have access to the filesystem. However the system can boot if and only 
>> if it doesn’t
>> * require access to the filesystem. (e.g. Xen image has everything built in 
>> or the
>> * bootloader did previously load every needed binary in memory)
>> */
>> 
>> What do you think?
> 
> Largely okay, albeit you don't mention GrUB at all (which isn't really part
> of the firmware, but which looks to be the culprit) and it gets a little
> too verbose. Provided the facts have been verified, how about
> 
>    /*
>     * GrUB has been observed to supply a NULL DeviceHandle. We can't use
>     * that to gain access to the filesystem. However the system can still
>     * boot if it doesn’t require access to the filesystem.
>     */
> 
> (and it's up to you whether you include your further "e.g. ..." then, but
> if you do I think the parenthesized part belong before the final full
> stop, and the last "in" would want to be "into")? It's still dubious to me
> how they can get away with such a NULL handle (and why that happens only
> on Arm), but I guess it would go too far to try to understand the
> background.

This might not be a problem in Grub but actually in EDK2 or due to the fact that
EDK2 is starting Grub which is starting Xen. Grub is not modifying this 
explicitly
from what we found during our investigations.

Cheers
Bertrand

> 
> Jan


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.