[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen 4.12 panic on Thinkpad W540 with UEFI mutiboot2, efi=no-rs workarounds it



On Thu, Aug 08, 2019 at 08:03:49AM +0200, Jan Beulich wrote:
> On 08.08.2019 04:53, Marek Marczykowski-Górecki  wrote:
> > On Wed, Aug 07, 2019 at 09:26:00PM +0200, Marek Marczykowski-Górecki wrote:
> > > Ok, regardless of adding proper option for that, I've hardcoded map_bs=1
> > > and it still crashes, just slightly differently:
> > > 
> > >      Xen call trace:
> > >         [<0000000000000080>] 0000000000000080
> > >         [<8c2b0398e0000daa>] 8c2b0398e0000daa
> > > 
> > >      Pagetable walk from ffffffff858483a1:
> > >         L4[0x1ff] = 0000000000000000 ffffffffffffffff
> > > 
> > >      ****************************************
> > >      Panic on CPU 0:
> > >      FATAL PAGE FAULT
> > >      [error_code=0002]
> > >      Faulting linear address: ffffffff858483a1
> > >      ****************************************
> > > 
> > > Full message attached.
> > 
> > After playing more with it and also know workarounds for various EFI
> > issues, I've found a way to boot it: avoid calling Exit BootServices.
> > There was a patch from Konrad adding /noexit option, that never get
> > committed. Similar to efi=mapbs option, I'd add efi=no-exitboot too
> > (once efi=mapbs patch is accepted).
> > 
> > Anyway, I'm curious what exactly is wrong here. Is it that the firmware
> > is not happy about lack of SetVirtualAddressMap call? FWIW, the crash is
> > during GetVariable RS call. I've verified that the function itself is
> > within EfiRuntimeServicesCode, but I don't feel like tracing Lenovo
> > UEFI...
> 
> This suggests that the firmware zaps a few too many pointers
> during ExitBootServices(). Perhaps internally they check
> whether pointers point into BootServices* memory, and hence the
> wrong marking in the memory map has consequences beyond the OS
> re-using such memory?
> 
> A proper answer to your question can of course only be given
> by someone knowing this specific firmware version.

I explored it a bit more and talked with a few people doing firmware
development and few conclusions:
1. Not calling SetVirtualAddressMap(), while technically legal, is
pretty uncommon and not recommended if you want to avoid less tested
(aka buggy) UEFI code paths.
2. Every UEFI call before SetVirtualAddressMap() call should be done
with flat physical memory. This include SetVirtualAddressMap() call
itself. Implicitly this means such calls can legally access memory areas
not marked with EFI_MEMORY_RUNTIME.

For the second point, relevant part of UEFI 2.7 spec (Runtime Services
- Virtual Memory Services chapter):

    This section contains function definitions for the virtual memory support 
that may be
    optionally used by an operating system at runtime. If an operating system 
chooses to
    make EFI runtime service calls in a virtual addressing mode instead of the 
flat physical
    mode, then the operating system must use the services in this section to 
switch the EFI
    runtime services from flat physical addressing to virtual addressing.
(...)
    The call to SetVirtualAddressMap() must be done with the physical mappings. 
On
    successful return from this function, the system must then make any future 
calls with the
    newly assigned virtual mappings. All address space mappings must be done in
    accordance to the cacheability flags as specified in the original address 
map.

I've tried to poke around this part of Xen code, including resurrecting
SetVirtualAddressMap() (#define USE_SET_VIRTUAL_ADDRESS_MAP in
common/efi/boot.c) and (unsurprisingly) hit multiple issues:
 - at this point of time, Xen is already relocated and paging is enabled
 - SetVirtualAddressMap() is indeed not happy about being called with
   new address map in place already
 - directmap - at which that code points - is mapped with NX, which breaks
   EfiRuntimeServicesCode area

Then I've tried a different approach: call SetVirtualAddressMap(), but
with an address map that tries to pretend physical addressing (the code
under #ifndef USE_SET_VIRTUAL_ADDRESS_MAP). This mostly worked, I needed
only few changes:
 - set VirtualStart back to PhysicalStart in that memory map (it was set
   to directmap)
 - map boot services (at least for the SetVirtualAddressMap() call time,
   but haven't tried unmapping it later)
 - call SetVirtualAddressMap() with that "1:1" map in place, using
   efi_rs_enter/efi_rs_leave.

This fixed the issue for me, now runtime services do work even without
disabling ExitBootServices() call. And without any extra
platform-specific command line arguments. And I think it also shouldn't break
kexec, since it uses 1:1-like map, but I haven't tried. One should
simply ignore EFI_UNSUPPORTED return code (I don't know how to avoid the
call at all after kexec).

Any thoughts? If the above sounds good, I'll cleanup the patch and
submit it.
BTW Does it qualify for 4.13? On one hand it may be seen as a bugfix
(fix booting on some UEFI firmwares), but on the other hand, I can't
think of all the side effects.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc
Description: PGP signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.