
Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs



On 25.08.2020 04:30, Roman Shaposhnik wrote:
> On Fri, Aug 21, 2020 at 1:23 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 21.08.2020 09:38, Roman Shaposhnik wrote:
>>> I think we're talking slightly past each other here -- you seem to be
>>> more after trying to figure out how to make this box look like a dozen
>>> kilobucks' worth of server, I'm after trying to figure out what callsites
>>> in Xen tickle that region.
>>
>> What I'm trying is to understand what exactly is wrong in the firmware,
>> as that'll likely allow determining a minimal workaround.
> 
> Fair enough. So let me start with a major update. After a bit of trial and
> error it became apparent that a combination of efi=attr=uc AND
> removing the call to efi_get_time as per:
>     https://lists.archive.carbon60.com/xen/devel/408709
> allows Xen to boot just fine and function properly on that device.

Interesting. I'd be curious what happens if just one of the
two is used (trying each of them on its own).

>>> I appreciate and respect your position, but please hear mine as well:
>>> yes we're clearly into the "workaround" territory here, but clearly
>>> Linux is fully capable of these workarounds and I would like to understand
>>> how expensive it will be to teach Xen those tricks as well.
>>
>> My prime example here is their blanket avoiding of the time related
>> runtime services, despite the EFI spec saying the exact opposite.
> 
> Well, to be fair, it seems that the practical experience with various
> bits of hardware suggests that in this particular case avoidance
> may be the lesser of all the evils.
> 
> Or to ask a complementary question: what's the danger of making that
> patch (in a cleaned up form) the default behaviour? Will there be any
> instances of hardware where it may actually hurt?

ACPI tables have a flag indicating that there's no CMOS clock in
the system. When it is set, i.e. without a CMOS clock, and without
use of GetTime(), there's no wall clock source. Andrew keeps
suggesting that we shouldn't
have a need to use the wall clock, but I'm afraid I haven't been
able to understand what the alternative (and still backwards
compatible) behavior would be, and hence I've been hoping for him
to put together patches carrying out the plan.

Of course in turn there are systems having the flag set despite
there being a CMOS clock, telling people that there are reasons
to ignore the flag (and still avoid GetTime()).
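
The dependency between the ACPI flag and GetTime() can be sketched as
follows. This is an illustration only, not Xen's actual code: the
function and parameter names are hypothetical, and only the flag bit
(FADT IA-PC Boot Architecture Flags, bit 5, "CMOS RTC Not Present")
is taken from the ACPI spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACPI FADT IA-PC Boot Architecture Flags, bit 5: "CMOS RTC Not Present". */
#define FADT_NO_CMOS_RTC (1u << 5)

enum wallclock_source {
    WALLCLOCK_EFI_GETTIME,  /* EFI runtime service GetTime() */
    WALLCLOCK_CMOS_RTC,     /* legacy CMOS real time clock */
    WALLCLOCK_NONE,         /* nothing left to read wall clock time from */
};

/*
 * Hypothetical helper: all parameter names are assumptions made for
 * this sketch. probe_cmos_anyway models ignoring the ACPI flag on
 * systems that set it despite having a CMOS clock.
 */
static enum wallclock_source pick_wallclock(bool efi_rs_usable,
                                            bool avoid_gettime,
                                            uint16_t fadt_boot_flags,
                                            bool probe_cmos_anyway)
{
    /* Spec conformant path: use GetTime() when running on EFI. */
    if (efi_rs_usable && !avoid_gettime)
        return WALLCLOCK_EFI_GETTIME;

    /* Fall back to the CMOS clock unless firmware says there is none. */
    if (!(fadt_boot_flags & FADT_NO_CMOS_RTC) || probe_cmos_anyway)
        return WALLCLOCK_CMOS_RTC;

    return WALLCLOCK_NONE;
}
```

With GetTime() avoided by default, a system that truthfully sets the
flag ends up in the WALLCLOCK_NONE case.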

Apart from this case (which could be taken care of) there's the
collision with my underlying position here that I've described
before: Xen should be spec conformant on spec conformant
systems. This implies using GetTime() when running on EFI.

>> "efi=no-rs" is just a wider scope workaround of this same kind.
> 
> The problem with "efi=no-rs" is that it is actually unbounded.
> 
> IOW, compare two cases:
>    1. disable a single call to GetTime()
>    2. disable all calls to EFI RS?
> Case #1 I can reason about -- case #2 -- not so much (unless somebody
> explains to me the full scope of what gets disabled when efi=no-rs).
> 
> Now, you may say (and seems like you do ;-)) that if a small part of
> the implementation can't be trusted -- the entire thing shouldn't be
> trusted -- I don't think I will buy into that policy -- but it is a policy.

The common case is that parts of memory accessed by runtime services
aren't marked for runtime use in the memory map. Once there is _one_
such problem that the developers of the firmware allowed to slip in,
how can you trust there being exactly one such problem, or how can
you be certain of which of the runtime service functions are affected?
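
As an illustration of that failure mode, the check below asks whether
an address touched by a runtime service lies in a memory map entry
carrying the EFI_MEMORY_RUNTIME attribute. It is a sketch under stated
assumptions: struct mem_desc is a simplified stand-in for
EFI_MEMORY_DESCRIPTOR, and the example map is made up; only the
attribute bit, the 4 KiB EFI page size and the memory type numbers
come from the UEFI spec.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL  /* attribute bit 63 */
#define EFI_PAGE_SHIFT     12                     /* EFI pages are 4 KiB */

/* Simplified, hypothetical stand-in for EFI_MEMORY_DESCRIPTOR. */
struct mem_desc {
    uint32_t type;        /* e.g. 4 = BootServicesData, 6 = RuntimeServicesData */
    uint64_t phys_start;
    uint64_t num_pages;
    uint64_t attribute;
};

/* Made-up two entry map, for illustration only. */
static const struct mem_desc example_map[] = {
    { 6, 0x1000, 2, EFI_MEMORY_RUNTIME },  /* 0x1000-0x2fff, runtime */
    { 4, 0x3000, 1, 0 },                   /* 0x3000-0x3fff, boot time only */
};

/* True if addr is covered by an entry marked for runtime use. */
static bool addr_is_runtime(const struct mem_desc *map, size_t n,
                            uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].phys_start +
                       (map[i].num_pages << EFI_PAGE_SHIFT);

        if (addr >= map[i].phys_start && addr < end)
            return (map[i].attribute & EFI_MEMORY_RUNTIME) != 0;
    }
    return false;  /* not described by the map at all */
}
```

A runtime service touching 0x3800 in this example map would be the
kind of firmware bug described above: the region exists, but was never
marked for runtime use.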

As to your question regarding the full scope - I'm pretty unclear
what you mean by this: No use of runtime services means exactly that.
Just go look at the EFI_RUNTIME_SERVICES struct (plus of course
whatever is wired up in the first place in Xen). If you're after
end user visible effects, I'm afraid this is the wrong forum to ask,
as those will depend on what is wired up (and hence expected to be
usable) in higher software layers.
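
For reference, the function members of EFI_RUNTIME_SERVICES as listed
in the UEFI spec are collected below. This only enumerates the table;
it makes no claim about which of these Xen actually wires up, and the
array name is made up for this sketch.

```c
#include <stddef.h>

/*
 * Function members of EFI_RUNTIME_SERVICES, in table order, per the
 * UEFI specification. The array name is hypothetical.
 */
static const char *const efi_rs_names[] = {
    /* Time services */
    "GetTime", "SetTime", "GetWakeupTime", "SetWakeupTime",
    /* Virtual memory services */
    "SetVirtualAddressMap", "ConvertPointer",
    /* Variable services */
    "GetVariable", "GetNextVariableName", "SetVariable",
    /* Miscellaneous services */
    "GetNextHighMonotonicCount", "ResetSystem",
    /* Capsule services (UEFI 2.0 and later) */
    "UpdateCapsule", "QueryCapsuleCapabilities",
    /* Miscellaneous (UEFI 2.0 and later) */
    "QueryVariableInfo",
};

#define EFI_RS_COUNT (sizeof(efi_rs_names) / sizeof(efi_rs_names[0]))
```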

>> The reasoning I see behind this is that if the time related runtime
>> services are problematic, how likely is it that others are fine to
>> use? And how would an admin know without first having run into some
>> crash? If there are fair reasons to have finer grained disabling of
>> runtime services - why not? But it'll still take a command line
>> option to do so, unless (as was proposed) a build-time option of
>> enabling all (common?) workarounds by default gets made use of.
> 
> Well, policy (and trust issues) aside -- I think the real question
> is -- it seems that there's quite a bit of downstream that agrees
> that avoiding GetTime() is a good idea. What options do we have
> to make that possible without each downstream carrying a custom
> patch (which I'm adding to EVE as we speak)?

If there is sufficient evidence that a large share of systems
has just the time interfaces broken, we can make a command
line option to suppress just their use. The argument I've been
hearing though behind avoiding these runtime service functions is
that they're broken mainly because Windows doesn't use them, and
hence them being broken doesn't get noticed when certifying these
systems. With this, we'd instead need to disable all runtime
services Windows doesn't use, which as an input requires us to know
which ones these are. As a result the possible command line option
wouldn't be "no-time" but "like-windows", which makes me shudder.

Of course the suggested Kconfig option to "enable common
workarounds" could then enable just this option rather than "no-rs",
if deemed more useful this way.
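
The difference in scope between "no-rs" and a finer grained option can
be sketched as a mask of permitted runtime service groups. Everything
here is hypothetical: "no-time" is only the option name floated above
and does not exist, and the grouping and the function name are
assumptions made for this sketch.

```c
#include <string.h>

/* Hypothetical grouping of runtime services. */
#define RS_TIME     (1u << 0)  /* GetTime, SetTime, Get/SetWakeupTime */
#define RS_VARIABLE (1u << 1)  /* GetVariable, SetVariable, ... */
#define RS_RESET    (1u << 2)  /* ResetSystem */
#define RS_CAPSULE  (1u << 3)  /* UpdateCapsule, QueryCapsuleCapabilities */
#define RS_ALL      (RS_TIME | RS_VARIABLE | RS_RESET | RS_CAPSULE)

/*
 * Apply one "efi=" sub-option to the mask of permitted groups.
 * "no-rs" exists today; "no-time" is hypothetical.
 */
static unsigned int apply_efi_opt(unsigned int allowed, const char *opt)
{
    if (strcmp(opt, "no-rs") == 0)
        return 0;                   /* no runtime services at all */
    if (strcmp(opt, "no-time") == 0)
        return allowed & ~RS_TIME;  /* suppress just the time interfaces */
    return allowed;                 /* unrelated option, mask unchanged */
}
```

A "like-windows" variant would need a further mask whose contents
depend on knowing which services Windows uses, which is the input
noted above as missing.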

Jan



 

