[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection



On 01.03.2021 17:23, Roger Pau Monne wrote:
> Introduce an option to allow selecting the legacy behavior for
> accesses to MSRs not explicitly handled. Since commit
> 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly
> handled by Xen result in the injection of a #GP to the guest. This is
> a behavior change since previously a #GP was only injected if
> accessing the MSR on the real hardware will also trigger a #GP.
> 
> This seems to be problematic for some guests, so introduce an option
> to fallback to this legacy behavior. The main difference between what
> was previously done is that the hardware MSR value is not leaked to
> the guests on reads.

Looking at the WRMSR behavior for PV, what you introduce isn't
matching 4.14 behavior: If rdmsr_safe() failed, all that effected
was the issuing of a log message. The behavior you propose is
better, no question, but it shouldn't be described as matching
legacy behavior then.

Somewhat related to this I wonder whether MSR reads and writes
wouldn't better be controllable independently. It seems quite
likely that a kernel may have an issue only with reads.

Additionally I wonder whether it is a good idea to let these
events go silently.

> Note that this option is not made available to dom0. I'm not sure
> whether it makes sense to do so, since anyone updating Xen to such
> newer version will also likely pair it with a newish kernel that
> doesn't require such workarounds.

No, that's not an option imo. It was a Dom0 where I ran into the
problem causing me to submit "x86/PV: conditionally avoid raising
#GP for early guest MSR accesses". While I could upgrade that
system, I have reasons for not doing so. And while I could put a
more modern kernel on there, I'd prefer if the distro kernel
continued to work. (That's leaving aside that for unrelated
reasons building and using my own, newer kernel there is quite a
bit more difficult than on all other of my test systems.)

> RFC because there's still some debate as to how we should solve the
> MSR issue, this is one possible way, but IMO we need to make a
> decision soon-ish because of the release timeline.

Generally I think it would be far better from a user pov if
things worked out of the box, at least in cases where it can be
made work. Arguably for Solaris this isn't going to be possible
(leaving aside the non-option of fully returning back to original
behavior), but for the old-Linux-PV case I still think my proposed
change is better in this regard. I realize Andrew said (on irc)
that this is making the behavior even weaker. I take a different
perspective: Considering a guest will install exception handlers
at some point, I continue to fail to see what good it will do to
try to inject a #GP(0) when we know the guest will die because of
not having a handler registered just yet. It is a kernel flaw,
yes, but they ended up living with it merely because of our odd
prior behavior, so we can't put all the blame on them.

This isn't to say that I'm against propagating exceptions where
there's no alternative to delivering one. Also I'm certainly open
to further tighten the condition of when to zap such an exception
(e.g. only as long as there's no handler _and_ the guest is still
in UP mode, assuming of course this would still address the
observed issue).

> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled.
>  
>  =back
>  
> +=item B<msr_legacy_handling=BOOLEAN>
> +
> +Select whether to use the legacy behaviour for accesses to MSRs not 
> explicitly
> +handled by Xen instead of injecting a #GP to the guest.  Such legacy access
> +mode will force Xen to replicate the behaviour from the hardware it's 
> currently
> +running on in order to decide whether a #GP is injected to the guest.  Note
> +that the guest is never allowed access to unhandled MSRs, this option only
> +changes whether a #GP might be injected or not.

This description is appropriate for reads, but not for writes:
Whether a write would fault can only be known by trying a write,
yet for safety reasons we can't risk doing more than a read. An
option might be to make writes fault is the to be written value
differs from that which the probing read has returned (i.e. the
same condition which originally had caused a log message to appear
in 4.14 for PV guests).

> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -644,6 +644,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
>                                 ("vuart", libxl_vuart_type),
>                                ])),
> +    ("arch_x86", Struct(None, [("msr_legacy_handling", libxl_defbool),
> +                              ])),

Seeing this I'm wondering whether the entire set of arch_*
shouldn't be within a union. But afaics this would have further
implications on code elsewhere, so surely wouldn't want doing
right now.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -852,6 +852,9 @@ int arch_domain_create(struct domain *d,
>  
>      domain_cpu_policy_changed(d);
>  
> +    d->arch.msr_legacy_handling = config->arch.domain_flags &
> +                                  XEN_X86_LEGACY_MSR_HANDLING;

Somewhere you'd also need to refuse processing requests with any
of the other 31 bits set.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.