Xen project Mailing List

Re: [PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection

To: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 2 Mar 2021 12:16:12 +0100

Cc: Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 02 Mar 2021 11:16:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01.03.2021 17:23, Roger Pau Monne wrote: > Introduce an option to allow selecting the legacy behavior for > accesses to MSRs not explicitly handled. Since commit > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > handled by Xen result in the injection of a #GP to the guest. This is > a behavior change since previously a #GP was only injected if > accessing the MSR on the real hardware will also trigger a #GP. > > This seems to be problematic for some guests, so introduce an option > to fallback to this legacy behavior. The main difference between what > was previously done is that the hardware MSR value is not leaked to > the guests on reads. Looking at the WRMSR behavior for PV, what you introduce isn't matching 4.14 behavior: If rdmsr_safe() failed, all that effected was the issuing of a log message. The behavior you propose is better, no question, but it shouldn't be described as matching legacy behavior then. Somewhat related to this I wonder whether MSR reads and writes wouldn't better be controllable independently. It seems quite likely that a kernel may have an issue only with reads. Additionally I wonder whether it is a good idea to let these events go silently. > Note that this option is not made available to dom0. I'm not sure > whether it makes sense to do so, since anyone updating Xen to such > newer version will also likely pair it with a newish kernel that > doesn't require such workarounds. No, that's not an option imo. It was a Dom0 where I ran into the problem causing me to submit "x86/PV: conditionally avoid raising #GP for early guest MSR accesses". While I could upgrade that system, I have reasons for not doing so. And while I could put a more modern kernel on there, I'd prefer if the distro kernel continued to work. (That's leaving aside that for unrelated reasons building and using my own, newer kernel there is quite a bit more difficult than on all other of my test systems.) > RFC because there's still some debate as to how we should solve the > MSR issue, this is one possible way, but IMO we need to make a > decision soon-ish because of the release timeline. Generally I think it would be far better from a user pov if things worked out of the box, at least in cases where it can be made work. Arguably for Solaris this isn't going to be possible (leaving aside the non-option of fully returning back to original behavior), but for the old-Linux-PV case I still think my proposed change is better in this regard. I realize Andrew said (on irc) that this is making the behavior even weaker. I take a different perspective: Considering a guest will install exception handlers at some point, I continue to fail to see what good it will do to try to inject a #GP(0) when we know the guest will die because of not having a handler registered just yet. It is a kernel flaw, yes, but they ended up living with it merely because of our odd prior behavior, so we can't put all the blame on them. This isn't to say that I'm against propagating exceptions where there's no alternative to delivering one. Also I'm certainly open to further tighten the condition of when to zap such an exception (e.g. only as long as there's no handler _and_ the guest is still in UP mode, assuming of course this would still address the observed issue). > --- a/docs/man/xl.cfg.5.pod.in > +++ b/docs/man/xl.cfg.5.pod.in > @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. > > =back > > +=item B<msr_legacy_handling=BOOLEAN> > + > +Select whether to use the legacy behaviour for accesses to MSRs not > explicitly > +handled by Xen instead of injecting a #GP to the guest. Such legacy access > +mode will force Xen to replicate the behaviour from the hardware it's > currently > +running on in order to decide whether a #GP is injected to the guest. Note > +that the guest is never allowed access to unhandled MSRs, this option only > +changes whether a #GP might be injected or not. This description is appropriate for reads, but not for writes: Whether a write would fault can only be known by trying a write, yet for safety reasons we can't risk doing more than a read. An option might be to make writes fault is the to be written value differs from that which the probing read has returned (i.e. the same condition which originally had caused a log message to appear in 4.14 for PV guests). > --- a/tools/libs/light/libxl_types.idl > +++ b/tools/libs/light/libxl_types.idl > @@ -644,6 +644,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), > ("vuart", libxl_vuart_type), > ])), > + ("arch_x86", Struct(None, [("msr_legacy_handling", libxl_defbool), > + ])), Seeing this I'm wondering whether the entire set of arch_* shouldn't be within a union. But afaics this would have further implications on code elsewhere, so surely wouldn't want doing right now. > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -852,6 +852,9 @@ int arch_domain_create(struct domain *d, > > domain_cpu_policy_changed(d); > > + d->arch.msr_legacy_handling = config->arch.domain_flags & > + XEN_X86_LEGACY_MSR_HANDLING; Somewhere you'd also need to refuse processing requests with any of the other 31 bits set. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.