Xen project Mailing List

Re: [PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 2 Mar 2021 16:00:21 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=f6GGem3553ZQJ2A8xBoubF1bosk6d1m7VHnE/mfJrcg=; b=KLUAE7xrEk8TKjWtIrhQFcm4f5PCa3O354aYw7eAkAJ3enyPGOCOSN5/wH02vKN5YKD6IB7VUeHLENIbaLYC05Cqbvvv5EPvGSCBx4wkvs+6tQBs0xWNRILX7xKuNMh9miW4VUc4rpyjLkNnq3DuWmoa+hHnPL0iGZHU2LjkfDJXNFhonpEB3p3/UeTDRaMX5ujB8QVMHlCKetC49yLhoPMdGvOjty1X6uaEN6vLJy6Y1LMPyTgiNHdH2AZJ/RmN9U14RU2YGkE2iTMx/LSzvbrOeyGn8VcqcP6MxYO3CuWG+Rt1bsxqO0Qy1Xc2nI6TTBGVEjQh0TyRtipW+/3/ng==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VapDLthl+lDC0N7JsGjrjS8kDP7W4qFR3Jg1ZNCg5Da8GCGYSt6dvxCv47CmWniYaoj7TsY4pJkWneqczkwMTEHhusXjOjhkpsb0dQzj6XwwdMXc5zYdo3/uPAuk+NA3EuL5I5UiPc9MaVuwCghf6AzyMhWGRkOkSrop2gV2ubZ50WTt27ZBPlo2iGTXkhtBbIvKH/SIIKGB3ndZxxcot3KlpCmFyBuf2Ldg547omP31qWON7VC8WbDNHC6yQQvJjkLCIsryz5FoSCG7K4bPb0M+yltBbbYV7/tx1DLLkBTLRCaouCUsR9fe/dYiiHP1ll6G7w1lT/rW3SAyY1y38A==

Authentication-results: esa1.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 02 Mar 2021 15:02:08 +0000

Ironport-sdr: 9PEQjBZwdAHsWsKM1Uk7PhPKBs0L3HLvLbelO+EDykXZ2z+yLBeE1AxVongbTGAa7WXFPFmEJP DWbfgCMohd/W4RacT34cCis0rch97U+mm8xM0B6MtBVAoprLRO4mgYeqWXkNxs9IWEKUi7EF1G BVdfNlNqM5EDQByWDR6x8jOjNMCgRzRW3HAjvDq5Ll7t/nIUICBpxA2ETwm4Z4Jfwp1vz+Xy16 sjMp/1UwS7A8k0Fyo5TqnYB5fTWkCNvgFlvbGWVqcFZiyIwQCa05ey8aa76fHlFMI1i/f9C83Z 0Zk=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote: > On 01.03.2021 17:23, Roger Pau Monne wrote: > > Introduce an option to allow selecting the legacy behavior for > > accesses to MSRs not explicitly handled. Since commit > > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > > handled by Xen result in the injection of a #GP to the guest. This is > > a behavior change since previously a #GP was only injected if > > accessing the MSR on the real hardware will also trigger a #GP. > > > > This seems to be problematic for some guests, so introduce an option > > to fallback to this legacy behavior. The main difference between what > > was previously done is that the hardware MSR value is not leaked to > > the guests on reads. > > Looking at the WRMSR behavior for PV, what you introduce isn't > matching 4.14 behavior: If rdmsr_safe() failed, all that effected > was the issuing of a log message. The behavior you propose is > better, no question, but it shouldn't be described as matching > legacy behavior then. > > Somewhat related to this I wonder whether MSR reads and writes > wouldn't better be controllable independently. It seems quite > likely that a kernel may have an issue only with reads. > > Additionally I wonder whether it is a good idea to let these > events go silently. > > > Note that this option is not made available to dom0. I'm not sure > > whether it makes sense to do so, since anyone updating Xen to such > > newer version will also likely pair it with a newish kernel that > > doesn't require such workarounds. > > No, that's not an option imo. It was a Dom0 where I ran into the > problem causing me to submit "x86/PV: conditionally avoid raising > #GP for early guest MSR accesses". While I could upgrade that > system, I have reasons for not doing so. And while I could put a > more modern kernel on there, I'd prefer if the distro kernel > continued to work. (That's leaving aside that for unrelated > reasons building and using my own, newer kernel there is quite a > bit more difficult than on all other of my test systems.) > > > RFC because there's still some debate as to how we should solve the > > MSR issue, this is one possible way, but IMO we need to make a > > decision soon-ish because of the release timeline. > > Generally I think it would be far better from a user pov if > things worked out of the box, at least in cases where it can be > made work. Arguably for Solaris this isn't going to be possible > (leaving aside the non-option of fully returning back to original > behavior), but for the old-Linux-PV case I still think my proposed > change is better in this regard. I realize Andrew said (on irc) > that this is making the behavior even weaker. I take a different > perspective: Considering a guest will install exception handlers > at some point, I continue to fail to see what good it will do to > try to inject a #GP(0) when we know the guest will die because of > not having a handler registered just yet. It is a kernel flaw, > yes, but they ended up living with it merely because of our odd > prior behavior, so we can't put all the blame on them. My concern with this would mostly be with newer guests, in the sense that people could come to rely on this behavior of not injecting a #GP if the handler is not setup, which I think we should try to avoid. One option would be to introduce a feature that could be used in the elfnotes to signal that the kernel doesn't require such workarounds for MSR #GP handling, maybe: /* * Signal that the kernel doesn't require suppressing the #GP on * unknown MSR accesses if the handler is not setup. New kernels * should always have this set. */ #define XENFEAT_msr_early_gp 16 We could try to backport this on the Linux kernel as far as we can that we know it's safe to do so. If this is not acceptable then I guess your solution is fine. Arguably PV has it's own (weird) architecture, in which it might be safe to say that no #GP will be injected as a result of a MSR access unless the handler is setup. Ideally we should document this behaviour somewhere. Maybe we could add a rdmsr_safe to your path akin to what's done here? > > This isn't to say that I'm against propagating exceptions where > there's no alternative to delivering one. Also I'm certainly open > to further tighten the condition of when to zap such an exception > (e.g. only as long as there's no handler _and_ the guest is still > in UP mode, assuming of course this would still address the > observed issue). > > > --- a/docs/man/xl.cfg.5.pod.in > > +++ b/docs/man/xl.cfg.5.pod.in > > @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. > > > > =back > > > > +=item B<msr_legacy_handling=BOOLEAN> > > + > > +Select whether to use the legacy behaviour for accesses to MSRs not > > explicitly > > +handled by Xen instead of injecting a #GP to the guest. Such legacy access > > +mode will force Xen to replicate the behaviour from the hardware it's > > currently > > +running on in order to decide whether a #GP is injected to the guest. Note > > +that the guest is never allowed access to unhandled MSRs, this option only > > +changes whether a #GP might be injected or not. > > This description is appropriate for reads, but not for writes: > Whether a write would fault can only be known by trying a write, > yet for safety reasons we can't risk doing more than a read. An > option might be to make writes fault is the to be written value > differs from that which the probing read has returned (i.e. the > same condition which originally had caused a log message to appear > in 4.14 for PV guests). Read values for unhandled MSRs will always be 0 with the proposed code, so we would inject a #GP to the guest for any write attempt to unhandled MSRs with a value different than 0. Maybe we should just inject a #GP to the guest for any attempts to write to unhandled MSRs? > > --- a/tools/libs/light/libxl_types.idl > > +++ b/tools/libs/light/libxl_types.idl > > @@ -644,6 +644,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ > > ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), > > ("vuart", libxl_vuart_type), > > ])), > > + ("arch_x86", Struct(None, [("msr_legacy_handling", libxl_defbool), > > + ])), > > Seeing this I'm wondering whether the entire set of arch_* > shouldn't be within a union. But afaics this would have further > implications on code elsewhere, so surely wouldn't want doing > right now. Right, I thought the same but I'm not sure we can change this now, as it's part of the API. Adding new fields is fine, but adding a union field with arch_arm would change the structure as we would need to add a 'u' (or equivalent) field here. > > --- a/xen/arch/x86/domain.c > > +++ b/xen/arch/x86/domain.c > > @@ -852,6 +852,9 @@ int arch_domain_create(struct domain *d, > > > > domain_cpu_policy_changed(d); > > > > + d->arch.msr_legacy_handling = config->arch.domain_flags & > > + XEN_X86_LEGACY_MSR_HANDLING; > > Somewhere you'd also need to refuse processing requests with any > of the other 31 bits set. Yes, I need to sanitize the flags. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.