[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 4 Mar 2021 11:05:44 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2fsWI6E6O7DgltfOR5/kTLyqGBQLV+JN0brcn3DTcow=; b=AP4ThztIM9RkEvHIgzfE3RC898SxyY4UQnK2QuegoFKNsf7zh/CPYa6qrRUGxifry5d+u2+omZRBZaZptEqq6lcBsoa79A2K1D378kHXWX/xBKnZ5ogIOuTfhGrVpKhtZtR34f5z1lfj/c9C3d+yZ0LKO5b5zv77joDIHAuBuy8U5E7hG1Kh5zom7jtVaR3jyP8LoSahAuU17lajrnSTR/cJZoYIgncQse1t23FMQkRMGvUBHG7to2ukHPkD2R5ZHlyTsGCBPqj4MUhCXJeutL3Turh1EBDeS5FU2xyPtA8H+519TNlc66fLgt92W14BJ1N1CYm38J0Cpjy7LE7YnA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DKcFrz+2w41voW3zxlX536tGiCTBM2mvjUCeLy8VCRZgA7zxw9G9GZiT/r/OAkx/pbbXSHIPkVGtaqzpr3RY947zAHjAGIP5XsqdQfqVzudmHR0XFFoPsmSpj7hoWTVzkffExXsORmrc6Wosx6ZhKSlXeyZZxmyq9c2B1ArlJikQtPw0I+vqjm+QHTAq/S61Sj9my9CQoZIvW9HxDJris5JNz6niEgHcCX7wqFTL8N6OtjTsmH4NFsCk+Wp9lOd6EexE2mh5H+m+gqVlAm/gQuVduCZD6AMbKlTS+M8Njl9mpHdvnNXAfpo78caQd0b2WcYveoo70GiV18AZ3a3qOw==
  • Authentication-results: esa3.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "Anthony PERARD" <anthony.perard@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Thu, 04 Mar 2021 10:06:30 +0000
  • Ironport-sdr: GGv6lxoxgfneJ55/vwWZpT08loLatW1spWwcI4PxfjWfo021kosvozYae5L4Gxa28IwIcUBs7M iK1PhCfyWgSzV6iuF0QTK7KAC3erF1uZpcPSzVi117khqdwj0XUkYZhOxCFyQBDhe9e6ZEFMx9 P5fJ9YR5ESj8fmKcewoUPkrW7FA8WYjKjHQGfiuykJjATKG0a2fFnmsIYvkQrXFpP3g0vSENdX xIOJoVewr7vJqDw4yYDQSrn+Eh/7beIYG0HSx92B3ZEQn2tkDMywzk4hI8DkQOm6SXBCoTBBvI sPA=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Mar 04, 2021 at 09:48:25AM +0100, Jan Beulich wrote:
> On 03.03.2021 16:38, Roger Pau Monné wrote:
> > On Tue, Mar 02, 2021 at 04:18:59PM +0100, Jan Beulich wrote:
> >> On 02.03.2021 16:00, Roger Pau Monné wrote:
> >>> On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote:
> >>>> On 01.03.2021 17:23, Roger Pau Monne wrote:
> >>>>> RFC because there's still some debate as to how we should solve the
> >>>>> MSR issue, this is one possible way, but IMO we need to make a
> >>>>> decision soon-ish because of the release timeline.
> >>>>
> >>>> Generally I think it would be far better from a user pov if
> >>>> things worked out of the box, at least in cases where it can be
> >>>> made work. Arguably for Solaris this isn't going to be possible
> >>>> (leaving aside the non-option of fully returning back to original
> >>>> behavior), but for the old-Linux-PV case I still think my proposed
> >>>> change is better in this regard. I realize Andrew said (on irc)
> >>>> that this is making the behavior even weaker. I take a different
> >>>> perspective: Considering a guest will install exception handlers
> >>>> at some point, I continue to fail to see what good it will do to
> >>>> try to inject a #GP(0) when we know the guest will die because of
> >>>> not having a handler registered just yet. It is a kernel flaw,
> >>>> yes, but they ended up living with it merely because of our odd
> >>>> prior behavior, so we can't put all the blame on them.
> >>>
> >>> My concern with this would mostly be with newer guests, in the sense
> >>> that people could come to rely on this behavior of not injecting a
> >>> #GP if the handler is not setup, which I think we should try to avoid.
> >>>
> >>> One option would be to introduce a feature that could be used in the
> >>> elfnotes to signal that the kernel doesn't require such workarounds
> >>> for MSR #GP handling, maybe:
> >>>
> >>> /*
> >>>  * Signal that the kernel doesn't require suppressing the #GP on
> >>>  * unknown MSR accesses if the handler is not setup. New kernels
> >>>  * should always have this set.
> >>>  */
> >>> #define XENFEAT_msr_early_gp   16
> >>>
> >>> We could try to backport this on the Linux kernel as far as we can
> >>> that we know it's safe to do so.
> >>
> >> I too did consider something like this. While I'm not opposed, it
> >> effectively requires well-behaved consumers who haven't been well-
> >> behaved in the past.
> >>
> >>> If this is not acceptable then I guess your solution is fine. Arguably
> >>> PV has it's own (weird) architecture, in which it might be safe to say
> >>> that no #GP will be injected as a result of a MSR access unless the
> >>> handler is setup. Ideally we should document this behaviour somewhere.
> >>>
> >>> Maybe we could add a rdmsr_safe to your path akin to what's done
> >>> here?
> >>
> >> Probably. Would need trying out on the affected older version,
> >> just like ...
> >>
> >>>>> --- a/docs/man/xl.cfg.5.pod.in
> >>>>> +++ b/docs/man/xl.cfg.5.pod.in
> >>>>> @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled.
> >>>>>  
> >>>>>  =back
> >>>>>  
> >>>>> +=item B<msr_legacy_handling=BOOLEAN>
> >>>>> +
> >>>>> +Select whether to use the legacy behaviour for accesses to MSRs not 
> >>>>> explicitly
> >>>>> +handled by Xen instead of injecting a #GP to the guest.  Such legacy 
> >>>>> access
> >>>>> +mode will force Xen to replicate the behaviour from the hardware it's 
> >>>>> currently
> >>>>> +running on in order to decide whether a #GP is injected to the guest.  
> >>>>> Note
> >>>>> +that the guest is never allowed access to unhandled MSRs, this option 
> >>>>> only
> >>>>> +changes whether a #GP might be injected or not.
> >>>>
> >>>> This description is appropriate for reads, but not for writes:
> >>>> Whether a write would fault can only be known by trying a write,
> >>>> yet for safety reasons we can't risk doing more than a read. An
> >>>> option might be to make writes fault is the to be written value
> >>>> differs from that which the probing read has returned (i.e. the
> >>>> same condition which originally had caused a log message to appear
> >>>> in 4.14 for PV guests).
> >>>
> >>> Read values for unhandled MSRs will always be 0 with the proposed
> >>> code, so we would inject a #GP to the guest for any write attempt to
> >>> unhandled MSRs with a value different than 0.
> >>>
> >>> Maybe we should just inject a #GP to the guest for any attempts to
> >>> write to unhandled MSRs?
> >>
> >> ... doing this would need to. Iirc I did add the write side of the
> >> handling in my patch just for symmetry. I'd prefer handling to be
> >> symmetric, but I can see why we may not want it to be so.
> > 
> > Kind of in the wrong context, but I will reply here because it's the
> > last message related to the MSR stuff.
> > 
> > Jan: would you be fine with modifying your path to not change the
> > behaviour for writes (ie: always inject #GP to the guest for unhandled
> > accesses), and then add a rdmsr_safe to the read path in order to
> > decide whether to inject a #GP to the guest even if the #GP handler is
> > not setup?
> 
> I don't mind as long as it ends up making things work (I have no
> reason to believe it won't). I'll give that a try.

Thanks. I think this behavior would be a fine solution for PV guests
because:

 - It's unlikely we will see any new ports to x86 PV mode, and hence
   we mostly need to care about the existing ones which we already
   fixed in new versions, so this behavior is not to be propagated.
 - The Xen PV architecture is already different from the x86
   architecture, and hence we can slightly bend the rules to our
   liking.
 - It's not feasible for us to figure out what MSRs older Linux
   versions tried to access without having a #GP handler setup. Not
   sure whether this also affects NetBSD.

I would like to suggest to introduce a comment to document this
behaviour for x86 PV in the public headers, but I'm afraid I cannot
find a suitable location.

> > I can modify the option introduced on this patch to always inject #GP
> > on unhandled writes and only inject #GP on reads if the physical MSR
> > access on the hardware also triggers a #GP. HVM guests with broken
> > behavior will require having the option enabled in order to work,
> > but I think that's likely OK as long term we expect all HVM guests to
> > be well behaved.
> > 
> > My main worry with this approach is that we end up requiring half of
> > the common HVM guests OSes to have the 'legacy MSR handling' option
> > enabled in order to work. I think it's unlikely for this to happen, as
> > we are only aware of Solaris requiring it at the moment.
> > 
> > It also raises the question whether we will allow any more exceptions
> > to the MSR policy, like we did for Windows and OpenBSD in:
> > 
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=ca88a43e660c75796656a544e54a648c60d26ef0
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4175fd3ccd17face664036fa98e9329aa317f7b6
> > 
> > Or if we are just going to require those guests to enable the legacy
> > MSR handling instead.
> 
> It is my understanding that Andrew's view is that adding such special
> cases is the only acceptable thing. As voiced before I don't share
> this view, as it means we're going to be entirely reactive to people
> reporting issues, when I think we should be pro-active as far as is
> reasonable. Independent of any pro-active measures there'll still be
> enough reasons for reactive changes - for example I assume Linux
> would now issue the log message from
> 
>       if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
> 
>               if (c->x86 > 0x10 ||
>                   (c->x86 == 0x10 && c->x86_model >= 0x2)) {
>                       u64 val;
> 
>                       rdmsrl(MSR_K7_HWCR, val);
>                       if (!(val & BIT(24)))
>                               pr_warn(FW_BUG "TSC doesn't count with P0 
> frequency!\n");
>               }
>       }
> 
> since we surface a zero value right now (but I didn't verify this in
> practice yet).

I think we inject a #GP to the guest if it tries to access
MSR_K7_HWCR? As I don't see this MSR handled explicitly in
svm_msr_read_intercept. So Linux would complain from the unchecked MSR
access and the MSR value not having the bit set.

This one seems like a fine candidate to implement in
svm_msr_read_intercept, because Xen needs to return a specific value
for this MSR.

Regarding the global approach to fixing the fallout from the MSR
policy change, I don't see why we couldn't do a mix between pro-active
and reactive.

I think we must have an option to fallback to something similar to the
old policy for HVM guests so that users have a way to get their guests
running after an update without requiring a code change.

That doesn't mean we can't reactively add the missing MSRs as we go
discovering them. I would even print a warning when using such
fallback legacy MSR handling option that you need to report the issue
to xen-devel because the option might be removed in future releases.

Does the above seem like a sensible plan?

Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.