[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 15 Oct 2020 15:34:12 +0200
  • Authentication-results: esa5.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Thu, 15 Oct 2020 13:34:32 +0000
  • Ironport-sdr: qvHeRrlmREgvUlot+GowN8hNqsH6Ki4z/O5GqfAA3FgsEH4DMHCqJkHJsLuMloBCQhMNARA42o goMyLvR/gVbxjfVlDUUTZXLNj78G4RzChZUfTPMLjncfs7vbOpODJEd43O5v2a6Tt0mNAgy0Rc /LIRkB5luERpY1JKi/w9ZnsIQYffH6DBDpGPIloQnAVxTRYSBIjHp42bdWhmQZpkl2vjwkmN8h Tbje+A8Oid2OnifEVkJXzHKYfJurdpwhKKXqzfDB6i77rAhqjJJxr5lMQsvo9LIrCNajPfkkUu koY=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Oct 07, 2020 at 06:41:17PM +0200, Roger Pau Monné wrote:
> On Wed, Oct 07, 2020 at 01:06:08PM +0100, Andrew Cooper wrote:
> > On 06/10/2020 17:23, Roger Pau Monne wrote:
> > > Currently a PV hardware domain can also be given control over the CPU
> > > frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
> > 
> > This might be how the current logic "works", but its straight up broken.
> > 
> > PERF_CTL is thread scope, so unless dom0 is identity pinned and has one
> > vcpu for every pcpu, it cannot use the interface correctly.
> 
> Selecting cpufreq=dom0-kernel will force vCPU pinning. I'm not able
> however to see anywhere that would force dom0 vCPUs == pCPUs.
> 
> > > However since commit 322ec7c89f6 the default behavior has been changed
> > > to reject accesses to not explicitly handled MSRs, preventing PV
> > > guests that manage CPU frequency from reading
> > > MSR_IA32_PERF_{STATUS/CTL}.
> > >
> > > Additionally some HVM guests (Windows at least) will attempt to read
> > > MSR_IA32_PERF_CTL and will panic if given back a #GP fault:
> > >
> > > vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
> > > d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0
> > >
> > > Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
> > > handling shared between HVM and PV guests, and add an explicit case
> > > for reads to MSR_IA32_PERF_{STATUS/CTL}.
> > 
> > OTOH, PERF_CTL does have a seemingly architectural "please disable turbo
> > for me" bit, which is supposed to be for calibration loops.  I wonder if
> > anyone uses this, and whether we ought to honour it (probably not).
> 
> If we let guests play with this we would have to save/restore the
> guest value on context switch. Unless there's a strong case for this,
> I would say no.
> 
> > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > > index d8ed83f869..41baa3b7a1 100644
> > > --- a/xen/include/xen/sched.h
> > > +++ b/xen/include/xen/sched.h
> > > @@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
> > >      FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
> > >  } cpufreq_controller;
> > >  
> > > +static inline bool is_cpufreq_controller(const struct domain *d)
> > > +{
> > > +    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
> > > +            is_hardware_domain(d));
> > 
> > This won't compile on !CONFIG_X86, due to CONFIG_HAS_CPUFREQ
> 
> It does seem to build on Arm, because this is only used in x86 code:
> 
> https://gitlab.com/xen-project/people/royger/xen/-/jobs/778207412
> 
> The extern declaration of cpufreq_controller is just above, so if you
> tried to use is_cpufreq_controller on Arm you would get a link time
> error, otherwise it builds fine. The compiler removes the function on
> Arm as it has the inline attribute and it's not used.
> 
> Alternatively I could look into moving cpufreq_controller (and
> is_cpufreq_controller) out of sched.h into somewhere else, I haven't
> looked at why it needs to live there.
> 
> > Honestly - I don't see any point to this code.  Its opt-in via the
> > command line only, and doesn't provide adequate checks for enablement. 
> > (It's not as if we're lacking complexity or moving parts when it comes
> > to power/frequency management).
> 
> Right, I could do a pre-patch to remove this, but I also don't think
> we should block this fix on removing FREQCTL_dom0_kernel, so I would
> rather fix the regression and then remove the feature if we agree it
> can be removed.

Can we get some consensus on what to do next?

I think I've provided replies to all the points above, and I'm not sure
what do to next in order to proceed with this patch.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.