 
	
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.17 v4 2/2] amd: remove VIRT_SC_MSR_HVM synthetic feature
 On Tue, Nov 15, 2022 at 04:21:07PM +0000, Andrew Cooper wrote:
> On 15/11/2022 13:26, Roger Pau Monne wrote:
> > Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched
> > on vm{entry,exit} there's no need to use a synthetic feature bit for
> > it anymore.
> >
> > Remove the bit and instead use a global variable.
> >
> > No functional change intended.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> > Release-acked-by: Henry Wang <Henry.Wang@xxxxxxx>
> 
> This is definitely not appropriate for 4.17, but it's a performance
> regression in general, hence my firm and repeated objection to this
> style of patch.
While I don't have any objections in deferring this past 4.17, none of
the modified paths are performance sensitive AFAICT.
> General synthetic bits have existed for several decades longer than
> alternatives.  It has never ever been a rule, or even a recommendation,
> to aggressively purge the non-alternative bits, because it's a provably
> bad thing to do.
> 
> 
> You are attempting a micro-optimisation, that won't produce any
> improvement at all in centuries, while...
Oh, I wasn't attempting any micro-optimizations TBH, just didn't see
the need to keep this as a synthetic feature, and generally consider
better to use a global variable because it's IMO easier to follow.
> > diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> > index a332087604..9e3b9094d3 100644
> > --- a/xen/arch/x86/cpu/amd.c
> > +++ b/xen/arch/x86/cpu/amd.c
> > @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
> >  /* Signal whether the ACPI C1E quirk is required. */
> >  bool __read_mostly amd_acpi_c1e_quirk;
> >  bool __ro_after_init amd_legacy_ssbd;
> > +bool __ro_after_init amd_virt_spec_ctrl;
> 
> ... actually expending .rodata with something 8 times less efficiently
> packed, and ...
> 
> >  
> >  static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
> >                              unsigned int *hi)
> > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> > index 822f9ace10..acc2f606ce 100644
> > --- a/xen/arch/x86/cpuid.c
> > +++ b/xen/arch/x86/cpuid.c
> > @@ -3,6 +3,7 @@
> >  #include <xen/param.h>
> >  #include <xen/sched.h>
> >  #include <xen/nospec.h>
> > +#include <asm/amd.h>
> 
> ... (Specific to this instance) making life harder for the people trying
> to make CONFIG_AMD work, and ...
That's indeed a point, albeit I think adding a `#define
amd_virt_spec_ctrl false` won't be the bigger of the problems when
dealing with CONFIG_AMD, and will need to be done for other AMD
specific variables anyway.
> > diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
> > index 4e53056624..0b94af6b86 100644
> > --- a/xen/arch/x86/spec_ctrl.c
> > +++ b/xen/arch/x86/spec_ctrl.c
> > @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk 
> > thunk, uint64_t caps)
> >             (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
> >              boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
> >              boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
> > -            boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) ||
> > +            amd_virt_spec_ctrl ||
> 
> ... breaking apart a single TEST instruction, which not only adds an
> extra conditional merge, but now hits an cold-ish cache line everywhere
> it's used.
Why does performance matter here?  It's an init function that prints
the speculation related settings to the screen, so that's likely to be
many times slower that accessing a cold cache line.
> Count how many synthetic feature bits it will actually take to change
> the per-cpu data size, and realise that, when it will take more than 200
> years at the current rate of accumulation, any believe that this is an
> improvement to be had disappears.
> 
> Yes, it is only a micro regression, but you need a far better
> justification than "there is a gain of 64 bytes per CPU which will be
> non-theoretical in more than 200 years" when traded off vs the actual
> 512 bytes, plus extra code bloat bloat, plus reduced locality of data
> that this "improvement" genuinely costs today.
I wasn't considering any of the above when proposing the change, my
only motivation was that global variables are clearer to use than
synthetic features, and I didn't see a need for a synthetic feature in
this case.  If we agree the above possible performance regressions
are worth it I'm fine keeping it as-is.
Now that I realize it amd_virt_spec_ctrl could even be plain __init.
Thanks, Roger.
 
 | 
|  | Lists.xenproject.org is hosted with RackSpace, monitoring our |