[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] x86: improve MSR_SHADOW_GS accesses



On 06/12/17 16:37, Jan Beulich wrote:
> --- a/xen/include/asm-x86/msr.h
> +++ b/xen/include/asm-x86/msr.h
> @@ -8,6 +8,7 @@
>  #include <xen/types.h>
>  #include <xen/percpu.h>
>  #include <xen/errno.h>
> +#include <asm/alternative.h>
>  #include <asm/asm_defns.h>
>  #include <asm/cpufeature.h>
>  
> @@ -172,6 +173,24 @@ static inline unsigned long rdgsbase(voi
>      return base;
>  }
>  
> +static inline unsigned long rdgsshadow(void)
> +{
> +    unsigned long base;
> +
> +    alternative_io("mov %[msr], %%ecx\n\t"
> +                   "rdmsr\n\t"
> +                   "shl $32, %%rdx\n\t"
> +                   "or %%rdx, %[res]",

There needs to be some clearer distinction between the two
alternatives.  It took a while for me to spot this comma.

> +                   "swapgs\n\t"
> +                   ".byte 0xf3, 0x48, 0x0f, 0xae, 0xc8\n\t" /* rdgsbase rax 
> */
> +                   "swapgs",
> +                   X86_FEATURE_FSGSBASE,
> +                   [res] "=&a" (base),
> +                   [msr] "i" (MSR_SHADOW_GS_BASE) : "rcx", "rdx");

MSR should be a "c" constraint, and the clobber dropped.  It will result
in better code in most of the callsites, by avoiding a reload of ecx,
and merging of the higher constant with the other MSR accesses.

I'm not entirely sure the alternative is justified here.  For
consistency alone, these helpers should match their companions, and in
the unlikely case that the runtime feature test does make a measurable
difference, wouldn't a static key be a better option anyway?

> +
> +    return base;
> +}
> +
>  static inline void wrfsbase(unsigned long base)
>  {
>      if ( cpu_has_fsgsbase )
> @@ -196,6 +215,19 @@ static inline void wrgsbase(unsigned lon
>          wrmsrl(MSR_GS_BASE, base);
>  }
>  
> +static inline void wrgsshadow(unsigned long base)
> +{
> +    alternative_input("mov %[msr], %%ecx\n\t"
> +                      "shld $32, %[val], %%rdx\n\t"

This is a vector path instruction and specifically called out to be
avoided in the AMD optimisation guide.  On all hardware (according to
Agner's latency mesurements) it alone has a longer latency to execute
that the mov/shl pair you've replaced, and that is before accounting for
mov elimination.

~Andrew

> +                      "wrmsr",
> +                      "swapgs\n\t"
> +                      ".byte 0xf3, 0x48, 0x0f, 0xae, 0xd8\n\t" /* wrgsbase 
> rax */
> +                      "swapgs",
> +                      X86_FEATURE_FSGSBASE,
> +                      [val] "a" (base),
> +                      [msr] "i" (MSR_SHADOW_GS_BASE) : "rcx", "rdx");
> +}
> +
>  DECLARE_PER_CPU(u64, efer);
>  u64 read_efer(void);
>  void write_efer(u64 val);
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.