[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 10/12] x86/HVM: scale MPERF values reported to guests (on AMD)



On 05/05/2020 09:18, Jan Beulich wrote:
> AMD's PM specifies that MPERF (and its r/o counterpart) reads are
> affected by the TSC ratio. Hence when processing such reads in software
> we too should scale the values. While we don't currently (yet) expose
> the underlying feature flags, besides us allowing the MSRs to be read
> nevertheless, RDPRU is going to expose the values even to user space.
>
> Furthermore, due to the not exposed feature flags, this change has the
> effect of making properly inaccessible (for reads) the two MSRs.
>
> Note that writes to MPERF (and APERF) continue to be unsupported.

Linux is now using MPERF/APERF for its frequency-invariant scheduling
logic.  Irritatingly, via its read/write alias rather than its read-only
alias.  Even more irritatingly, Intel's reference algorithm recommends
writing to both, despite this being being far less efficient than (one
of) AMD's (two) algorithm(s) which tells you just to subtract the values
you last sampled.

On the one hand, I'm tempted to suggest that we offer EFRO on Intel and
update Linux to use it.  OTOH, that would VMExit as Intel CPUs don't
understand the EFRO interface.

I can't see any sane way to virtualise the write behaviour for MPERF/APERF.

>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> v3: New.
> ---
> I did consider whether to put the code in guest_rdmsr() instead, but
> decided that it's better to have it next to TSC handling.

Please do put it in guest_rdmsr().  This is code hygene just as much as
bool_t or style fixes are.

The relationship to TSC is passing-at-best.

>
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3478,6 +3478,22 @@ int hvm_msr_read_intercept(unsigned int
>          *msr_content = v->arch.hvm.msr_tsc_adjust;
>          break;
>  
> +    case MSR_MPERF_RD_ONLY:
> +        if ( !d->arch.cpuid->extd.efro )
> +        {
> +            goto gp_fault;
> +
> +    case MSR_IA32_MPERF:
> +            if ( !(d->arch.cpuid->basic.raw[6].c &
> +                   CPUID6_ECX_APERFMPERF_CAPABILITY) )
> +                goto gp_fault;
> +        }
> +        if ( rdmsr_safe(msr, *msr_content) )
> +            goto gp_fault;
> +        if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) 
> )

I suspect we want to gain amd_like() outside of the emulator.

> +            *msr_content = hvm_get_guest_tsc_fixed(v, *msr_content);
> +        break;
> +
>      case MSR_APIC_BASE:
>          *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
>          break;
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -405,6 +405,9 @@
>  #define MSR_IA32_MPERF                       0x000000e7
>  #define MSR_IA32_APERF                       0x000000e8
>  
> +#define MSR_MPERF_RD_ONLY            0xc00000e7
> +#define MSR_APERF_RD_ONLY            0xc00000e8

S/RD_ONLY/RO/ ?  No loss of meaning.  Also, above the legacy line please.

~Andrew

> +
>  #define MSR_IA32_THERM_CONTROL               0x0000019a
>  #define MSR_IA32_THERM_INTERRUPT     0x0000019b
>  #define MSR_IA32_THERM_STATUS                0x0000019c
>




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.