[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.

On 23/02/18 15:05, Jan Beulich wrote:
>>>> On 21.02.18 at 12:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>> There are many problems with MSR_TSC_AUX handling.
>> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
>> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
>> depends on RDTSCP.
> Are you sure this isn't a documentation mistake?

No, but nor do I have reason to doubt what is written.  It seems
plausible and reasonable to me.

RDTSCP has been in hardware for many generations now (and appears to be
architectural in AMD64 although not picked up by Intel initially), while
RDPID is new in Skylake.

The documentation is consistent between Vol 2 (CPUID, RDPID) and Vol 4
(MSRs), none of which make any link between RDPID enumeration and
TSC_AUX, but plenty of links between RDTSCP and TSC_AUX.

>  If it indeed isn't, of
> course my earlier comments regarding the use of cpu_has_rdpid
> would have been wrong (and that patch would be fine without the
> requested adjustment).
>> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
>> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP 
>> for
>> PV guests, which in turn allows RDPID to work.
>> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
>> into the generic MSR policy infrastructure, and becomes common.  One
>> improvement is that we will now reject invalid values, rather than silently
>> truncating and accepting them.  This also causes the emulator to reject 
>> for guests without the features.
>> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
>> MSR_TSC_AUX and the reported value is actually the migration incarnation.  
>> The
>> previous behaviour of this mode was to silently drop writes, but as it is a
>> break in the x86 ABI to start with, switch the semantics to be more sane, and
>> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> All of this obviously wants an ack and/or testing by the Oracle folks
> (assuming this is still in use somewhere).
>> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
>> hvm_domain_context_t *h)
>>      if ( hvm_funcs.tsc_scaling.setup )
>>          hvm_funcs.tsc_scaling.setup(v);
>> -    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
>> +    /* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. 
>> */
>> +    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>> +        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
> Since you talk about range checking in the description, shouldn't
> you reject here values with the upper 32 bits non-zero?

Probably. In reality all migration streams have it within range, because
of the types used on the sending side.

>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
>> uint64_t *val)
>>          break;
>> +    case MSR_TSC_AUX:
>> +        if ( !cp->extd.rdtscp )
>> +            goto gp_fault;
> Isn't this breaking the PV use without the feature flag set, but
> running in TSC_MODE_PVRDTSCP? I.e. don't you want
>         if ( !cp->extd.rdtscp &&
>              d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
> ? Remember there are two cases, one being that the host has the
> feature flag set, but the guest has it clear, and the other being
> that both have it clear. Since in the former case the guest can read
> the MSR through RDTSCP, I think the MSR access ought to be
> allowed too.

There is at least a 3rd case, of no hardware RDTSCP support, which is
why we also emulate it in emul-invl-op.c

A guest trying to use PVRDTSC necessarily needs out-of-band signalling
to set it up.  I do not think it is reasonable or appropriate to retain
the ABI breakage of completing reads of the MSR when the instruction
should be architecturally unavailable.

But as you've said and I agree, we definitely need Oracle's input here.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>>          }
>>          break;
>>      }
>> +
>>      d->arch.incarnation = incarnation + 1;
>> +
>> +    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
>> +    {
>> +        struct vcpu *v;
>> +
>> +        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
>> +        for_each_vcpu ( d, v )
>> +            v->arch.msr->tsc_aux = d->arch.incarnation;
>> +    }
> This not needing a lock might warrant a comment (the domain is
> [explicitly or implicitly] paused when coming here).

This new piece of logic isn't any different to the rest of
tsc_set_info() WRT being paused.  It is explicitly paused at this point.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.