[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/3] x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID



On 24/08/16 09:41, Jan Beulich wrote:
>>>> On 23.08.16 at 19:26, <andrew.cooper3@xxxxxxxxxx> wrote:
>> Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
>> cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
>> state, rather than before.  (All behaviour in this regard appears completely
>> undocumented by both Intel and AMD).
>>
>> Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
>> clear bit forces the guests view to 0, even if CR4.OSXSAVE is actually set.
>>
>> This allows Xen to provide an architectural view of a guest kernels
>> CR4.OSXSAVE setting to any CPUID instruction issused by guest kernel or
>> userspace.
> Perhaps worth saying "non-PV CPUID instruction issued"?

I have gone with "native CPUID", and a "even when masking is used." suffix.

>
>> The masking value defaults to 1 (if the guest has XSAVE available) to cause
>> fast-forwarding to occur for the HVM and idle vcpus.
> Is the latter part actually true? I don't think idle vCPU-s get through
> update_domain_cpuid_info(), as that gets called solely from a domctl
> handler.

Urgh - this got complicated when I fixed dom0.

I was previously using these_masks != &cpuidmask_defaults, but this test
fails for dom0, as nothing allocates suitable per-domain masks.

I probably do need an extra "&& !is_idle_vcpu(v)" in the condition. 
That will cause Idle and HVM guests to always have the bits set, as they
are set in the default masks.

>
>> Repored-by: Jan Beulich <JBeulich@xxxxxxxx>
> Reported-...

Oops yes.

>
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -211,6 +211,24 @@ static void amd_ctxt_switch_levelling(const struct vcpu 
>> *next)
>>              (nextd && is_pv_domain(nextd) && 
>> nextd->arch.pv_domain.cpuidmasks)
>>              ? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
>>  
>> +    if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
>> +            uint64_t val = masks->_1cd;
>> +
>> +            /*
>> +             * OSXSAVE defaults to 1, which causes fast-forwarding of
>> +             * Xen's real setting.  Clobber it if disabled by the guest
> Tab inside the comment?

Oh yes - I know how that will have happened.  Sorry for not noticing.

>
>> +             * kernel.
>> +             */
>> +            if (next && is_pv_vcpu(next) &&
>> +                !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
>> +                    val &= ~((uint64_t)cpufeat_mask(X86_FEATURE_OSXSAVE) << 
>> 32);
> I don't think idle vCPU-s would ever have their ctrlreg[4] updated, so
> other than the description says you hide the flag here for them.

I said that I wouldn't clobber the bit for idle vcpus.

But yes, they won't have X86_CR4_OSXSAVE set, because of
real_cr4_to_pv_guest_cr4()

>  Since
> that shouldn't matter much except for the number of cases the
> WRMSR below gets executed because the masks didn't match, I'm
> not sure whether it's better to fix it here or to alter the description.
> One might guess that fixing it would be better, as likely going forward
> most (all?) guests will enable xsave, and hence we might save the
> WRMSR in most of the cases if not needed for any other reason.

We want all idle vcpus and hvm vcpus running at the defaults, to reduce
the number of times we lazily switch context.

>
>> @@ -218,6 +235,11 @@ static void __init noinline intel_init_levelling(void)
>>              ecx &= opt_cpuid_mask_ecx;
>>              edx &= opt_cpuid_mask_edx;
>>  
>> +            /* Fast-forward bits - Must be set. */
>> +            if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
>> +                    ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
>> +            edx |= cpufeat_mask(X86_FEATURE_APIC);
>> +
>>              cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
>>      }
> Do we really want to rely on the mask MSRs to have all interesting
> bits (and namely the two ones relevant here) set by the firmware?
> I.e. don't you want to instead OR in the two bits after the &=?

I think so, yes.  If the BIOS got it wrong to start with, Xen would fail
to set the features up in the first place, and this logic wouldn't apply.

If the bits are not set, I don't want to try and second-guess what might
be going on.  If anyone encounters such a situation, then we can see
about re-evaluating.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.