
Re: [Xen-devel] [PATCH v4 26/26] tools/libxc: Calculate xstate cpuid leaf from guest information



On 31/03/16 08:48, Jan Beulich wrote:
>>>> On 23.03.16 at 17:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -398,54 +398,97 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
>>      }
>>  }
>>  
>> +/* XSTATE bits in XCR0. */
>> +#define X86_XCR0_X87    (1ULL <<  0)
>> +#define X86_XCR0_SSE    (1ULL <<  1)
>> +#define X86_XCR0_AVX    (1ULL <<  2)
>> +#define X86_XCR0_BNDREG (1ULL <<  3)
>> +#define X86_XCR0_BNDCSR (1ULL <<  4)
>> +#define X86_XCR0_LWP    (1ULL << 62)
> Why an incomplete set? At least PKRU should be needed right
> away. And I see no reason why the three AVX-512 pieces can't
> be put here right away too.

PKRU is another victim of this series being rebased over the
introduction of new functionality.  I will re-add it.

AVX-512 would require adding the AVX feature flags, and deciphering the
dependency tree for all of them.  I have no ability to test any such
additions (no available hardware), and don't want to introduce
possibly-buggy code ahead of full support being added.
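
For reference, a sketch of what a fuller set of XCR0 definitions could look
like. The bit positions follow the vendor manuals; the names of the added
entries (OPMASK, ZMM, HI_ZMM, PKRU) are assumptions modelled on the naming in
the patch, and the consistency helper is purely illustrative rather than
anything in the tree.

```c
#include <stdint.h>

/* XSTATE bits in XCR0, as in the patch. */
#define X86_XCR0_X87     (1ULL <<  0)
#define X86_XCR0_SSE     (1ULL <<  1)
#define X86_XCR0_AVX     (1ULL <<  2)
#define X86_XCR0_BNDREG  (1ULL <<  3)
#define X86_XCR0_BNDCSR  (1ULL <<  4)
#define X86_XCR0_LWP     (1ULL << 62)

/* Hypothetical names for the states discussed in the review. */
#define X86_XCR0_OPMASK  (1ULL <<  5)  /* AVX-512 opmask registers k0-k7 */
#define X86_XCR0_ZMM     (1ULL <<  6)  /* upper halves of ZMM0-ZMM15 */
#define X86_XCR0_HI_ZMM  (1ULL <<  7)  /* ZMM16-ZMM31 */
#define X86_XCR0_PKRU    (1ULL <<  9)  /* protection key rights register */

/*
 * Illustrative check: the three AVX-512 components are enabled as a group,
 * and only together with SSE and AVX state.
 */
static inline int xcr0_avx512_consistent(uint64_t xcr0)
{
    const uint64_t avx512 =
        X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
    const uint64_t prereq = X86_XCR0_SSE | X86_XCR0_AVX;
    uint64_t set = xcr0 & avx512;

    if ( set == 0 )
        return 1;

    return set == avx512 && (xcr0 & prereq) == prereq;
}
```

This covers only the XCR0-level grouping; the CPUID feature-flag dependency
tree mentioned above is a separate matter and is not modelled here.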

>
>> +#define X86_XSS_MASK    (0) /* No XSS states supported yet. */
>> +
>> +/* Per-component subleaf flags. */
>> +#define XSTATE_XSS      (1ULL <<  0)
>> +#define XSTATE_ALIGN64  (1ULL <<  1)
>> +
>>  /* Configure extended state enumeration leaves (0x0000000D for xsave) */
>>  static void xc_cpuid_config_xsave(xc_interface *xch,
>>                                    const struct cpuid_domain_info *info,
>>                                    const unsigned int *input, unsigned int *regs)
>>  {
>> -    if ( info->xfeature_mask == 0 )
>> +    uint64_t guest_xfeature_mask;
>> +
>> +    if ( info->xfeature_mask == 0 ||
>> +         !test_bit(X86_FEATURE_XSAVE, info->featureset) )
>>      {
>>          regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>          return;
>>      }
>>  
>> +    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
>> +
>> +    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_AVX;
>> +
>> +    if ( test_bit(X86_FEATURE_MPX, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_BNDREG | X86_XCR0_BNDCSR;
>> +
>> +    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_LWP;
>> +
>> +    /*
>> +     * Clamp to host mask.  Should be no-op, as guest_xfeature_mask should not
>> +     * be able to be calculated as larger than info->xfeature_mask.
>> +     *
>> +     * TODO - see about making this a harder error.
>> +     */
>> +    guest_xfeature_mask &= info->xfeature_mask;
> This is ugly.

And now that I think about it, wrong.  Dom0's cpuid view is that of a PV
guest, which comes with no XSAVES (which will impact the future support
of Processor Trace), and no PKRU.

>  For one, your dependency mechanism should be able to
> express the dependencies you "manually" enforce above. And beyond
> that masking with info->xfeature_mask should be all that's needed,
> together with enforcing the XCR0 / XSS split ...
>
>>      switch ( input[1] )
>>      {
>> -    case 0: 
>> +    case 0:
>>          /* EAX: low 32bits of xfeature_enabled_mask */
>> -        regs[0] = info->xfeature_mask & 0xFFFFFFFF;
>> +        regs[0] = guest_xfeature_mask;
>>          /* EDX: high 32bits of xfeature_enabled_mask */
>> -        regs[3] = (info->xfeature_mask >> 32) & 0xFFFFFFFF;
>> +        regs[3] = guest_xfeature_mask >> 32;
> ... here and ...
>
>>      case 1: /* leaf 1 */
>>          regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
>> -        regs[2] &= info->xfeature_mask;
>> -        regs[3] = 0;
>> +        regs[2] = guest_xfeature_mask & X86_XSS_MASK;
>> +        regs[3] = (guest_xfeature_mask >> 32) & X86_XSS_MASK;
> ... here. Yet not by a compile time defined mask, but by using
> (host) CPUID output: It is clear that once a bit got assigned to XCR0
> vs XSS, it won't ever change. Hence it doesn't matter whether you
> use the guest or host view of that split. And this will then also - other
> than you've said before would be unavoidable - make it unnecessary to
> always update this code when new states get added.

There is no possible way of avoiding having a whitelist somewhere, which
limits what Xen will tolerate supporting for the guest.

All of this code should have been implemented in Xen in the first
place.  I am afraid that this can't be fixed properly without my further
plans to do full policy handling in Xen.

I will see if I can find a minimal way of fixing this for 4.7, but it is
yet another example of xstate handling simply being broken in tree.
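
To make the XCR0 / XSS split under discussion concrete, here is a minimal
sketch of deriving the two sub-leaves from host CPUID output instead of a
compile-time mask. The function and parameter names are hypothetical, and
guest_mask is assumed to have been restricted by whatever whitelist applies
before it gets here.

```c
#include <stdint.h>

/* Register order matches the regs[] array in the patch. */
enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/*
 * CPUID.0xD[0]: XCR0-managed states are reported in EDX:EAX.
 * host_xcr0 is assumed to be the host's EDX:EAX for the same sub-leaf.
 * EBX and ECX (the size fields) are left untouched.
 */
static void fill_xcr0_subleaf(uint64_t guest_mask, uint64_t host_xcr0,
                              uint32_t regs[4])
{
    uint64_t xcr0 = guest_mask & host_xcr0;

    regs[REG_EAX] = (uint32_t)xcr0;
    regs[REG_EDX] = (uint32_t)(xcr0 >> 32);
}

/*
 * CPUID.0xD[1]: XSS-managed states are reported in EDX:ECX.
 * host_xss is assumed to be the host's EDX:ECX for the same sub-leaf.
 * The 64-bit mask is applied before splitting into halves.
 */
static void fill_xss_subleaf(uint64_t guest_mask, uint64_t host_xss,
                             uint32_t regs[4])
{
    uint64_t xss = guest_mask & host_xss;

    regs[REG_ECX] = (uint32_t)xss;
    regs[REG_EDX] = (uint32_t)(xss >> 32);
}
```

Because a given state bit is only ever XCR0-managed or XSS-managed, the host's
view of the split is enough to classify any bit in the guest mask; it does not
replace the whitelist that decides which bits reach the guest mask at all.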

>
>> -    case 2 ... 63: /* sub-leaves */
>> -        if ( !(info->xfeature_mask & (1ULL << input[1])) )
>> +
>> +    case 2 ... 62: /* per-component sub-leaves */
>> +        if ( !(guest_xfeature_mask & (1ULL << input[1])) )
>>          {
>>              regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>              break;
>>          }
>>          /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
>> -        regs[2] = regs[3] = 0;
>> +        regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
> Wouldn't this better also use the "known features" approach, by
> adding yet another word in cpufeatureset.h?

No - I thought I had already explained why.

There is a mapping between features and available xstate to use those
features (with some features mapping to multiple xstates).  Having the
valid xstates derived from the configured features prevents the two
getting out of sync, and advertising a feature without its applicable
xstate, or advertising an xstate without the appropriate feature bit.
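
A minimal sketch of that derivation, assuming a simplified single-word
feature bitmap. The FEAT_* indices and the table are hypothetical stand-ins
for the real featureset, chosen only to show that the xstate mask is computed
from the features rather than stored alongside them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature indices; the real featureset is much larger. */
enum { FEAT_XSAVE, FEAT_AVX, FEAT_MPX, FEAT_LWP };

#define XCR0_X87     (1ULL <<  0)
#define XCR0_SSE     (1ULL <<  1)
#define XCR0_AVX     (1ULL <<  2)
#define XCR0_BNDREG  (1ULL <<  3)
#define XCR0_BNDCSR  (1ULL <<  4)
#define XCR0_LWP     (1ULL << 62)

/* One feature may require several xstate components (e.g. MPX). */
static const struct {
    unsigned int feature;
    uint64_t xstates;
} feature_xstates[] = {
    { FEAT_AVX, XCR0_AVX },
    { FEAT_MPX, XCR0_BNDREG | XCR0_BNDCSR },
    { FEAT_LWP, XCR0_LWP },
};

/* Derive the guest's xstate mask purely from its configured features. */
static uint64_t xstates_from_features(uint32_t features)
{
    uint64_t mask;
    size_t i;

    if ( !(features & (1u << FEAT_XSAVE)) )
        return 0;

    /* x87 and SSE state are always present once XSAVE is available. */
    mask = XCR0_X87 | XCR0_SSE;

    for ( i = 0; i < sizeof(feature_xstates) / sizeof(feature_xstates[0]); i++ )
        if ( features & (1u << feature_xstates[i].feature) )
            mask |= feature_xstates[i].xstates;

    return mask;
}
```

With a single table as the source of truth, clearing a feature bit removes its
xstate components automatically, so the two views cannot drift apart.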

>
> Btw., looking at that header again I now wonder whether it
> wouldn't have been neater to make XEN_CPUFEATURE() a
> 3-parameter macro, with word and bit specified separately
> and a default definition of
>
> #define XEN_CPUFEATURE(name, word, bit) XEN_X86_FEATURE_##name = (word) * 32 + (bit),
>
> avoiding the ugly repeated "*32" in all macro invocations. Of
> course we'd need to adjust this before we release with this new
> interface.

I'd prefer not to.  The "*32" is the expected way of reading the
constants, and providing the word and bit separately allows for someone
to try and do something silly by not multiplying by 32 themselves.
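
For comparison, a small sketch of the two macro shapes side by side. The macro
names, enum prefixes and helper functions here are hypothetical so that both
forms can coexist in one file, and the word/bit values are illustrative.

```c
/* Two-parameter form: the caller writes out word*32+bit. */
#define FEATURE_2ARG(name, value)      TWO_##name = value,

/* Three-parameter form: the macro does the multiplication. */
#define FEATURE_3ARG(name, word, bit)  THREE_##name = (word) * 32 + (bit),

enum two_arg_features {
    FEATURE_2ARG(AVX, 1*32+28)
    FEATURE_2ARG(MPX, 5*32+14)
};

enum three_arg_features {
    FEATURE_3ARG(AVX, 1, 28)
    FEATURE_3ARG(MPX, 5, 14)
};

/* Stand-ins for recovering the word and bit from a flat feature index. */
static inline unsigned int word_of(unsigned int idx) { return idx / 32; }
static inline unsigned int bit_of(unsigned int idx)  { return idx % 32; }
```

Both forms yield identical constants; the difference is only in whether the
"*32" is visible at each definition site or hidden inside the macro.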

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

