[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 12/30] xen/x86: Generate deep dependencies of features



On 15/02/16 14:06, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@xxxxxxxxxx> wrote:
>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>  
>>  static void sanitise_featureset(uint32_t *fs)
>>  {
>> +    uint32_t disabled_features[FSCAPINTS];
>>      unsigned int i;
>>  
>>      for ( i = 0; i < FSCAPINTS; ++i )
>>      {
>>          /* Clamp to known mask. */
>>          fs[i] &= known_features[i];
>> +
>> +        /*
>> +         * Identify which features with deep dependencies have been
>> +         * disabled.
>> +         */
>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>> +    }
>> +
>> +    for_each_set_bit(i, (void *)disabled_features,
>> +                     sizeof(disabled_features) * 8)
>> +    {
>> +        const uint32_t *dfs = lookup_deep_deps(i);
>> +        unsigned int j;
>> +
>> +        ASSERT(dfs); /* deep_features[] should guarentee this. */
>> +
>> +        for ( j = 0; j < FSCAPINTS; ++j )
>> +        {
>> +            fs[j] &= ~dfs[j];
>> +            disabled_features[j] &= ~dfs[j];
>> +        }
>>      }
> Am I getting the logic in the Python script right that it is indeed
> unnecessary for this loop to be restarted even when a feature
> at a higher numbered bit position results in a lower numbered
> bit getting cleared?

Correct - the python logic flattens the dependency tree so an individual
lookup_deep_deps() will get you all features eventually influenced by
feature i.  It might be the deep features for i includes a feature
numerically lower than i, but because dfs[] contains all features on the
eventual chain, we don't need to start back from 0 again.

I felt this was far more productive to code at build time, rather than
at runtime.

>
>> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>>      calculate_hvm_featureset();
>>  }
>>  
>> +const uint32_t *lookup_deep_deps(uint32_t feature)
> Do you really mean this rather internal function to be non-static?

(You have found the answer already)

> Even if there was a later use in this series, it would seem suspicious
> to me; in fact I'd have expected for it and ...
>
>> +{
>> +    static const struct {
>> +        uint32_t feature;
>> +        uint32_t fs[FSCAPINTS];
>> +    } deep_deps[] = INIT_DEEP_DEPS;
> ... this data to again be placed in .init.* sections.

For this series, I think it can be .init.  For the future cpuid changes,
I am going to need to use it for validation during the set_cpuid
hypercall, so will have to move to not being .init for that.

>
>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
>> nr_entries)
>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>  
>> +    deps = {
>> +        XSAVE:
>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>> +
>> +        AVX:
>> +        (FMA, FMA4, F16C, AVX2, XOP),
> I continue to question whether namely XOP, but perhaps also the
> others here except maybe AVX2, really is depending on AVX, and
> not just on XSAVE.

I am sure we have had this argument before.

All VEX encoded SIMD instructions (including XOP which is listed in the
same category by AMD) are specified to act on 256bit AVX state, and
require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
instructions encoding %xmm registers, which explicitly zero the upper
128bits of the associated %ymm register.

This is very clearly a dependency on AVX, even if it isn't written in
one clear concise statement in the instruction manuals.

>
>> +        PAE:
>> +        (LM, ),
>> +
>> +        LM:
>> +        (CX16, LAHF_LM, PAGE1GB),
>> +
>> +        XMM:
>> +        (LM, ),
>> +
>> +        XMM2:
>> +        (LM, ),
>> +
>> +        XMM3:
>> +        (LM, ),
> I don't think so - SSE2 is a commonly implied prereq for long mode,
> but not SSE3. Instead what I'm missing is a dependency of SSEn,
> AES, SHA and maybe more on SSE (or maybe directly FXSR as per
> above), and of SSE on FXSR. And there may be more, like MMX
> really ought to be dependent on FPU or FXSR (not currently
> expressable as it seems).

I will double check all of these.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.