[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/cpufreq: Reset policy after enabling/disabling turbo status



On 15/11/2021 16:21, Jan Beulich wrote:
> On 15.11.2021 15:33, Andrew Cooper wrote:
>> On 15/11/2021 10:53, Jan Beulich wrote:
>>> On 12.11.2021 19:51, Andrew Cooper wrote:
>>>> On 10/11/2021 09:19, Jane Malalane wrote:
>>>>> Before, user would change turbo status but this had no effect: boolean
>>>>> was set but policy wasn't reevaluated.  Policy must be reevaluated so
>>>>> that CPU frequency is chosen according to the turbo status and the
>>>>> current governor.
>>>>>
>>>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
>>>>>
>>>>> Reported-by: <edvin.torok@xxxxxxxxxx>
>>>>> Signed-off-by: <jane.malalane@xxxxxxxxxx>
>>>>> ---
>>>>> CC: Jan Beulich <jbeulich@xxxxxxxx>
>>>>> CC: Ian Jackson <iwj@xxxxxxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> Release rationale:
>>>>> Not taking this patch means that turbo status is misleading.
>>>>>
>>>>> Taking this patch is low-risk as it only uses a function that already
>>>>> exists and is already used for setting the chosen scaling governor.
>>>>> Essentially, this change is equivalent to running 'xenpm
>>>>> en/disable-turbo-mode' and, subsequently, running 'xenpm
>>>>> set-scaling-governor <current governor>'.
>>>>> ---
>>>>>  xen/drivers/cpufreq/utility.c | 6 +++++-
>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
>>>>> index b93895d4dd..5f200ff3ee 100644
>>>>> --- a/xen/drivers/cpufreq/utility.c
>>>>> +++ b/xen/drivers/cpufreq/utility.c
>>>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>>>>>      {
>>>>>          ret = cpufreq_driver.update(cpuid, policy);
>>>>>          if (ret)
>>>>> +        {
>>>>>              policy->turbo = curr_state;
>>>>> +            return ret;
>>>>> +        }
>>>>>      }
>>>>>  
>>>>> -    return ret;
>>>>> +    /* Reevaluate current CPU policy. */
>>>>> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>>>>>  }
>>>> So, having looked through the manual, what the cpufreq_driver does for
>>>> turbo on Intel is bogus according to the SDM.
>>>>
>>>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
>>>> package) for firmware to configure turbo, but it manifests as another
>>>> dynamic CPUID bit (which I think we handle correctly).  It has the same
>>>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
>>>> adding it to the set of bits we clear during boot.
>>> This is quite confusing in the docs - at least one of the tables calls
>>> the bit "IDA Disable", while other entries at least also refer to the
>>> effect of disabling IDA. I'm afraid I have trouble connecting turbo
>>> mode and IDA disabling, unless both are two different names of the
>>> same thing. Maybe they really are, except that SDM vol 2 uses yet
>>> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost
>>> Technology".
>> SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases.  It
>> reads as if IDA is the general technology name, and turbo is a sub-mode
>> for "doing it automatically without software involvement".
>>
>> On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with
>> further adds to the confusion of which is which.
>>
>>>> However, the correct way to turn off turbo is to set
>>>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
>>>> such, it *should* behave far more like the AMD CPB path.
>>> I'm afraid public documentation has no mention of a bit of this name.
>>> Considering the above I wonder whether you mean "IDA engage" (bit 32),
>>> albeit this doesn't seem very likely when you're taking about a
>>> "disengage" bit.
>> It is that bit.  Despite the name given in Vol4 Table 2-12, it is "set
>> to disable".
>>
>> Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and
>> the surrounding text makes it clear that it is disable bit, not an
>> enable bit.  Also, that's how the Linux intel_pstate driver uses it.
> Okay, then I agree with the proposal you've made.

I've just done some experimentation on a Intel(R) Xeon(R) E-2288G (CFL-R
based), and both the MISC_ENABLE and PERF_CTL bits have the same effect,
and cut nearly 1GHz off the APERF/MPERF reported frequency on a busy
single core on an otherwise idle system.

I have not checked the effect that PERF_CTL on thread 0 has on other
threads in the package, but the reality is that ~100% of production use
of these controls are for all CPUs at once.  (So much so, that I really
think the interface ought to gain a -1 sentential for "all cpus", rather
than forcing userspace to loop over each cpu individually, when Xen can
handle the entire loop in O(1) time.)

~Andrew




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.