|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for cpufreq scaling
[Public]
Hi,
Sorry for the confidential header before. I finally learned how to remove
them...
> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Wednesday, February 12, 2025 12:46 AM
> To: Penny, Zheng <penny.zheng@xxxxxxx>
> Cc: Huang, Ray <Ray.Huang@xxxxxxx>; Andryuk, Jason
> <Jason.Andryuk@xxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
> Roger Pau Monné <roger.pau@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 05/11] xen/x86: introduce a new amd cppc driver for
> cpufreq scaling
>
> On 06.02.2025 09:32, Penny Zheng wrote:
> > --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> > +}
> > +
> > +static int cf_check amd_cppc_write_request(int cpu, uint8_t min_perf,
> > + uint8_t des_perf, uint8_t
> > +max_perf) {
> > + struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
> > + uint64_t prev = data->req.raw;
> > +
> > + data->req.min_perf = min_perf;
> > + data->req.max_perf = max_perf;
> > + data->req.des_perf = des_perf;
> > +
> > + if ( prev == data->req.raw )
> > + return 0;
> > +
> > + on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs,
> > + data, 1);
> > +
> > + return data->err;
> > +}
>
> ... in this function. Then the field would be written to (and the cacheline
> change
> ownership) only in the error case.
>
> As to the function's parameters - why's there a plain int?
>
Are you asking why "int cpu" here?
> > + /* Package level MSR */
> > + if ( rdmsr_safe(MSR_AMD_CPPC_ENABLE, val) )
> > + {
> > + amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_ENABLE)\n");
> > + goto err;
> > + }
> > +
> > + /*
> > + * Only when Enable bit is on, the hardware will calculate the
> > processor’s
> > + * performance capabilities and initialize the performance level
> > fields in
> > + * the CPPC capability registers.
> > + */
> > + if ( !(val & AMD_CPPC_ENABLE) )
> > + {
> > + val |= AMD_CPPC_ENABLE;
> > + if ( wrmsr_safe(MSR_AMD_CPPC_ENABLE, val) )
> > + {
> > + amd_cppc_err(policy->cpu,
> "wrmsr_safe(MSR_AMD_CPPC_ENABLE, %lx)\n", val);
> > + goto err;
> > + }
> > + }
> > +
> > + if ( rdmsr_safe(MSR_AMD_CPPC_CAP1, data->caps.raw) )
> > + {
> > + amd_cppc_err(policy->cpu, "rdmsr_safe(MSR_AMD_CPPC_CAP1)\n");
> > + goto err;
> > + }
> > +
> > + if ( data->caps.highest_perf == 0 || data->caps.lowest_perf == 0 ||
> > + data->caps.nominal_perf == 0 || data->caps.lowest_nonlinear_perf
> > == 0 )
> > + {
> > + amd_cppc_err(policy->cpu,
> > + "Platform malfunction, read CPPC highest_perf: %u,
> lowest_perf: %u, nominal_perf: %u, lowest_nonlinear_perf: %u zero value\n",
> > + data->caps.highest_perf, data->caps.lowest_perf,
> > + data->caps.nominal_perf,
> > data->caps.lowest_nonlinear_perf);
> > + goto err;
> > + }
> > +
> > + data->err = amd_get_min_freq(data, &min_freq);
> > + if ( data->err )
> > + return;
> > +
> > + data->err = amd_get_nominal_freq(data, &nominal_freq);
> > + if ( data->err )
> > + return;
> > +
> > + data->err = amd_get_max_freq(data, &max_freq);
> > + if ( data->err )
> > + return;
> > +
> > + if ( min_freq > max_freq )
> > + {
> > + amd_cppc_err(policy->cpu, "min_freq(%u) or max_freq(%u) value is
> incorrect\n",
> > + min_freq, max_freq);
> > + goto err;
> > + }
>
> And nominal freq not being between the two is okay?
>
> > + policy->min = min_freq;
> > + policy->max = max_freq;
> > +
> > + policy->cpuinfo.min_freq = min_freq;
> > + policy->cpuinfo.max_freq = max_freq;
> > + policy->cpuinfo.perf_freq = nominal_freq;
> > + policy->cur = nominal_freq;
>
> How do you know this is correct? Or does it simply not matter at this point?
>
I need policy->cur to be set for correctly using ondemand governor.
As we don't have any MSR register to reflect runtime perf/freq value under CPPC
control,
I will suggest to use APERF/MPERF average frequency as current freq here.
> > + /* Initial processor data capability frequencies */
> > + data->min_freq = min_freq;
> > + data->nominal_freq = nominal_freq;
> > + data->max_freq = max_freq;
>
> Is this data duplication actually needed? Can't data, if necessary, simply
> have a
> back pointer to the policy for the CPU?
>
> Actually, aren't two of the three values already accessible through the
> cppc_data
> pointer hanging off of data? Which in turn makes me wonder why you go through
> the effort of calculating those values.
>
cppc_data->lowest/nominal frequency comes from ACPI _CPC entry, and they are
optional fields. So we need to calculate them when unavailable.
We need them to set policy->min/max, which are referred in ondemand governor.
But you're right, at least in this patch, we are not using
data->min/max/nominal anywhere
> > + err:
> > + data->err = -EINVAL;
> > +}
>
> At this point you may have set the enable bit already, which you can't undo.
> What effect will this have on the system when the error path is taken here?
> Especially if we then try to fall back to another driver?
>
On current code logic, when the error path is taken, amd-cppc cpufreq driver
fails
to initialize. And we will also not fall back to another driver.
As we could register *only one* cpufreq driver. If amd-cppc is registered
properly
before, then legacy P-states will not get registered.
In amd-cppc registration code:
```
+int __init amd_cppc_register_driver(void)
+{
+ if ( !cpu_has_cppc )
+ return -ENODEV;
+
+ return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+}
```
CPPC feature MSR gets checked before the registration, which is to check
whether the
hardware has CPPC msr support.
> Jan
Many thanks,
Penny
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |