[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 6/9] x86/intel_pstate: the main boby of the intel_pstate driver



>>> On 29.05.15 at 10:19, <wei.w.wang@xxxxxxxxx> wrote:
> On 26/05/2015 21:58, Jan Beulich wrote
>> >>> On 13.05.16 at 09:50, <wei.w.wang@xxxxxxxxx> wrote:
>> > +static int intel_pstate_verify_policy(struct cpufreq_policy *policy)
>> > +{
>> > +    cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
>> > +                                     policy->cpuinfo.max_freq);
>> > +
>> > +    if ( policy->policy != CPUFREQ_POLICY_POWERSAVE   &&
>> > +         policy->policy != CPUFREQ_POLICY_PERFORMANCE &&
>> > +         policy->policy != CPUFREQ_POLICY_USERSPACE   &&
>> > +         policy->policy != CPUFREQ_POLICY_ONDEMAND )
>> 
>> switch()
> 
> How would we use switch() here? 

    switch ( policy->policy )
    {
    case CPUFREQ_POLICY_POWERSAVE:

etc. But I thought that to be obvious, so I'm not sure I understand
what you don't understand.

>> > +static int intel_pstate_cpu_init(struct cpufreq_policy *policy) {
>> > +    struct cpudata *cpu;
>> > +    int rc;
>> > +
>> > +    rc = intel_pstate_init_cpu(policy->cpu);
>> 
>> Having both intel_pstate_cpu_init() and intel_pstate_init_cpu() is at least
>> odd, the more that the latter is being called from only here.
> 
> Are you suggesting to change the function name? If so, how about changing 
> intel_pstate_cpu_init() to intel_pstate_setup()?

Either that, or fold the called function into the caller.

>> > +    if (rc)
>> > +        return rc;
>> > +
>> > +    cpu = all_cpu_data[policy->cpu];
>> > +    if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100)
>> > +        policy->policy = CPUFREQ_POLICY_PERFORMANCE;
>> > +    else
>> > +        policy->policy = CPUFREQ_POLICY_ONDEMAND;
>> > +
>> > +    policy->min = cpu->pstate.min_pstate * cpu->pstate.scaling;
>> > +    policy->max = cpu->pstate.turbo_pstate * cpu->pstate.scaling;
>> > +    policy->min_perf_pct = 0;
>> > +    policy->max_perf_pct = 100;
>> > +    policy->turbo_pct = get_turbo_pct();
>> > +
>> > +    /* cpuinfo and default policy values */
>> > +    policy->cpuinfo.min_freq = cpu->pstate.min_pstate * cpu-
>> >pstate.scaling;
>> > +    policy->cpuinfo.max_freq =
>> > +        cpu->pstate.turbo_pstate * cpu->pstate.scaling;
>> > +    policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
>> > +    cpumask_set_cpu(policy->cpu, policy->cpus);
>> > +
>> > +    /* the first cpu do the init work for limits.min/max_policy_pct */
>> > +    if (cpu->cpu == 0) {
>> 
>> cpu->cpu == 0 doesn't mean this is the first CPU to come here.
> 
> How about this one:
> 
> If (limits.min_policy_pct == 0) {
>       limits.min_policy_pct = ....
>       limits.xx = ....
> }
> 
> limits.min_policy_pct won't be 0 if it is initialized.

If that's guaranteed - fine with me.

>> > +static int intel_pstate_msrs_not_valid(void) {
>> > +    /* Check that all the msr's we are using are valid. */
>> > +    u64 aperf, mperf, tmp;
>> > +
>> > +    rdmsrl(MSR_IA32_APERF, aperf);
>> > +    rdmsrl(MSR_IA32_MPERF, mperf);
>> > +
>> > +    if (!pstate_funcs.get_max() ||
>> > +        !pstate_funcs.get_min() ||
>> > +        !pstate_funcs.get_turbo())
>> > +        return -ENODEV;
>> > +
>> > +    rdmsrl(MSR_IA32_APERF, tmp);
>> > +    if (!(tmp - aperf))
>> 
>> Why not "if(tmp == aperf)"? And - is it guaranteed that APERF (and MPERF
>> below) incremented in the meantime?
> 
> Will change it to "if (tmp == aperf)".
> According to the SDM, IA32_MPERF MSR (E7H) increments in proportion to a 
> fixed frequency, and IA32_APERF MSR increments in proportion to actual 
> performance. So, as long as the two MSRs are valid, their values will be 
> increased.

The "in proportion" is what makes me nervous: What if the proportion
is 1 in every 1000 cycles?

>> > +int __init intel_pstate_init(void)
>> > +{
>> > +    int cpu, rc = 0;
>> > +    const struct x86_cpu_id *id;
>> > +    struct cpu_defaults *cpu_info;
>> > +
>> > +    if (cpuid_ecx(6) & 0x1)
>> > +        set_bit(X86_FEATURE_APERFMPERF,
>> > + &boot_cpu_data.x86_capability);
>> 
>> This should be consolidated out of the other cpufreq drivers into
>> cpu/common.c.
> 
> How about moving it to the bottom of init_intel() in cpu/intel.c ?

It's not Intel specific, so it belongs in cpu/common.c.

>> > --- a/xen/include/asm-x86/acpi.h
>> > +++ b/xen/include/asm-x86/acpi.h
>> > @@ -32,6 +32,8 @@
>> >  #define COMPILER_DEPENDENT_INT64   long long
>> >  #define COMPILER_DEPENDENT_UINT64  unsigned long long
>> >
>> > +extern int intel_pstate_init(void);
>> 
>> Why in acpi.h?
> 
> I was thinking that p-state is part of ACPI. Do you prefer to create a new .h 
> called cpufreq.h, just like the cpuidle.h there?

ACPI is a way to convey information about P- (and C- and T-) states,
but the latter are imo not tied to ACPI. In fact your patch series here
proves the opposite: You add code dealing with P-states without
using information coming from ACPI. I think this should go into what
currently is acpi/cpufreq/cpufreq.h, which is expected to be moved
out of acpi/ anyway (I seem to even recall an ARM side series aiming
at doing that).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.