
Re: [PATCH v6 12/19] xen/cpufreq: implement amd-cppc driver for CPPC in passive mode


  • To: Penny Zheng <Penny.Zheng@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 17 Jul 2025 14:55:10 +0200
  • Cc: ray.huang@xxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 17 Jul 2025 12:55:33 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.07.2025 05:50, Penny Zheng wrote:
> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> @@ -14,7 +14,95 @@
>  #include <xen/domain.h>
>  #include <xen/init.h>
>  #include <xen/param.h>
> +#include <xen/percpu.h>
> +#include <xen/xvmalloc.h>
>  #include <acpi/cpufreq/cpufreq.h>
> +#include <asm/amd.h>
> +#include <asm/msr-index.h>
> +
> +#define amd_cppc_err(cpu, fmt, args...)                             \
> +    printk(XENLOG_ERR "AMD-CPPC: CPU%u error: " fmt, cpu, ## args)
> +#define amd_cppc_warn(cpu, fmt, args...)                            \
> +    printk(XENLOG_WARNING "AMD-CPPC: CPU%u warning: " fmt, cpu, ## args)
> +#define amd_cppc_verbose(cpu, fmt, args...)                         \
> +({                                                                  \
> +    if ( cpufreq_verbose )                                          \
> +        printk(XENLOG_DEBUG "AMD-CPPC: CPU%u " fmt, cpu, ## args);  \
> +})
> +
> +/*
> + * Field highest_perf, nominal_perf, lowest_nonlinear_perf, and lowest_perf
> + * contain the values read from CPPC capability MSR. They represent the limits
> + * of managed performance range as well as the dynamic capability, which may
> + * change during processor operation
> + * Field highest_perf represents highest performance, which is the absolute
> + * maximum performance an individual processor may reach, assuming ideal
> + * conditions. This performance level may not be sustainable for long
> + * durations and may only be achievable if other platform components
> + * are in a specific state; for example, it may require other processors be
> + * in an idle state. This would be equivalent to the highest frequencies
> + * supported by the processor.
> + * Field nominal_perf represents maximum sustained performance level of the
> + * processor, assuming ideal operating conditions. All cores/processors are
> + * expected to be able to sustain their nominal performance state\

Nit: Stray trailing backslash.

> + * simultaneously.
> + * Field lowest_nonlinear_perf represents Lowest Nonlinear Performance, which
> + * is the lowest performance level at which nonlinear power savings are
> + * achieved. Above this threshold, lower performance levels should be
> + * generally more energy efficient than higher performance levels. So in
> + * traditional terms, this represents the P-state range of performance levels.
> + * Field lowest_perf represents the absolute lowest performance level of the
> + * platform. Selecting it may cause an efficiency penalty but should reduce
> + * the instantaneous power consumption of the processor. So in traditional
> + * terms, this represents the T-state range of performance levels.
> + *
> + * Field max_perf, min_perf, des_perf store the values for CPPC request MSR.
> + * Software passes performance goals through these fields.
> + * Field max_perf conveys the maximum performance level at which the platform
> + * may run. And it may be set to any performance value in the range
> + * [lowest_perf, highest_perf], inclusive.
> + * Field min_perf conveys the minimum performance level at which the platform
> + * may run. And it may be set to any performance value in the range
> + * [lowest_perf, highest_perf], inclusive but must be less than or equal to
> + * max_perf.
> + * Field des_perf conveys performance level Xen governor is requesting. And it
> + * may be set to any performance value in the range [min_perf, max_perf],
> + * inclusive.
> + */
> +struct amd_cppc_drv_data
> +{
> +    const struct xen_processor_cppc *cppc_data;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            unsigned int lowest_perf:8;
> +            unsigned int lowest_nonlinear_perf:8;
> +            unsigned int nominal_perf:8;
> +            unsigned int highest_perf:8;
> +            unsigned int :32;
> +        };
> +    } caps;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            unsigned int max_perf:8;
> +            unsigned int min_perf:8;
> +            unsigned int des_perf:8;
> +            unsigned int epp:8;
> +            unsigned int :32;
> +        };
> +    } req;
> +
> +    int err;
> +};
> +
> +static DEFINE_PER_CPU_READ_MOSTLY(struct amd_cppc_drv_data *,
> +                                  amd_cppc_drv_data);
> +/*
> + * Core max frequency read from PstateDef as anchor point
> + * for freq-to-perf transition
> + */
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, pxfreq_mhz);
>  
>  static bool __init amd_cppc_handle_option(const char *s, const char *end)
>  {
> @@ -50,10 +138,327 @@ int __init amd_cppc_cmdline_parse(const char *s, const char *e)
>      return 0;
>  }
>  
> +/*
> + * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> + * use them to convert perf to freq and vice versa. The conversion is
> + * extrapolated as an linear function passing by the 2 points:
> + *  - (Low perf, Low freq)
> + *  - (Nominal perf, Nominal freq)
> + * Parameter freq is always in kHz.
> + */
> +static int amd_cppc_khz_to_perf(const struct amd_cppc_drv_data *data,
> +                                unsigned int freq, uint8_t *perf)
> +{
> +    const struct xen_processor_cppc *cppc_data = data->cppc_data;
> +    unsigned int mul, div;
> +    int offset = 0, res;
> +
> +    if ( cppc_data->cpc.lowest_mhz && cppc_data->cpc.nominal_mhz &&
> +         data->caps.nominal_perf != data->caps.lowest_perf &&
> +         cppc_data->cpc.nominal_mhz != cppc_data->cpc.lowest_mhz )

While I understand that the required relations are being checked elsewhere,
using > in place of != here would not only serve a documentation purpose, but
would also allow dropping one part of the condition:

    if ( cppc_data->cpc.lowest_mhz &&
         data->caps.nominal_perf > data->caps.lowest_perf &&
         cppc_data->cpc.nominal_mhz > cppc_data->cpc.lowest_mhz )
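
To illustrate why the separate nominal_mhz check becomes redundant, here is a
standalone sketch (untested against the tree; struct cppc_points and both
function names are made up for the example and are not part of the patch). A
non-zero lowest_mhz together with nominal_mhz > lowest_mhz already implies a
non-zero nominal_mhz:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant _CPC / capability fields. */
struct cppc_points {
    unsigned int lowest_mhz, nominal_mhz;
    unsigned int lowest_perf, nominal_perf;
};

/* Form used in the patch: four conditions, relations expressed with !=. */
static bool usable_ne(const struct cppc_points *p)
{
    return p->lowest_mhz && p->nominal_mhz &&
           p->nominal_perf != p->lowest_perf &&
           p->nominal_mhz != p->lowest_mhz;
}

/*
 * Suggested form: lowest_mhz != 0 and nominal_mhz > lowest_mhz together
 * imply nominal_mhz != 0, so that check can go away.
 */
static bool usable_gt(const struct cppc_points *p)
{
    return p->lowest_mhz &&
           p->nominal_perf > p->lowest_perf &&
           p->nominal_mhz > p->lowest_mhz;
}
```

The two forms agree on well-formed input. They differ only when nominal is
below lowest, which the > form rejects; with mul and div being unsigned, that
is also the case where the subtractions would wrap.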

> +    {
> +        mul = data->caps.nominal_perf - data->caps.lowest_perf;
> +        div = cppc_data->cpc.nominal_mhz - cppc_data->cpc.lowest_mhz;
> +
> +        /*
> +         * We don't need to convert to kHz for computing offset and can
> +         * directly use nominal_mhz and lowest_mhz as the division
> +         * will remove the frequency unit.
> +         */
> +        offset = data->caps.nominal_perf -
> +                 (mul * cppc_data->cpc.nominal_mhz) / div;
> +    }
> +    else
> +    {
> +        /* Read Processor Max Speed(MHz) as anchor point */
> +        mul = data->caps.highest_perf;
> +        div = this_cpu(pxfreq_mhz);
> +        if ( !div )
> +            return -EOPNOTSUPP;
> +    }
> +
> +    res = offset + (mul * freq) / (div * 1000);
> +    if ( res > UINT8_MAX )

Why UINT8_MAX here but ...

> +    {
> +        printk_once(XENLOG_WARNING
> +                    "Perf value exceeds maximum value 255: %d\n", res);
> +        *perf = 0xff;

... 0xff here?
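
For consistency, something along these lines would use the same constant for
both the comparison and the stored value. This is only a sketch: clamp_perf()
is a hypothetical helper, not something the patch or the tree provides, and
the warning messages are left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: saturate a computed perf value to the 8-bit range,
 * using UINT8_MAX both for the bound check and for the saturated result.
 */
static uint8_t clamp_perf(int res)
{
    if ( res > UINT8_MAX )
        return UINT8_MAX;
    if ( res < 0 )
        return 0;
    return res;
}
```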

> +        return 0;
> +    }
> +    if ( res < 0 )
> +    {
> +        printk_once(XENLOG_WARNING
> +                    "Perf value smaller than minimum value 0: %d\n", res);
> +        *perf = 0;
> +        return 0;
> +    }
> +    *perf = res;

Considering that amd_cppc_init_msrs() rejects perf values of 0 as invalid,
is 0 actually valid as an output here?
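
To show how a result of 0 (or below) can come about, here is a standalone
model of the two-point linear mapping. The function name and all numbers are
made up for illustration, it is not the driver code, and the caller is assumed
to have ensured nominal > lowest for both perf and MHz:

```c
/*
 * Hypothetical model of the mapping through (lowest_mhz, lowest_perf) and
 * (nominal_mhz, nominal_perf). Returns the unclamped result; freq_khz is
 * in kHz, as in the patch.
 */
static int khz_to_perf_raw(unsigned int lowest_perf, unsigned int nominal_perf,
                           unsigned int lowest_mhz, unsigned int nominal_mhz,
                           unsigned int freq_khz)
{
    unsigned int mul = nominal_perf - lowest_perf;
    unsigned int div = nominal_mhz - lowest_mhz;
    /* The MHz units cancel in the division, so no kHz conversion here. */
    int offset = (int)nominal_perf - (int)((mul * nominal_mhz) / div);

    return offset + (int)((mul * freq_khz) / (div * 1000));
}
```

With lowest_perf = 10, nominal_perf = 100, lowest_mhz = 400 and
nominal_mhz = 2000, the two anchor frequencies map back to 10 and 100
respectively, whereas an input of 213334 kHz (below the lowest frequency)
yields 0 and 200000 kHz yields -1.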

> +/*
> + * _CPC may define nominal frequecy and lowest frequency, if not, use
> + * Processor Max Speed as anchor point to calculate.
> + * Output freq stores cpc frequency in kHz
> + */
> +static int amd_get_cpc_freq(const struct amd_cppc_drv_data *data,
> +                            uint32_t cpc_mhz, uint8_t perf, unsigned int *freq)

Once again no need for uint32_t when unsigned int will do.

Jan



 

