[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 10/18] xen/cpufreq: introduce a new amd cppc driver for cpufreq scaling


  • To: "Penny, Zheng" <penny.zheng@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 4 Jul 2025 08:21:07 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: "Huang, Ray" <Ray.Huang@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 04 Jul 2025 06:21:24 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04.07.2025 05:40, Penny, Zheng wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: Wednesday, July 2, 2025 6:48 PM
>>
>> On 02.07.2025 11:49, Penny, Zheng wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Sent: Tuesday, June 17, 2025 12:00 AM
>>>> To: Penny, Zheng <penny.zheng@xxxxxxx>
>>>>
>>>> On 27.05.2025 10:48, Penny Zheng wrote:
>>>>> +static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy 
>>>>> *policy,
>>>>> +                                            unsigned int target_freq,
>>>>> +                                            unsigned int relation) {
>>>>> +    unsigned int cpu = policy->cpu;
>>>>> +    const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
>> cpu);
>>>>> +    uint8_t des_perf;
>>>>> +    int res;
>>>>> +
>>>>> +    if ( unlikely(!target_freq) )
>>>>> +        return 0;
>>>>> +
>>>>> +    res = amd_cppc_khz_to_perf(data, target_freq, &des_perf);
>>>>> +    if ( res )
>>>>> +        return res;
>>>>> +
>>>>> +    /*
>>>>> +     * Setting with "lowest_nonlinear_perf" to ensure governoring
>>>>> +     * performance in P-state range.
>>>>> +     */
>>>>> +    amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
>>>>> +                           des_perf, data->caps.highest_perf);
>>>>
>>>> I fear I don't understand the comment, and hence it remains unclear
>>>> to me why lowest_nonlinear_perf is being used here.
>>>
>>> How about
>>> ```
>>> Choose lowest nonlinear performance as the minimum performance level at 
>>> which
>> the platform may run.
>>> Lowest nonlinear performance is the lowest performance level at which
>>> nonlinear power savings are achieved, Above this threshold, lower 
>>> performance
>> levels should be generally more energy efficient than higher performance 
>> levels.
>>> ```
>>
>> I finally had to go to the ACPI spec to understand what this is about. There 
>> looks to
>> be an implication that lowest <= lowest_nonlinear, and states in that range 
>> would
>> correspond more to T-states than to P-states. With that I think I agree with 
>> the use
> 
> Yes, It doesn't have definitive conclusion about relation between lowest and 
> lowest_nonlinear
> In our internal FW designed spec, it always shows lowest_nonlinear 
> corresponds to P2
> 
>> of lowest_nonlinear here. The comment, however, could do with moving farther
>> away from merely quoting the pretty abstract text in the ACPI spec, as such
>> quoting doesn't help in clarifying terminology used, when that terminology 
>> also isn't
>> explained anywhere else in the code base.
> 
> 
> How about we add detailed explanations about each terminology in the beginning
> declaration , see:
> ```
> +/*
> + * Field highest_perf, nominal_perf, lowest_nonlinear_perf, and lowest_perf
> + * contain the values read from CPPC capability MSR.
> + * Field highest_perf represents highest performance, which is the absolute
> + * maximum performance an individual processor may reach, assuming ideal
> + * conditions
> + * Field nominal_perf represents maximum sustained performance level of the
> + * processor, assuming ideal operating conditions.
> + * Field lowest_nonlinear_perf represents Lowest Nonlinear Performance, which
> + * is the lowest performance level at which nonlinear power savings are
> + * achieved. Above this threshold, lower performance levels should be
> + * generally more energy efficient than higher performance levels.

Which is still only the vague statement also found in the spec. This says 
nothing
about what happens below that level, or what the relationship to other fields 
is.

> + * Field lowest_perf represents the absolute lowest performance level of the
> + * platform.
> + *
> + * Field max_perf, min_perf, des_perf store the values for CPPC request MSR.
> + * Field max_perf conveys the maximum performance level at which the platform
> + * may run. And it may be set to any performance value in the range
> + * [lowest_perf, highest_perf], inclusive.
> + * Field min_perf conveys the minimum performance level at which the platform
> + * may run. And it may be set to any performance value in the range
> + * [lowest_perf, highest_perf], inclusive but must be less than or equal to
> + * max_perf.
> + * Field des_perf conveys performance level Xen is requesting. And it may be
> + * set to any performance value in the range [min_perf, max_perf], inclusive.
> + */
> +struct amd_cppc_drv_data
> +{
> +    const struct xen_processor_cppc *cppc_data;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            unsigned int lowest_perf:8;
> +            unsigned int lowest_nonlinear_perf:8;
> +            unsigned int nominal_perf:8;
> +            unsigned int highest_perf:8;
> +            unsigned int :32;
> +        };
> +    } caps;
> +    union {
> +        uint64_t raw;
> +        struct {
> +            unsigned int max_perf:8;
> +            unsigned int min_perf:8;
> +            unsigned int des_perf:8;
> +            unsigned int epp:8;
> +            unsigned int :32;
> +        };
> +    } req;
> +
> +    int err;
> +};
> ``
> Then here, we could elaborate the reason why we choose lowest_nonlinear_perf 
> over lowest_perf:
> ```
> +    /*
> +     * Having a performance level lower than the lowest nonlinear
> +     * performance level, such as, lowest_perf <= perf <= 
> lowest_nonliner_perf,
> +     * may actually cause an efficiency penalty, So when deciding the 
> min_perf
> +     * value, we prefer lowest nonlinear performance over lowest performance
> +     */
> +    amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
> +                           des_perf, data->caps.highest_perf);
> ```

This reads fine to me.

Question then is though: Is setting lowest_perf as the low boundary a good
idea in any of the places? (Iirc it is used in one or two places. Or am I
misremembering?)

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.