[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v5 10/18] xen/cpufreq: introduce a new amd cppc driver for cpufreq scaling


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: "Penny, Zheng" <penny.zheng@xxxxxxx>
  • Date: Fri, 4 Jul 2025 03:40:17 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=P8I4AlRl9Hg8N2gkbc45S8Pdyga0yzuwQuxrTHOK9X8=; b=j46QIQxJZEnaWtjtP/v+u5XhZUKpEjulJ/Ph7HUdm8EJ6sEORAvtvb0o2sX8m9v823jooUsAQ8EWkRNjPMQybMqoo6gvi2Zj1JJjesrBxCXx2soY+X7p023QOYcaYfsZQDC3dzWl26Sav29NXqdv6te9HMbMOAGU1kgfXh5ML9n0sUDHlu3MR7jSJMr0obdLeD6BeMj37KnsVf3K1kReK5FnEg3U+LhJQOScn7uX2rTcmh9RNopwSQEhgFPH40LtLwG57IXGTHMWr0VhS8JXUw4yKtJlPYzehnEEpck3WiI1eUDwyZnhvj3ycYa7U6Y37jOzh4xOGaA0dz1FyJiIiw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=y2wqP+i4RowPXq61O6qgUiecSHg8LOwQQ8LvkS7010qCu7LxP37nOGAlhHuGRg+O08tbzbKsVRJnTB3H+7Sogu/yYHgzTDhV3yuWS0bwCnGS+cEXStEuoFsbIG+sjwgs+glqk8LL9YvQ3BUx26ACGhcW7F1ud6ohhnfp1FlmRBNhSSEjx46bIiyNkvOV9HLpeAlHqp+esiQb2VIxtAVF1RMZIi/cmX+AnDNxZWNHQEXebqnKhLkbA5kht52lnyL9v6mIIUpnJq/Bp0bP3D2ZZ6jxjhE9X62uomDE0RKtT/2OVv6oojsfQuyv62qzvYyw58V8BGMHz3Z23N3mf8oC7w==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com;
  • Cc: "Huang, Ray" <Ray.Huang@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 04 Jul 2025 03:40:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Msip_labels: MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Enabled=True;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_SiteId=3dd8961f-e488-4e60-8e11-a82d994e183d;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_SetDate=2025-07-04T03:40:07.0000000Z;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Name=Open Source;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_ContentBits=3;MSIP_Label_f265efc6-e181-49d6-80f4-fae95cf838a0_Method=Privileged
  • Thread-index: AQHbzuRFzooSbGuOOkWtJb1ZJ/MIx7QGEZYAgBiv97CAAB5CAIACoScg
  • Thread-topic: [PATCH v5 10/18] xen/cpufreq: introduce a new amd cppc driver for cpufreq scaling

[Public]

> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Wednesday, July 2, 2025 6:48 PM
> To: Penny, Zheng <penny.zheng@xxxxxxx>
> Cc: Huang, Ray <Ray.Huang@xxxxxxx>; Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx>; Roger Pau Monné <roger.pau@xxxxxxxxxx>; xen-
> devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5 10/18] xen/cpufreq: introduce a new amd cppc driver for
> cpufreq scaling
>
> On 02.07.2025 11:49, Penny, Zheng wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: Tuesday, June 17, 2025 12:00 AM
> >> To: Penny, Zheng <penny.zheng@xxxxxxx>
> >>
> >> On 27.05.2025 10:48, Penny Zheng wrote:
> >>> --- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> >>> +++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
> >>> +        /*
> >>> +         * We don't need to convert to kHz for computing offset and can
> >>> +         * directly use nominal_mhz and lowest_mhz as the division
> >>> +         * will remove the frequency unit.
> >>> +         */
> >>> +        offset = data->caps.nominal_perf -
> >>> +                 (mul * cppc_data->cpc.nominal_mhz) / div;
> >>> +    }
> >>> +    else
> >>> +    {
> >>> +        /* Read Processor Max Speed(MHz) as anchor point */
> >>> +        mul = data->caps.highest_perf;
> >>> +        div = this_cpu(pxfreq_mhz);
> >>> +        if ( !div )
> >>> +            return -EINVAL;
> >>
> >> What's wrong about the function arguments in this case? (Same
> >> question again on further uses of EINVAL below.)
> >
> > If we could not get processor max frequency, the whole function is 
> > useless...
> > Maybe -EOPNOTSUPP is more suitable than -EINVAL;
>
> I don't like EOPNOTSUPP very much either for the purpose, but it's surely 
> better
> than EINVAL.
>
> >>> +static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy 
> >>> *policy,
> >>> +                                            unsigned int target_freq,
> >>> +                                            unsigned int relation) {
> >>> +    unsigned int cpu = policy->cpu;
> >>> +    const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
> cpu);
> >>> +    uint8_t des_perf;
> >>> +    int res;
> >>> +
> >>> +    if ( unlikely(!target_freq) )
> >>> +        return 0;
> >>> +
> >>> +    res = amd_cppc_khz_to_perf(data, target_freq, &des_perf);
> >>> +    if ( res )
> >>> +        return res;
> >>> +
> >>> +    /*
> >>> +     * Setting with "lowest_nonlinear_perf" to ensure governoring
> >>> +     * performance in P-state range.
> >>> +     */
> >>> +    amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
> >>> +                           des_perf, data->caps.highest_perf);
> >>
> >> I fear I don't understand the comment, and hence it remains unclear
> >> to me why lowest_nonlinear_perf is being used here.
> >
> > How about
> > ```
> > Choose lowest nonlinear performance as the minimum performance level at 
> > which
> the platform may run.
> > Lowest nonlinear performance is the lowest performance level at which
> > nonlinear power savings are achieved, Above this threshold, lower 
> > performance
> levels should be generally more energy efficient than higher performance 
> levels.
> > ```
>
> I finally had to go to the ACPI spec to understand what this is about. There 
> looks to
> be an implication that lowest <= lowest_nonlinear, and states in that range 
> would
> correspond more to T-states than to P-states. With that I think I agree with 
> the use

Yes, It doesn't have definitive conclusion about relation between lowest and 
lowest_nonlinear
In our internal FW designed spec, it always shows lowest_nonlinear corresponds 
to P2

> of lowest_nonlinear here. The comment, however, could do with moving farther
> away from merely quoting the pretty abstract text in the ACPI spec, as such
> quoting doesn't help in clarifying terminology used, when that terminology 
> also isn't
> explained anywhere else in the code base.


How about we add detailed explanations about each terminology in the beginning
declaration , see:
```
+/*
+ * Field highest_perf, nominal_perf, lowest_nonlinear_perf, and lowest_perf
+ * contain the values read from CPPC capability MSR.
+ * Field highest_perf represents highest performance, which is the absolute
+ * maximum performance an individual processor may reach, assuming ideal
+ * conditions
+ * Field nominal_perf represents maximum sustained performance level of the
+ * processor, assuming ideal operating conditions.
+ * Field lowest_nonlinear_perf represents Lowest Nonlinear Performance, which
+ * is the lowest performance level at which nonlinear power savings are
+ * achieved. Above this threshold, lower performance levels should be
+ * generally more energy efficient than higher performance levels.
+ * Field lowest_perf represents the absolute lowest performance level of the
+ * platform.
+ *
+ * Field max_perf, min_perf, des_perf store the values for CPPC request MSR.
+ * Field max_perf conveys the maximum performance level at which the platform
+ * may run. And it may be set to any performance value in the range
+ * [lowest_perf, highest_perf], inclusive.
+ * Field min_perf conveys the minimum performance level at which the platform
+ * may run. And it may be set to any performance value in the range
+ * [lowest_perf, highest_perf], inclusive but must be less than or equal to
+ * max_perf.
+ * Field des_perf conveys performance level Xen is requesting. And it may be
+ * set to any performance value in the range [min_perf, max_perf], inclusive.
+ */
+struct amd_cppc_drv_data
+{
+    const struct xen_processor_cppc *cppc_data;
+    union {
+        uint64_t raw;
+        struct {
+            unsigned int lowest_perf:8;
+            unsigned int lowest_nonlinear_perf:8;
+            unsigned int nominal_perf:8;
+            unsigned int highest_perf:8;
+            unsigned int :32;
+        };
+    } caps;
+    union {
+        uint64_t raw;
+        struct {
+            unsigned int max_perf:8;
+            unsigned int min_perf:8;
+            unsigned int des_perf:8;
+            unsigned int epp:8;
+            unsigned int :32;
+        };
+    } req;
+
+    int err;
+};
``
Then here, we could elaborate the reason why we choose lowest_nonlinear_perf 
over lowest_perf:
```
+    /*
+     * Having a performance level lower than the lowest nonlinear
+     * performance level, such as, lowest_perf <= perf <= lowest_nonliner_perf,
+     * may actually cause an efficiency penalty, So when deciding the min_perf
+     * value, we prefer lowest nonlinear performance over lowest performance
+     */
+    amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
+                           des_perf, data->caps.highest_perf);
```

>
> Jan

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.