[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH v5 11/18] xen/x86: implement EPP support for the amd-cppc driver in active mode


  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Penny Zheng <Penny.Zheng@xxxxxxx>
  • Date: Tue, 27 May 2025 16:48:26 +0800
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Yck7AKHKtbmxCUvtsRyzheF4rrJVBYbut7aEbRVxucI=; b=sBTlnLL5wIwPprtT9PYSdZ67L+bwbuNOS+mgTVs4ku2ZTRzZwPzKxk9Lnt2cOUlD6QgCzX+gZ0yUz2KbLwLcxsSYFq6QbU+IvFghx/7TTobj19dSO0BeIO2nAreJ55NpGYe7W66FlzlOOR1N5Ms38Qi7wwQy4wNiCEtBcMFW4Cp/YyXEv3NbJ1owjQ1hVCP7IvS+M2doBIjQtbTmR9VQigwGBi76VWx0z4uu2PL98iil8Jr9j231FesNG58+z9C/yvNNimPg3rzAKt05fyDgRTFm+OKkdIAOa/PTCKXTaMLwGwNuh7HYuLiZeIDBytHGx7F/iq/gbJAjhFL+fRQBwA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=xDuHygYFY121d5CqPVsYnxL7KlQzMsQV8wsIxbglHhKZB3KkGdM4A/MkeCoaScaMOLSHeK8D/9dXSNIiFZPfDALq/7TLqTQqyO0QDYZ1tAkEO4AiAbszWgtiogg0HFIaKrj6MmBusDoiE3S5RhWQGCdA0qc2G6Q9VDgSaONlolXibUBWZnENImuWXLjyG+gm8Qi1mfZEDKGxEYhAsUIO8AHGzhnW1EUA7aLDoYE6bAOuI9yGTcZsriDJ5zbuntazd/WiWFvEOq+aeDsErVmH1JyJM7M8gGR9vYxeVKF+3ugSnp8pNSBRfG+cpg5W9rAWAZniGtKF5AinxAX5SbT4iQ==
  • Cc: <ray.huang@xxxxxxx>, Penny Zheng <Penny.Zheng@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "Julien Grall" <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Tue, 27 May 2025 08:51:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

amd-cppc has 2 operation modes: autonomous (active) mode and
non-autonomous (passive) mode.
In active mode, we don't need Xen governor to calculate and tune the cpu
frequency, while hardware built-in CPPC power algorithm will calculate the
runtime workload and adjust cores frequency automatically according to the
power supply, thermal, core voltage and some other hardware conditions.

Users could set EPP register to tell hardware their preference ratio toward
performance over energy efficiency, which is configured on a scale from 0 to
255 where 0 represents maximum performance and 255 represents maximum
efficiency.

We implement a new AMD CPU frequency driver `amd-cppc-epp` for active mode.
It requires `active` tag for users to explicitly select active mode.
In driver `active-cppc-epp`, ->setpolicy() is hooked, not the ->target(), as
it does not depend on xen governor to do performance tuning.

Users could re-use "governor" in Xen cmdline to deliver which performance
policy they want to apply, represented by a new field "policy"
(CPUFREQ_POLICY_xxx), CPUFREQ_POLICY_PERFORMANCE as maximum performance,
CPUFREQ_POLICY_POWERSAVE as the least power consumption.
Function cpufreq_policy_from_governor() is responsible for tranforming
governor into policy. Right now, only "performance" and "powersave" are
considered.

Signed-off-by: Penny Zheng <Penny.Zheng@xxxxxxx>
---
v1 -> v2:
- Remove redundant epp_mode
- Remove pointless initializer
- Define sole caller read_epp_init_once and epp_init value to read
pre-defined BIOS epp value only once
- Combine the commit "xen/cpufreq: introduce policy type when
cpufreq_driver->setpolicy exists"
---
v2 -> v3:
- Combined with commit "x86/cpufreq: add "cpufreq=amd-cppc,active" para"
- Refactor doc about "active mode"
- Change opt_cpufreq_active to opt_active_mode
- Let caller pass epp_init when unspecified to allow the function parameter
to be of uint8_t
- Make epp_init per-cpu value
---
v3 -> v4:
- doc refinement
- use MASK_EXTR() to get epp value
- fix indentation
- replace if-else() with switch()
- combine successive comments and do refinement
- no need to introduce amd_cppc_epp_update_limit() as a wrapper
- rename cpufreq_parse_policy() with cpufreq_policy_from_governor()
- no need to use case-insensitive comparison
---
v4 -> v5:
- refine doc to state what the default is for "active" sub-option and it's of
boolean nature
- excess blank after << for AMD_CPPC_EPP_MASK
- set max_perf with lowest_perf to get utmost powersave
- refine commit message to include description about relation between "policy"
and "governor"
---
 docs/misc/xen-command-line.pandoc    |  10 ++-
 xen/arch/x86/acpi/cpufreq/amd-cppc.c | 123 ++++++++++++++++++++++++++-
 xen/arch/x86/include/asm/msr-index.h |   1 +
 xen/drivers/cpufreq/utility.c        |  11 +++
 xen/include/acpi/cpufreq/cpufreq.h   |  12 +++
 xen/include/public/sysctl.h          |   1 +
 6 files changed, 153 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 9ef847a336..23518b2ae4 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, 
rather than detectin
 available support.
 
 ### cpufreq
-> `= none | {{ <boolean> | xen } { 
[:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } 
[,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
+> `= none | {{ <boolean> | xen } { 
[:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } 
[,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | 
amd-cppc[:[active][,verbose]]`
 
 > Default: `xen`
 
@@ -537,6 +537,14 @@ choice of `dom0-kernel` is deprecated and not supported by 
all Dom0 kernels.
 * `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
   on supported AMD hardware to provide finer grained frequency control
   mechanism. The default is disabled.
+* `active` is a boolean to enable amd-cppc driver in active(autonomous) mode.
+  In this mode, users could write to energy performance preference
+  register(epp) to tell hardware if they want to bias toward performance or
+  energy efficiency. Built-in CPPC power algorithm will calculate the runtime
+  workload and adjust cores frequency automatically according to the power
+  supply, thermal, core voltage and some other hardware conditions.
+  The default is disabled, and the option only applies when `amd-cppc` is
+  enabled.
 
 There is also support for `;`-separated fallback options:
 `cpufreq=hwp;xen,verbose`.  This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c 
b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 8f0a30c19d..5f15e86dc4 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -30,6 +30,9 @@
         printk(XENLOG_DEBUG "AMD_CPPC: CPU%u " fmt, cpu, ## args);  \
 })
 
+static bool __ro_after_init opt_active_mode;
+static DEFINE_PER_CPU_READ_MOSTLY(uint8_t, epp_init);
+
 struct amd_cppc_drv_data
 {
     const struct xen_processor_cppc *cppc_data;
@@ -76,6 +79,13 @@ static bool __init amd_cppc_handle_option(const char *s, 
const char *end)
         return true;
     }
 
+    ret = parse_boolean("active", s, end);
+    if ( ret >= 0 )
+    {
+        opt_active_mode = ret;
+        return true;
+    }
+
     return false;
 }
 
@@ -224,11 +234,18 @@ static void cf_check amd_cppc_write_request_msrs(void 
*info)
 }
 
 static void amd_cppc_write_request(unsigned int cpu, uint8_t min_perf,
-                                   uint8_t des_perf, uint8_t max_perf)
+                                   uint8_t des_perf, uint8_t max_perf,
+                                   uint8_t epp)
 {
     struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
     uint64_t prev = data->req.raw;
 
+    if ( !opt_active_mode )
+        data->req.des_perf = des_perf;
+    else
+        data->req.des_perf = 0;
+    data->req.epp = epp;
+
     data->req.min_perf = min_perf;
     data->req.max_perf = max_perf;
     data->req.des_perf = des_perf;
@@ -239,6 +256,14 @@ static void amd_cppc_write_request(unsigned int cpu, 
uint8_t min_perf,
     on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs, data, 1);
 }
 
+static void read_epp_init(void)
+{
+    uint64_t val;
+
+    rdmsrl(MSR_AMD_CPPC_REQ, val);
+    this_cpu(epp_init) = MASK_EXTR(val, AMD_CPPC_EPP_MASK);
+}
+
 static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
                                             unsigned int target_freq,
                                             unsigned int relation)
@@ -260,7 +285,10 @@ static int cf_check amd_cppc_cpufreq_target(struct 
cpufreq_policy *policy,
      * performance in P-state range.
      */
     amd_cppc_write_request(policy->cpu, data->caps.lowest_nonlinear_perf,
-                           des_perf, data->caps.highest_perf);
+                           des_perf, data->caps.highest_perf,
+                           /* Pre-defined BIOS value for passive mode */
+                           per_cpu(epp_init, policy->cpu));
+
     return 0;
 }
 
@@ -336,6 +364,8 @@ static void cf_check amd_cppc_init_msrs(void *info)
      */
     policy->cur = cpufreq_driver_getavg(policy->cpu, GOV_GETAVG);
 
+    read_epp_init();
+
     return;
 
  err:
@@ -369,7 +399,7 @@ static int cf_check amd_cppc_cpufreq_cpu_exit(struct 
cpufreq_policy *policy)
     return 0;
 }
 
-static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+static int amd_cppc_cpufreq_init_perf(struct cpufreq_policy *policy)
 {
     unsigned int cpu = policy->cpu;
     struct amd_cppc_drv_data *data;
@@ -402,12 +432,80 @@ static int cf_check amd_cppc_cpufreq_cpu_init(struct 
cpufreq_policy *policy)
 
     amd_cppc_boost_init(policy, data);
 
+    return 0;
+}
+
+static int cf_check amd_cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+    int ret;
+
+    ret = amd_cppc_cpufreq_init_perf(policy);
+    if ( ret )
+        return ret;
+
     amd_cppc_verbose(policy->cpu,
                      "CPU initialized with amd-cppc passive mode\n");
 
     return 0;
 }
 
+static int cf_check amd_cppc_epp_cpu_init(struct cpufreq_policy *policy)
+{
+    int ret;
+
+    ret = amd_cppc_cpufreq_init_perf(policy);
+    if ( ret )
+        return ret;
+
+    policy->policy = cpufreq_policy_from_governor(policy->governor);
+
+    amd_cppc_verbose(policy->cpu,
+                     "CPU initialized with amd-cppc active mode\n");
+
+    return 0;
+}
+
+static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
+{
+    const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data,
+                                                   policy->cpu);
+    uint8_t max_perf, min_perf, epp;
+
+    /*
+     * min_perf represents the idle frequency, while max_perf represents
+     * the maximum frequency
+     */
+    max_perf = data->caps.highest_perf;
+    min_perf = data->caps.lowest_perf;
+
+    /*
+     * We set min_perf with highest_perf in performance mode
+     * and max_perf with lowest_perf in powersave mode, to achieve
+     * ultmost performance and power savings.
+     */
+    switch ( policy->policy )
+    {
+    case CPUFREQ_POLICY_PERFORMANCE:
+        /* Force the epp value to be zero for performance policy */
+        epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+        min_perf = data->caps.highest_perf;
+        break;
+    case CPUFREQ_POLICY_POWERSAVE:
+        /* Force the epp value to be 0xff for powersave policy */
+        epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+        max_perf = data->caps.lowest_perf;
+        break;
+    default:
+        epp = per_cpu(epp_init, policy->cpu);
+        break;
+    }
+
+    amd_cppc_write_request(policy->cpu, min_perf,
+                           0 /* no des_perf for epp mode */,
+                           max_perf, epp);
+    return 0;
+}
+
 static const struct cpufreq_driver __initconst_cf_clobber
 amd_cppc_cpufreq_driver =
 {
@@ -418,13 +516,30 @@ amd_cppc_cpufreq_driver =
     .exit   = amd_cppc_cpufreq_cpu_exit,
 };
 
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_epp_driver =
+{
+    .name       = XEN_AMD_CPPC_EPP_DRIVER_NAME,
+    .verify     = amd_cppc_cpufreq_verify,
+    .setpolicy  = amd_cppc_epp_set_policy,
+    .init       = amd_cppc_epp_cpu_init,
+    .exit       = amd_cppc_cpufreq_cpu_exit,
+};
+
 int __init amd_cppc_register_driver(void)
 {
+    int ret;
+
     if ( !cpu_has_cppc )
     {
         xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
         return -ENODEV;
     }
 
-    return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+    if ( opt_active_mode )
+        ret = cpufreq_register_driver(&amd_cppc_epp_driver);
+    else
+        ret = cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+
+    return ret;
 }
diff --git a/xen/arch/x86/include/asm/msr-index.h 
b/xen/arch/x86/include/asm/msr-index.h
index e4401186c7..3e2522a862 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -242,6 +242,7 @@
 #define MSR_AMD_CPPC_ENABLE                 0xc00102b1U
 #define  AMD_CPPC_ENABLE                    (_AC(1, ULL) << 0)
 #define MSR_AMD_CPPC_REQ                    0xc00102b3U
+#define  AMD_CPPC_EPP_MASK                  (_AC(0xff, ULL) << 24)
 
 /*
  * Legacy MSR constants in need of cleanup.  No new MSRs below this comment.
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index b35e2eb1b6..2617581125 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -487,3 +487,14 @@ int __cpufreq_set_policy(struct cpufreq_policy *data,
 
     return __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
 }
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov)
+{
+    if ( !strncmp(gov->name, "performance", CPUFREQ_NAME_LEN) )
+        return CPUFREQ_POLICY_PERFORMANCE;
+
+    if ( !strncmp(gov->name, "powersave", CPUFREQ_NAME_LEN) )
+        return CPUFREQ_POLICY_POWERSAVE;
+
+    return CPUFREQ_POLICY_UNKNOWN;
+}
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index 83050c58b2..6f31009e82 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -83,6 +83,7 @@ struct cpufreq_policy {
     int8_t              turbo;  /* tristate flag: 0 for unsupported
                                  * -1 for disable, 1 for enabled
                                  * See CPUFREQ_TURBO_* below for defines */
+    unsigned int        policy; /* CPUFREQ_POLICY_* */
 };
 DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy);
 
@@ -133,6 +134,17 @@ extern int cpufreq_register_governor(struct 
cpufreq_governor *governor);
 extern struct cpufreq_governor *__find_governor(const char *governor);
 #define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
 
+#define CPUFREQ_POLICY_UNKNOWN      0
+/*
+ * If cpufreq_driver->target() exists, the ->governor decides what frequency
+ * within the limits is used. If cpufreq_driver->setpolicy() exists, these
+ * two generic policies are available:
+ */
+#define CPUFREQ_POLICY_POWERSAVE    1
+#define CPUFREQ_POLICY_PERFORMANCE  2
+
+unsigned int cpufreq_policy_from_governor(const struct cpufreq_governor *gov);
+
 /* pass a target to the cpufreq driver */
 extern int __cpufreq_driver_target(struct cpufreq_policy *policy,
                                    unsigned int target_freq,
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 42997252ef..fa431fd983 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -424,6 +424,7 @@ struct xen_set_cppc_para {
 };
 
 #define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
+#define XEN_AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"
 #define XEN_HWP_DRIVER_NAME "hwp"
 
 /*
-- 
2.34.1




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.