
[xen staging] x86/mwait-idle: Adjust the SKX C6 parameters if PC6 is disabled



commit d4fb2cc5c0eea6873e4ffb0eeea6ea1d6023b654
Author:     Chen Yu <yu.c.chen@xxxxxxxxx>
AuthorDate: Wed Jan 19 13:50:43 2022 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Wed Jan 19 13:50:43 2022 +0100

    x86/mwait-idle: Adjust the SKX C6 parameters if PC6 is disabled
    
    Because cpuidle assumes worst-case C-state parameters, PC6 parameters
    are used for describing C6, which is worst-case for requesting CC6.
    When PC6 is enabled, this is appropriate. But if PC6 is disabled
    in the BIOS, the exit latency and target residency should be adjusted
    accordingly.
    
    Exit latency:
    Previously the C6 exit latency was measured as the PC6 exit latency.
    With PC6 disabled, the C6 exit latency should be that of CC6.
    
    Target residency:
    With PC6 disabled, an idle duration within [CC6, PC6) would make the
    idle governor choose C1E over C6, which hurts energy efficiency.
    The bar for requesting C6 should be lowered when PC6 is disabled.
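
The effect of the target residency on state selection can be sketched in C as follows. This is a deliberately simplified stand-in for an idle governor, not the Xen or Linux implementation: `struct idle_state` and `pick_state()` are hypothetical names, and a real governor also weighs latency constraints and wakeup history.

```c
#include <assert.h>

/* Hypothetical, simplified idle-state descriptor (not the Xen struct). */
struct idle_state {
    const char *name;
    unsigned int exit_latency;     /* microseconds */
    unsigned int target_residency; /* microseconds */
};

/*
 * Return the index of the deepest state whose target residency does not
 * exceed the predicted idle duration. The table is assumed to be ordered
 * from shallowest to deepest; index 0 is the fallback.
 */
static unsigned int pick_state(const struct idle_state *tbl, unsigned int n,
                               unsigned int predicted_us)
{
    unsigned int best = 0;

    assert(n > 0);
    for (unsigned int i = 1; i < n; i++)
        if (tbl[i].target_residency <= predicted_us)
            best = i;
    return best;
}
```

As an illustration, take a table of C1, C1E and C6 in which C1E has a target residency of 20us (an assumed value) and C6 has a target residency of 600us (an assumed stand-in for the PC6-based parameter). For a predicted idle duration of 400us, `pick_state()` returns the C1E index. With the C6 target residency lowered to 276us, the value this patch sets, the same call returns the C6 index.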
    
    To fill this gap, check if PC6 is disabled in the BIOS in the
    MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
    for C6 and set target_residency to 3 times the new exit latency. [This
    is consistent with how intel_idle driver uses _CST to calculate the
    target_residency.] As a result, the OS would be more likely to choose C6
    over C1E when PC6 is disabled, which is reasonable, because if C6 is
    enabled, it implies that the user cares about energy, so choosing C6 more
    frequently makes sense.
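
The check and adjustment described above can be sketched as a standalone C fragment. It takes an already-read value of MSR 0xe2 as a parameter rather than reading the register, so it can be exercised outside the hypervisor. `struct c6_params`, `pc6_disabled()` and `adjust_c6()` are hypothetical names used only for this illustration; the mask and the "less than 2" comparison follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low three bits of MSR_PKG_CST_CONFIG_CONTROL hold the package C-state limit. */
#define PKG_CST_LIMIT_MASK 0x7u

/* Hypothetical stand-in for the C6 entry of the idle-state table. */
struct c6_params {
    unsigned int exit_latency;     /* microseconds */
    unsigned int target_residency; /* microseconds */
};

/* Limit encodings 000b (C0/C1) and 001b (C2) mean PC6 cannot be reached. */
static bool pc6_disabled(uint64_t pkg_cst_config)
{
    return (pkg_cst_config & PKG_CST_LIMIT_MASK) < 2;
}

/*
 * If PC6 is disabled, describe C6 by the CC6 exit latency and set the
 * target residency to three times that latency. Otherwise leave the
 * PC6-based parameters untouched.
 */
static void adjust_c6(struct c6_params *c6, uint64_t pkg_cst_config,
                      unsigned int cc6_exit_latency_us)
{
    assert(c6 != NULL);
    if (!pc6_disabled(pkg_cst_config))
        return;
    c6->exit_latency = cc6_exit_latency_us;
    c6->target_residency = 3 * cc6_exit_latency_us;
}
```

With the measured CC6 exit latency of 92us, a register value whose low bits are 000b or 001b yields an exit latency of 92 and a target residency of 3 * 92 = 276, matching the constants in the patch. A value of 010b or higher leaves the entry unchanged.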
    
    The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
    wakeup as the 99.99th percentile. CLX and CPX share the SKX CPU model
    number, and their CC6 exit latencies (96us and 89us respectively) are
    close to the SKX one, so the SKX value is reused for them.
    
    There is a concern that it might be better to use a more generic approach
    instead of optimizing every platform. However, if the required code
    complexity and different PC6 bit interpretation on different platforms
    are taken into account, tuning the code per platform seems to be an
    acceptable tradeoff.
    
    Link: https://intel.github.io/wult/ # [1]
    Suggested-by: Len Brown <len.brown@xxxxxxxxx>
    Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
    Reviewed-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
    [ rjw: Subject and changelog edits ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
    [Linux commit: 64233338499126c5c31e07165735ab5441c7e45a]
    
    Alongside the dropping of "const" from skx_cstates[] add __read_mostly,
    and extend that to other similar non-const tables.
    
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
 xen/arch/x86/cpu/mwait-idle.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 97e19246b6..9e04d969d6 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -484,7 +484,7 @@ static const struct cpuidle_state bdw_cstates[] = {
        {}
 };
 
-static struct cpuidle_state skl_cstates[] = {
+static struct cpuidle_state __read_mostly skl_cstates[] = {
        {
                .name = "C1-SKL",
                .flags = MWAIT2flg(0x00),
@@ -536,7 +536,7 @@ static struct cpuidle_state skl_cstates[] = {
        {}
 };
 
-static const struct cpuidle_state skx_cstates[] = {
+static struct cpuidle_state __read_mostly skx_cstates[] = {
        {
                .name = "C1-SKX",
                .flags = MWAIT2flg(0x00),
@@ -674,7 +674,7 @@ static const struct cpuidle_state knl_cstates[] = {
        {}
 };
 
-static struct cpuidle_state bxt_cstates[] = {
+static struct cpuidle_state __read_mostly bxt_cstates[] = {
        {
                .name = "C1-BXT",
                .flags = MWAIT2flg(0x00),
@@ -1129,6 +1129,36 @@ static void __init sklh_idle_state_table_update(void)
        skl_cstates[6].flags |= CPUIDLE_FLAG_DISABLED;  /* C9-SKL */
 }
 
+/*
+ * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake
+ * idle states table.
+ */
+static void __init skx_idle_state_table_update(void)
+{
+       unsigned long long msr;
+
+       rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
+
+       /*
+        * 000b: C0/C1 (no package C-state support)
+        * 001b: C2
+        * 010b: C6 (non-retention)
+        * 011b: C6 (retention)
+        * 111b: No Package C state limits.
+        */
+       if ((msr & 0x7) < 2) {
+               /*
+                * Uses the CC6 + PC0 latency and 3 times of
+                * latency for target_residency if the PC6
+                * is disabled in BIOS. This is consistent
+                * with how intel_idle driver uses _CST
+                * to set the target_residency.
+                */
+               skx_cstates[2].exit_latency = 92;
+               skx_cstates[2].target_residency = 276;
+       }
+}
+
 /*
  * mwait_idle_state_table_update()
  *
@@ -1147,6 +1177,9 @@ static void __init mwait_idle_state_table_update(void)
        case 0x5e: /* SKL-H */
                sklh_idle_state_table_update();
                break;
+       case 0x55: /* SKL-X */
+               skx_idle_state_table_update();
+               break;
        }
 }
 
--
generated by git-patchbot for /home/xen/git/xen.git#staging



 

