[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
From: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually exclusive - only one of them can be enabled. By default, 'intel_idle' driver enables C1 and disables C1E. However, some users prefer to use C1E instead of C1, because it saves more energy. This patch adds a new module parameter ('preferred_cstates') for enabling C1E and disabling C1. Here is the idea behind it. 1. This option has effect only for "mutually exclusive" C-states like C1 and C1E on SPR. 2. It does not have any effect on independent C-states, which do not require other C-states to be disabled (most states on most platforms as of today). 3. For mutually exclusive C-states, the 'intel_idle' driver always has a reasonable default, such as enabling C1 on SPR by default. On other platforms, the default may be different. 4. Users can override the default using the 'preferred_cstates' parameter. 5. The parameter accepts the preferred C-states bit-mask, similarly to the existing 'states_off' parameter. 6. This parameter is not limited to C1/C1E, and leaves room for supporting other mutually exclusive C-states, if they come in the future. Today 'intel_idle' can only be compiled-in, which means that on SPR, in order to disable C1 and enable C1E, users should boot with the following kernel argument: intel_idle.preferred_cstates=4 Signed-off-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6 Enable C1E (if requested) not only on the BSP's socket / package. Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> --- unstable.orig/docs/misc/xen-command-line.pandoc 2022-04-25 17:59:42.123387258 +0200 +++ unstable/docs/misc/xen-command-line.pandoc 2022-04-25 17:36:00.000000000 +0200 @@ -1884,6 +1884,12 @@ paging controls access to usermode addre ### ple_window (Intel) > `= <integer>` +### preferred-cstates (x86) +> `= <integer>` + +This is a mask of C-states which are to be use preferably. This option is +applicable only oh hardware were certain C-states are exlusive of one another. + ### psr (Intel) > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | > cos_max:<integer> | cdp:<boolean> )` --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:17:05.000000000 +0200 +++ unstable/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:33:47.000000000 +0200 @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id static unsigned int mwait_substates; +/* + * Some platforms come with mutually exclusive C-states, so that if one is + * enabled, the other C-states must not be used. Example: C1 and C1E on + * Sapphire Rapids platform. This parameter allows for selecting the + * preferred C-states among the groups of mutually exclusive C-states - the + * selected C-states will be registered, the other C-states from the mutually + * exclusive group won't be registered. If the platform has no mutually + * exclusive C-states, this parameter has no effect. + */ +static unsigned int __ro_after_init preferred_states_mask; +integer_param("preferred-cstates", preferred_states_mask); + #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ static unsigned int lapic_timer_reliable_states = (1 << 1); @@ -96,6 +108,7 @@ struct idle_cpu { unsigned long auto_demotion_disable_flags; bool byt_auto_demotion_disable_flag; bool disable_promotion_to_c1e; + bool enable_promotion_to_c1e; }; static const struct idle_cpu *icpu; @@ -924,6 +937,15 @@ static void cf_check byt_auto_demotion_d wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0); } +static void cf_check c1e_promotion_enable(void *dummy) +{ + uint64_t msr_bits; + + rdmsrl(MSR_IA32_POWER_CTL, msr_bits); + msr_bits |= 0x2; + wrmsrl(MSR_IA32_POWER_CTL, msr_bits); +} + static void cf_check c1e_promotion_disable(void *dummy) { u64 msr_bits; @@ -1241,6 +1263,26 @@ static void __init skx_idle_state_table_ } /* + * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table. + */ +static void __init spr_idle_state_table_update(void) +{ + /* Check if user prefers C1E over C1. */ + if (preferred_states_mask & BIT(2, U)) { + if (preferred_states_mask & BIT(1, U)) + /* Both can't be enabled, stick to the defaults. */ + return; + + spr_cstates[0].flags |= CPUIDLE_FLAG_DISABLED; + spr_cstates[1].flags &= ~CPUIDLE_FLAG_DISABLED; + + /* Request enabling C1E using the "C1E promotion" bit. */ + idle_cpu_spr.disable_promotion_to_c1e = false; + idle_cpu_spr.enable_promotion_to_c1e = true; + } +} + +/* * mwait_idle_state_table_update() * * Update the default state_table for this CPU-id @@ -1261,6 +1303,9 @@ static void __init mwait_idle_state_tabl case INTEL_FAM6_SKYLAKE_X: skx_idle_state_table_update(); break; + case INTEL_FAM6_SAPPHIRERAPIDS_X: + spr_idle_state_table_update(); + break; } } @@ -1402,6 +1447,8 @@ static int cf_check mwait_idle_cpu_init( if (icpu->disable_promotion_to_c1e) on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1); + else if (icpu->enable_promotion_to_c1e) + on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1); return NOTIFY_DONE; }
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |