[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 1/5] x86/mwait-idle: add 'preferred-cstates' command line option


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 11 Oct 2022 18:22:11 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=MAqLP0bsxCz0hHWLie1B+857fbmoIn68UrorEMA+1SU=; b=TvrFhTvGynMbUZFA8zAuGvS44NkGHiG0GuxFzMcCfyb/yFbASeMHX1OSaHkBL1bq2pJKJTkNNaJqtF8f8yTc02Em5/k3iS85VK/TZSlN3ZyAUWbWwC6qtWhDPU3zU2p9qC3heYCHfEYtxRxVrtK2u/W23c7VPPUKC2N10BSPH+dsI+1+U2t1owVcIz3iqz5xnyiYzkYAl7TvK6KRmLvcWQBOwJf/OGc6wmJfQfolfivUQjKHj7NblZbwXrQMuVQQE+gr0zvqckDHMfVobHlXemQP1r+NUzf1ezIOLHLesGnCo7E2UrcFdW8hDR8UP8VW3VTfyLHC7jpXi3wLx11XcQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fvCXc4Rc1wAV0srkO6xKEzsTN4wNzZktanKJ4ARks7hO0C3QaHkqQjT2ADe3OzxamnODEX66o9QWG8e1bgPSqKrkdxYu4KwwCh1zNFbHPWlkXzwMV2ko3QIIL3YHSTLv/VdUAuaZ+NJITOrJQQZeYUfIMO9sAbzUgcbu+kJQIhVhXkKCaAcHSNHjds7F6VyqdakTsmQAZ5v43LmgLZa/QG5Arvk6F7v5L0u2N/4ZbMNBz1htf683ITxTlJfBo1l9stgEjyn5ffVoGXdHFG16fH0RFVvzenFSBzua1LijdJPwjj/3WnqrXMHvfIDZpuh9hW8GBtkY147tj5j2iR51tQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 11 Oct 2022 16:22:31 +0000
  • Ironport-data: A9a23:n7CfxK2im0xOeqs2dvbD5ctwkn2cJEfYwER7XKvMYLTBsI5bpzMPy WtKXDqPPq7YY2X0eYx/PoznoxxXvJ7Sm9I3TApvpC1hF35El5HIVI+TRqvS04F+DeWYFR46s J9OAjXkBJppJpMJjk71atANlVEliefSAOKU5NfsYkhZXRVjRDoqlSVtkus4hp8AqdWiCkaGt MiaT/f3YTdJ4BYpdDNJg06/gEk35q6r4GpB5gZWic1j5zcyqVFEVPrzGonpR5fIatE8NvK3Q e/F0Ia48gvxl/v6Ior4+lpTWhRiro/6ZWBiuFIPM0SRqkEqShgJ+rQ6LJIhhXJ/0F1lqTzTJ OJl7vRcQS9xVkHFdX90vxNwS0mSNoUekFPLzOTWXWV+ACQqflO1q8iCAn3aMqUZueJoBjBDr sBfJWkIMxuujbm87p2kH7wEasQLdKEHPas5k1Q4kXTzK6ZjRprOBaLX+dVfwTE8wNhUGurTb NYYbjwpawncZxpIOREcD5dWcOWA3yGjNWEH7g3J4/Zui4TQ5FUZPLzFKt3ad8bMXcxItk2Zu njH7yLyBRRy2Nm3mWDUqiPx2b+ncSXTW40uCuOG6qZT3R6hynE+Kj0rSlykiKzs4qK5c5cFQ 6AOwQIsp6Uv8E2gTvHmQga15nWDu3Y0S9dWVuE39gyJ4q7V+BqCQHgJSCZbb94rv9NwQiYlv neWm/v5CDopt6eaIU9x7Z+RpDK2fCITfWkLYHdYSRNfu4W65oYukhjIU9BvVravicH4Ei3xx DbMqzUig7IUjogA0KDTEU37vg9Ab6PhFmYdjjg7lEr8hu+lTOZJv7CV1GU=
  • Ironport-hdrordr: A9a23:0ZdiNq077q3gx4yqLrbUkAqjBVdyeYIsimQD101hICG9Lfb0qy n+pp4mPEHP4wr5OEtOpTlPAtjkfZr5z+8M3WBxB8baYOCCggeVxe5ZjbcKrweQeBEWs9Qtrp uIEJIOdOEYb2IK6voSiTPQe7hA/DDEytHPuQ639QYRcegAUdAF0+4WMHf4LqUgLzM2f6bRWa DskfZvln6FQzA6f867Dn4KU6zqoMDKrovvZVojCwQ84AeDoDu04PqieiLolys2Yndq+/MP4G LFmwv26uGKtOy68AbV0yv2445NkNXs59NfDIini9QTKB/rlgG0Db4RLYGqjXQQmqWC+VwqmN 7Dr1MJONly0WrYeiWPrR7ky2DboUQTwk6n7WXdrWrooMT/Sj5/IdFGn5hlfhzQ7FdllM1g0Y pQtljp+qZ/PFflpmDQ9tLIXxZlmg6funw5i9MeiHRZTM83dKJRl4oC50lYea1wVh4S0LpXX9 WGMfusqsq/KTihHjHkVyhUsZeRt00Ib1u7qhNogL3U79BU9EoJvHfwivZv3Uvoz6hNOqWs19 60TZiAq4s+MPP+TZgNcdvpEvHHflDlcFbrDF+4B2jBOeUuB0/twqSHkIndotvaMKA18A==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Aug 18, 2022 at 03:03:33PM +0200, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
> 
> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
> enables C1 and disables C1E. However, some users prefer to use C1E instead of
> C1, because it saves more energy.
> 
> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
> and disabling C1. Here is the idea behind it.
> 
> 1. This option has effect only for "mutually exclusive" C-states like C1 and
>    C1E on SPR.
> 2. It does not have any effect on independent C-states, which do not require
>    other C-states to be disabled (most states on most platforms as of today).
> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
>    reasonable default, such as enabling C1 on SPR by default. On other
>    platforms, the default may be different.
> 4. Users can override the default using the 'preferred_cstates' parameter.
> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
>    existing 'states_off' parameter.
> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
>    other mutually exclusive C-states, if they come in the future.
> 
> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
> to disable C1 and enable C1E, users should boot with the following kernel
> argument: intel_idle.preferred_cstates=4
> 
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> da0e58c038e6
> 
> Enable C1E (if requested) not only on the BSP's socket / package. Alter
> command line option to fit our model, and extend it to also accept
> string form arguments.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> v2: Also accept string form arguments for command line option. Restore
>     C1E-control related enum from Linux, despite our somewhat different
>     use (and bigger code churn).
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1912,6 +1912,12 @@ paging controls access to usermode addre
>  ### ple_window (Intel)
>  > `= <integer>`
>  
> +### preferred-cstates (x86)
> +> `= ( <integer> | List of ( C1 | C1E | C2 | ... )`
> +
> +This is a mask of C-states which are to be used preferably.  This option is
> +applicable only on hardware were certain C-states are exclusive of one 
> another.
> +
>  ### psr (Intel)
>  > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | 
> cos_max:<integer> | cdp:<boolean> )`
>  
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -82,10 +82,29 @@ boolean_param("mwait-idle", opt_mwait_id
>  
>  static unsigned int mwait_substates;
>  
> +/*
> + * Some platforms come with mutually exclusive C-states, so that if one is
> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> + * Sapphire Rapids platform. This parameter allows for selecting the
> + * preferred C-states among the groups of mutually exclusive C-states - the
> + * selected C-states will be registered, the other C-states from the mutually
> + * exclusive group won't be registered. If the platform has no mutually
> + * exclusive C-states, this parameter has no effect.
> + */
> +static unsigned int __ro_after_init preferred_states_mask;
> +static char __initdata preferred_states[64];
> +string_param("preferred-cstates", preferred_states);
> +
>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>  static unsigned int lapic_timer_reliable_states = (1 << 1);
>  
> +enum c1e_promotion {
> +     C1E_PROMOTION_PRESERVE,
> +     C1E_PROMOTION_ENABLE,
> +     C1E_PROMOTION_DISABLE
> +};
> +
>  struct idle_cpu {
>       const struct cpuidle_state *state_table;
>  
> @@ -95,7 +114,7 @@ struct idle_cpu {
>        */
>       unsigned long auto_demotion_disable_flags;
>       bool byt_auto_demotion_disable_flag;
> -     bool disable_promotion_to_c1e;
> +     enum c1e_promotion c1e_promotion;
>  };
>  
>  static const struct idle_cpu *icpu;
> @@ -924,6 +943,15 @@ static void cf_check byt_auto_demotion_d
>       wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
>  }
>  
> +static void cf_check c1e_promotion_enable(void *dummy)
> +{
> +     uint64_t msr_bits;
> +
> +     rdmsrl(MSR_IA32_POWER_CTL, msr_bits);
> +     msr_bits |= 0x2;
> +     wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
> +}
> +
>  static void cf_check c1e_promotion_disable(void *dummy)
>  {
>       u64 msr_bits;
> @@ -936,7 +964,7 @@ static void cf_check c1e_promotion_disab
>  static const struct idle_cpu idle_cpu_nehalem = {
>       .state_table = nehalem_cstates,
>       .auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_atom = {
> @@ -954,64 +982,64 @@ static const struct idle_cpu idle_cpu_li
>  
>  static const struct idle_cpu idle_cpu_snb = {
>       .state_table = snb_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_byt = {
>       .state_table = byt_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>       .byt_auto_demotion_disable_flag = true,
>  };
>  
>  static const struct idle_cpu idle_cpu_cht = {
>       .state_table = cht_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>       .byt_auto_demotion_disable_flag = true,
>  };
>  
>  static const struct idle_cpu idle_cpu_ivb = {
>       .state_table = ivb_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_ivt = {
>       .state_table = ivt_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_hsw = {
>       .state_table = hsw_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_bdw = {
>       .state_table = bdw_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_skl = {
>       .state_table = skl_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_skx = {
>       .state_table = skx_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_icx = {
>         .state_table = icx_cstates,
> -       .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static struct idle_cpu __read_mostly idle_cpu_spr = {
>       .state_table = spr_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_avn = {
>       .state_table = avn_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_knl = {
> @@ -1020,17 +1048,17 @@ static const struct idle_cpu idle_cpu_kn
>  
>  static const struct idle_cpu idle_cpu_bxt = {
>       .state_table = bxt_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_dnv = {
>       .state_table = dnv_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  static const struct idle_cpu idle_cpu_snr = {
>       .state_table = snr_cstates,
> -     .disable_promotion_to_c1e = true,
> +     .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
>  #define ICPU(model, cpu) \
> @@ -1241,6 +1269,25 @@ static void __init skx_idle_state_table_
>  }
>  
>  /*
> + * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
> + */
> +static void __init spr_idle_state_table_update(void)
> +{
> +     /* Check if user prefers C1E over C1. */
> +     if (preferred_states_mask & BIT(2, U)) {
> +             if (preferred_states_mask & BIT(1, U))
> +                     /* Both can't be enabled, stick to the defaults. */
> +                     return;
> +
> +             spr_cstates[0].flags |= CPUIDLE_FLAG_DISABLED;
> +             spr_cstates[1].flags &= ~CPUIDLE_FLAG_DISABLED;
> +
> +             /* Request enabling C1E using the "C1E promotion" bit. */
> +             idle_cpu_spr.c1e_promotion = C1E_PROMOTION_ENABLE;
> +     }
> +}
> +
> +/*
>   * mwait_idle_state_table_update()
>   *
>   * Update the default state_table for this CPU-id
> @@ -1261,6 +1308,9 @@ static void __init mwait_idle_state_tabl
>       case INTEL_FAM6_SKYLAKE_X:
>               skx_idle_state_table_update();
>               break;
> +     case INTEL_FAM6_SAPPHIRERAPIDS_X:
> +             spr_idle_state_table_update();
> +             break;
>       }
>  }
>  
> @@ -1268,6 +1318,7 @@ static int __init mwait_idle_probe(void)
>  {
>       unsigned int eax, ebx, ecx;
>       const struct x86_cpu_id *id = x86_match_cpu(intel_idle_ids);
> +     const char *str;
>  
>       if (!id) {
>               pr_debug(PREFIX "does not run on family %d model %d\n",
> @@ -1309,6 +1360,39 @@ static int __init mwait_idle_probe(void)
>       pr_debug(PREFIX "lapic_timer_reliable_states %#x\n",
>                lapic_timer_reliable_states);
>  
> +     str = preferred_states;
> +     if (isdigit(str[0]))
> +             preferred_states_mask = simple_strtoul(str, &str, 0);
> +     else if (str[0])
> +     {
> +             const char *ss;
> +
> +             do {
> +                     const struct cpuidle_state *state = icpu->state_table;
> +                     unsigned int bit = 1;
> +
> +                     ss = strchr(str, ',');
> +                     if (!ss)
> +                             ss = strchr(str, '\0');
> +
> +                     for (; state->name[0]; ++state) {
> +                             bit <<= 1;
> +                             if (!cmdline_strcmp(str, state->name)) {
> +                                     preferred_states_mask |= bit;
> +                                     break;
> +                             }
> +                     }
> +                     if (!state->name[0])
> +                             break;
> +
> +                     str = ss + 1;
> +         } while (*ss);
> +
> +         str -= str == ss + 1;

I would add parentheses to the sum for clarity.

> +     }
> +     if (str[0])
> +             printk("unrecognized \"preferred-cstates=%s\"\n", str);
> +
>       mwait_idle_state_table_update();
>  
>       return 0;
> @@ -1400,8 +1484,18 @@ static int cf_check mwait_idle_cpu_init(
>       if (icpu->byt_auto_demotion_disable_flag)
>               on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, 
> NULL, 1);
>  
> -     if (icpu->disable_promotion_to_c1e)
> +     switch (icpu->c1e_promotion) {
> +     case C1E_PROMOTION_DISABLE:
>               on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 
> 1);
> +             break;
> +
> +     case C1E_PROMOTION_ENABLE:
> +             on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 
> 1);
> +             break;
> +
> +     case C1E_PROMOTION_PRESERVE:
> +             break;
> +     }

I find it kind of weird to user a notifier for this, won't it be
easier to set the C1E promotion as part of the CPU bringup process?

I see we also set other bits in the same way.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.