[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 27 Apr 2022 17:06:01 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=dfNP0m9DqoUThoqo7gSB3LAmxV++NOd1on2h1xZxePM=; b=VGZiVyImU4G1mG7f0TcXfTZEcyuD28kx+nKtpsgdePuUi/4XTG4FYz5zZKvtqYzHY0fIZyy/DGc7vVzQj9DFZ2mbxxiUteTnH97FOdfqRTDR/c/+yzEPVlxkh1q2yckPwKos84g+oixSCfg+IyqLKZaPeS7gZ+I8bBaVe9w4y29hY2FYeg4+I1mzNe0FWfH87QfEHRO4pYQRx+5OGwuFGk8+gg2jl4KG3BUpJcG8djaormt0FNuUzfva6RvBjJZaSwxouY9CCbZQTmW+YZQ1bpcQs3dY/jNv+X1CLsQ6rQ+lI5DlDIA9E1B76oRrdKP/27rGf4maSC0sZKQSuv2b8A==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=kNrAxurjs4jzFgWet+FieXyE/uMQsKoGjP9/K6Z52K671LNk5w0pmW9aecKER5ywR1q2TFGpc7LqWVP7jt9uxFzhS1YymczfJc/pTVw7o4JQanI5IA5Fn+emJmVKTZjCEbnb9KP3KO+26m4KN1r5ZXtOMnMkbyAMAWQTlvJ1rSLHmjrBiq9LNTsZ7ow4zurxnFJ7AGfyZS2DlHwJ1PKiK5VE692xwNLLQeBjrRQvlkD7PTvGQT/Q77jCxJqjHEkvznqQMbNdH0BBxtxXSqX6suojd343mfCO45F4pZwBvfbiNxezXR2i0BjmJAdeasDQcWxOtYtximzMI+mXWQp0LA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 27 Apr 2022 15:06:36 +0000
  • Ironport-data: A9a23:8mKp6KjSbrHlePnnE+EwhjnvX161FBEKZh0ujC45NGQN5FlHY01je htvCjrQbviJYGb2LY1yYISx8k0F7JGGytRlHFRsqStjEy0b9cadCdqndUqhZCn6wu8v7a5EA 2fyTvGacajYm1eF/k/F3oDJ9CU6jefSLlbFILas1hpZHGeIcw98z0M78wIFqtQw24LhX1nS4 YqaT/D3YzdJ5RYlagr41IrbwP9flKyaVOQw5wFWiVhj5TcyplFNZH4tDfjZw0jQG+G4KtWSV efbpIxVy0uCl/sb5nFJpZ6gGqECaua60QFjERO6UYD66vRJjnRaPqrWqJPwwKqY4tmEt4kZ9 TlDiXC/YScMbq+ThNxMaj1RNxBVF6JH04D9Hnfq5KR/z2WeG5ft69NHKRhueKc+paNwC2wI8 uEEIjcQaBzFn/ix3L+wVuhrgIIkMdXvO4Qc/HpnyFk1D95/GcyFH/qMuocehW9v7ixNNa+2i 84xcz1gYQ6GexRSElwWFIg/jKGjgXyXnzhw9wrN+PJovDG7IApZyuTuFYr1fISzbMhRj1qXq D+B/jvDO0RPXDCY4X/fmp62vcfNly7mXIMZFJWj6+VnxlaUwwQ7GBAQEFe2v/S9okq/QM5Eb VwZ/DI0qqo//1DtScPyNzWnpFaUsxhaXMBfe9DW8ymIw6vQpgyfWW4NS2cZbMR87ZdpAzs3y lWOgtXlQyR1t6GYQm6c8bHSqi6uPS8SLikJYipsoRY53uQPabob1nrnJuuP2obs5jEpMVkcG wy3kRU=
  • Ironport-hdrordr: A9a23:8NcYhKln22SkJRgsjPc87vXu6LHpDfO+imdD5ihNYBxZY6Wkfp +V8cjzhCWftN9OYhodcLC7V5Voj0mskKKdxbNhRYtKOzOWw1dATbsSlLcKpgeNJ8SQzI5gPM tbAstD4ZjLfCJHZKXBkXaF+rQbsb66GcmT7I+xrkuFDzsaDZ2Ihz0JdjpzeXcGIDWua6BJdq Z1saF81kedkDksH42GL0hAe9KGi8zAlZrgbxJDLxk76DOWhTftzLLhCRCX0joXTjsKmN4ZgC P4uj28wp/mn+Cwyxfa2WOWx5NKmOH5wt8GIMCXkMAaJhjllw7tToV8XL+puiwzvYiUmR4Xue iJhy1lE9V46nvXcG3wiRzx2zP42DJr0HPmwU/wuwqWneXJABYBT+ZRj4NQdRXUr2A6ustn7a 5N12WF87JKEBLphk3Glpf1fiAvsnDxjWspkOYVgXAae5AZcqVtoYsW+14QOIscHRj99JssHI BVfY3hDc5tABKnhk3izylSKITGZAVxIv7GeDlOhiWt6UkZoJgjpHFohvD2nR87hecAotd/lq H5259T5cBzp/8tHNxA7dg6MLuK40z2MGXx2TGpUCLa/J9uAQO/l7fHpJMI2cqNRLskiLMPpb WpaiIriYd1QTOlNfGz
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> On 27.04.2022 14:45, Roger Pau Monné wrote:
> > On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >> From: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
> >>
> >> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> >> exclusive - only one of them can be enabled. By default, 'intel_idle' 
> >> driver
> >> enables C1 and disables C1E. However, some users prefer to use C1E instead 
> >> of
> >> C1, because it saves more energy.
> >>
> >> This patch adds a new module parameter ('preferred_cstates') for enabling 
> >> C1E
> >> and disabling C1. Here is the idea behind it.
> >>
> >> 1. This option has effect only for "mutually exclusive" C-states like C1 
> >> and
> >>    C1E on SPR.
> >> 2. It does not have any effect on independent C-states, which do not 
> >> require
> >>    other C-states to be disabled (most states on most platforms as of 
> >> today).
> >> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
> >>    reasonable default, such as enabling C1 on SPR by default. On other
> >>    platforms, the default may be different.
> >> 4. Users can override the default using the 'preferred_cstates' parameter.
> >> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
> >>    existing 'states_off' parameter.
> >> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
> >>    other mutually exclusive C-states, if they come in the future.
> >>
> >> Today 'intel_idle' can only be compiled-in, which means that on SPR, in 
> >> order
> >> to disable C1 and enable C1E, users should boot with the following kernel
> >> argument: intel_idle.preferred_cstates=4
> >>
> >> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> >> da0e58c038e6
> >>
> >> Enable C1E (if requested) not only on the BSP's socket / package.
> > 
> > Maybe we should also add a note here that the command line option for
> > Xen is preferred-cstates instead of intel_idle.preferred_cstates?
> > 
> > I think this is a bad interface however, we should have a more generic
> > option (ie: cstate-mode = 'performance | powersave') so that users
> > don't have to fiddle with model specific C state masks.
> 
> Performance vs powersave doesn't cover it imo, especially if down
> the road more states would appear which can be controlled this way.
> I don't think there's a way around providing _some_ way to control
> things one a per-state level. When porting this over, I too didn't
> like this interface very much, but I had no good replacement idea.

I think it's fine to have this more fine grained control of C states,
but it doesn't seem practical from a user (or distro) PoV.  But then I
also wonder how much of a difference this will make regarding power
consumption.

> >> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>  
> >>  static unsigned int mwait_substates;
> >>  
> >> +/*
> >> + * Some platforms come with mutually exclusive C-states, so that if one is
> >> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >> + * Sapphire Rapids platform. This parameter allows for selecting the
> >> + * preferred C-states among the groups of mutually exclusive C-states - 
> >> the
> >> + * selected C-states will be registered, the other C-states from the 
> >> mutually
> >> + * exclusive group won't be registered. If the platform has no mutually
> >> + * exclusive C-states, this parameter has no effect.
> >> + */
> >> +static unsigned int __ro_after_init preferred_states_mask;
> >> +integer_param("preferred-cstates", preferred_states_mask);
> >> +
> >>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >>  static unsigned int lapic_timer_reliable_states = (1 << 1);
> >> @@ -96,6 +108,7 @@ struct idle_cpu {
> >>    unsigned long auto_demotion_disable_flags;
> >>    bool byt_auto_demotion_disable_flag;
> >>    bool disable_promotion_to_c1e;
> >> +  bool enable_promotion_to_c1e;
> > 
> > I'm confused by those fields, shouldn't we just have:
> > promotion_to_c1e = true | false?
> > 
> > As one field is the negation of the other:
> > enable_promotion_to_c1e = !disable_promotion_to_c1e
> > 
> > I know this is code from Linux, but would like to understand why two
> > fields are needed.
> 
> This really is a tristate; Linux is now changing their global variable
> to an enum, but we don't have an equivalent of that global variable.

So it would be: leave default, disable C1E promotion, enable C1E
promotion.

And Linux is leaving the {disable,enable}_promotion_to_c1e in
idle_cpu?

I guess there's not much we can do unless we want to diverge from
upstream.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.