[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC XEN PATCH] x86/mwait-idle: Use ACPI for CPUs without hardcoded C-state table


  • To: Simon Gaiser <simon@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 18 Jul 2023 17:21:25 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=uaB69kRt5wfhuIYifwZmcoRhtyVlFv1rZm0xo+VxziE=; b=KXhfP90DqYX7JYxBBYXqNqeTiIfJW382od55oZ1QqPZnxutUOeYIijAEaGLJTi8KAhNQoN28oam0AiQhHr/xb7kykuAGpA2kIrv9+5T7wqQ3jqPjL8Zg2uQ1FmpNRxpoF3Po3zdHbN3TFgLcwhCp6H7sCUPvimZA2vcBEX8m+F4heTOCk+zV4o7qXAWlj2W/pm2Ni1OY8T3yen+dHhAvPKtObE8uDiKhDLkRajAY5BxrjX6VZAlsqrkPggluaU5LiceBoF4jItKjnTYvmFoihUCrV4ocUTlv8PLZazWADtC4zTcMTU6+sxpmOWmSIoQ9akknT+6yXVwJgGZaG/SiLQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CoVYgoBBYtoQ5RUq34+BDhixwwsFn+5e+Y8Ds13zX5XaFGS/nq+0sE/JMI4X7vALbdrytjrRVNUNI97ujWZ4YP9VTKfxrjB3U4JyUvExzakhe6GybN9CQOUgE5fSGfo2zEPOHWHC/gKh0Dy+3CRE9aINgmNPeqRcUuHBJGOJpTF9pO6JMVkyPgKlUY4G1+H9hn1w3n+zi6HZnJs/Ku5W0zYZjmMDnSu2fGGWdzUD2WYcVys+TP3IBpjeeZo59R02SPY3KlhAAqP6HD5qB0TV09Ewp8RpNgrokq30kVDdiuLs/jGJLYtR7jdnHKFGFeb88PGV3u+113kbJE6yUyvKGg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 18 Jul 2023 15:22:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.07.2023 13:04, Simon Gaiser wrote:
> mwait-idle includes a hardcoded config for many CPUs. But some are
> missing, for example Tiger Lake. Linux' driver reads the config from
> ACPI in those cases. This adds this to Xen's implementation.
> 
> The Linux driver also has a feature to combine the internal table with
> the infos from ACPI. This is not implemented here, for CPUs with
> internal config nothing is changed.
> 
> Signed-off-by: Simon Gaiser <simon@xxxxxxxxxxxxxxxxxxxxxx>
> ---
> 
> Sending this as RFC to get feedback on implementing it this way. I tried
> to keep this change small and to avoid changing the existing code path
> for CPUs with a config included in the driver. On the other hand this
> makes it look a bit hacky.
> 
> I'm not quite sure if initializing mwait-idle in set_cx_pminfo this way
> is correct. For example set_cx has some smp_wmb call I'm not sure when
> it's needed, so might be very well missing something.
> 
> What also surprised me is that the existing code in mwait-idle first
> calls cpuidle_current_governor->enable(processor_powers[cpu]) and later
> setup the C-state config in processor_powers[cpu]. This seems the be the
> wrong order (but, I think, current not important since
> menu_enable_device doesn't use that part of the struct).
> 
> When I brought up the topic of this patch the first time Jan had an
> interesting questions [1]:
> 
>> It hasn't become clear to me why Linux now has two CPU idle drivers
>> consuming ACPI data (intel_idle and the purely ACPI-based one).
> 
> I'm not quite sure either. Linux' intel_idle.c states:
> 
>     intel_idle is a cpuidle driver that loads on all Intel CPUs with
>     MWAIT in lieu of the legacy ACPI processor_idle driver.  The intent
>     is to make Linux more efficient on these processors, as intel_idle
>     knows more than ACPI, as well as make Linux more immune to ACPI BIOS
>     bugs.
> 
> The commit that first added ACPI support to the Linux driver [2] says:
> 
>     The main functional difference between intel_idle with this change
>     and the ACPI processor driver is that the former sets the target
>     residency to be equal to the exit latency (provided by _CST) for
>     C1-type C-states and to 3 times the exit latency value for the other
>     C-state types, whereas the latter obtains the target residency by
>     multiplying the exit latency by the same number (2 by default) for
>     all C-state types.  Therefore it is expected that in general using
>     the former instead of the latter on the same system will lead to
>     improved energy-efficiency.
> 
> This sounds less impressive and doesn't explain why not to just change
> the standard ACPI driver to use the better latency settings. On the
> Linux what might play also a role is that the mwait driver also gained
> the option to combine the internal settings with what it reads from
> ACPI. That would be probably harder to include in the generic ACPI
> driver.
> 
> This also raises the option to change the latency setting in Xen's
> generic ACPI driver for the affected CPUs instead of touching
> mwait-idle.
> 
> Currently I'm interested in this driver mainly for S0ix support. I did
> nearly all my testing while using the mwait-idle driver to keep
> differences to Linux as small as possible. (At first by hacking in some
> config for the Tiger Lake CPU of my test system. Now with this patch.).
> At some point I observed worse S0ix residency with Xen's generic ACPI
> idle driver than with mwait-idle. But when I tried to reproduce this for
> writing this e-mail I wasn't able to reproduce this measurement and got
> the same residency for both idle drivers. So either I did messed up my
> previous measurements or I have some unaccounted changes in my test
> setup.

Taking all together perhaps more of an argument to see about making
the ACPI driver better. If with the same ACPI data the mwait-idle
one can (maybe) achieve better results, surely there's a way for the
ACPI driver to achieve the same, likely with less of a change?

Therefore only a few basic comments.

> @@ -1360,24 +1381,27 @@ long set_cx_pminfo(uint32_t acpi_id, struct 
> xen_processor_power *power)
>          set_cx(acpi_power, &xen_cx);
>      }
>  
> -    if ( !cpu_online(cpu_id) )
> -    {
> -        uint32_t apic_id = x86_cpu_to_apicid[cpu_id];
> -
> -        /*
> -         * If we've just learned of more available C states, wake the CPU if
> -         * it's parked, so it can go back to sleep in perhaps a deeper state.
> -         */
> -        if ( park_offline_cpus && apic_id != BAD_APICID )
> -        {
> -            unsigned long flags;
> -
> -            local_irq_save(flags);
> -            apic_wait_icr_idle();
> -            apic_icr_write(APIC_DM_NMI | APIC_DEST_PHYSICAL, apic_id);
> -            local_irq_restore(flags);
> +    if ( cpu_id == 0 && pm_idle_save == NULL ) {
> +        /* Now that we have the ACPI info from dom0, try again to setup
> +         * mwait-idle*/
> +        ret = mwait_idle_init(&cpu_nfb, true);
> +        if (ret >= 0) {
> +            unsigned int cpu;
> +            /* mwait-idle took over, call it's initializer for all CPUs*/
> +            for_each_present_cpu ( cpu )
> +            {
> +                cpu_nfb.notifier_call(&cpu_nfb, CPU_UP_PREPARE, (void 
> *)(long)cpu);
> +                cpu_nfb.notifier_call(&cpu_nfb, CPU_ONLINE, (void 
> *)(long)cpu);
> +                if ( !cpu_online(cpu) ) {
> +                    repark_cpu(cpu);
> +                }
> +            }

Is this safe against a CPU coming online or going offline?

> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -71,13 +71,15 @@
>  #undef PREFIX
>  #define PREFIX "mwait-idle: "
>  
> +#define pr_err(fmt...) printk(KERN_ERR fmt)
> +
>  #ifdef DEBUG
>  # define pr_debug(fmt...) printk(KERN_DEBUG fmt)
>  #else
>  # define pr_debug(fmt...)
>  #endif
>  
> -static __initdata bool opt_mwait_idle = true;
> +static bool opt_mwait_idle = true;
>  boolean_param("mwait-idle", opt_mwait_idle);

The variable still isn't written post-boot, is it? If so, you
want to use __ro_after_init rather than dropping the placement
attribute altogether.

> @@ -92,7 +94,7 @@ static unsigned int mwait_substates;
>   * exclusive C-states, this parameter has no effect.
>   */
>  static unsigned int __ro_after_init preferred_states_mask;
> -static char __initdata preferred_states[64];
> +static char preferred_states[64];
>  string_param("preferred-cstates", preferred_states);

Same here, I suppose.

> @@ -1151,6 +1153,9 @@ static const struct idle_cpu idle_cpu_snr = {
>       .c1e_promotion = C1E_PROMOTION_DISABLE,
>  };
>  
> +static struct idle_cpu __read_mostly idle_cpu_acpi = {
> +};

No need for an (empty) initializer.

There are also numerous style issues throughout the patch, which would
want correcting in case there's a convincing argument towards going
this route.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.