|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC XEN PATCH] x86/mwait-idle: Use ACPI for CPUs without hardcoded C-state table
On 18.07.2023 13:04, Simon Gaiser wrote:
> mwait-idle includes a hardcoded config for many CPUs. But some are
> missing, for example Tiger Lake. Linux' driver reads the config from
> ACPI in those cases. This adds this to Xen's implementation.
>
> The Linux driver also has a feature to combine the internal table with
> the infos from ACPI. This is not implemented here, for CPUs with
> internal config nothing is changed.
>
> Signed-off-by: Simon Gaiser <simon@xxxxxxxxxxxxxxxxxxxxxx>
> ---
>
> Sending this as RFC to get feedback on implementing it this way. I tried
> to keep this change small and to avoid changing the existing code path
> for CPUs with a config included in the driver. On the other hand this
> makes it look a bit hacky.
>
> I'm not quite sure if initializing mwait-idle in set_cx_pminfo this way
> is correct. For example set_cx has some smp_wmb call I'm not sure when
> it's needed, so might be very well missing something.
>
> What also surprised me is that the existing code in mwait-idle first
> calls cpuidle_current_governor->enable(processor_powers[cpu]) and later
> setup the C-state config in processor_powers[cpu]. This seems the be the
> wrong order (but, I think, current not important since
> menu_enable_device doesn't use that part of the struct).
>
> When I brought up the topic of this patch the first time Jan had an
> interesting questions [1]:
>
>> It hasn't become clear to me why Linux now has two CPU idle drivers
>> consuming ACPI data (intel_idle and the purely ACPI-based one).
>
> I'm not quite sure either. Linux' intel_idle.c states:
>
> intel_idle is a cpuidle driver that loads on all Intel CPUs with
> MWAIT in lieu of the legacy ACPI processor_idle driver. The intent
> is to make Linux more efficient on these processors, as intel_idle
> knows more than ACPI, as well as make Linux more immune to ACPI BIOS
> bugs.
>
> The commit that first added ACPI support to the Linux driver [2] says:
>
> The main functional difference between intel_idle with this change
> and the ACPI processor driver is that the former sets the target
> residency to be equal to the exit latency (provided by _CST) for
> C1-type C-states and to 3 times the exit latency value for the other
> C-state types, whereas the latter obtains the target residency by
> multiplying the exit latency by the same number (2 by default) for
> all C-state types. Therefore it is expected that in general using
> the former instead of the latter on the same system will lead to
> improved energy-efficiency.
>
> This sounds less impressive and doesn't explain why not to just change
> the standard ACPI driver to use the better latency settings. On the
> Linux what might play also a role is that the mwait driver also gained
> the option to combine the internal settings with what it reads from
> ACPI. That would be probably harder to include in the generic ACPI
> driver.
>
> This also raises the option to change the latency setting in Xen's
> generic ACPI driver for the affected CPUs instead of touching
> mwait-idle.
>
> Currently I'm interested in this driver mainly for S0ix support. I did
> nearly all my testing while using the mwait-idle driver to keep
> differences to Linux as small as possible. (At first by hacking in some
> config for the Tiger Lake CPU of my test system. Now with this patch.).
> At some point I observed worse S0ix residency with Xen's generic ACPI
> idle driver than with mwait-idle. But when I tried to reproduce this for
> writing this e-mail I wasn't able to reproduce this measurement and got
> the same residency for both idle drivers. So either I did messed up my
> previous measurements or I have some unaccounted changes in my test
> setup.
Taking all together perhaps more of an argument to see about making
the ACPI driver better. If with the same ACPI data the mwait-idle
one can (maybe) achieve better results, surely there's a way for the
ACPI driver to achieve the same, likely with less of a change?
Therefore only a few basic comments.
> @@ -1360,24 +1381,27 @@ long set_cx_pminfo(uint32_t acpi_id, struct
> xen_processor_power *power)
> set_cx(acpi_power, &xen_cx);
> }
>
> - if ( !cpu_online(cpu_id) )
> - {
> - uint32_t apic_id = x86_cpu_to_apicid[cpu_id];
> -
> - /*
> - * If we've just learned of more available C states, wake the CPU if
> - * it's parked, so it can go back to sleep in perhaps a deeper state.
> - */
> - if ( park_offline_cpus && apic_id != BAD_APICID )
> - {
> - unsigned long flags;
> -
> - local_irq_save(flags);
> - apic_wait_icr_idle();
> - apic_icr_write(APIC_DM_NMI | APIC_DEST_PHYSICAL, apic_id);
> - local_irq_restore(flags);
> + if ( cpu_id == 0 && pm_idle_save == NULL ) {
> + /* Now that we have the ACPI info from dom0, try again to setup
> + * mwait-idle*/
> + ret = mwait_idle_init(&cpu_nfb, true);
> + if (ret >= 0) {
> + unsigned int cpu;
> + /* mwait-idle took over, call it's initializer for all CPUs*/
> + for_each_present_cpu ( cpu )
> + {
> + cpu_nfb.notifier_call(&cpu_nfb, CPU_UP_PREPARE, (void
> *)(long)cpu);
> + cpu_nfb.notifier_call(&cpu_nfb, CPU_ONLINE, (void
> *)(long)cpu);
> + if ( !cpu_online(cpu) ) {
> + repark_cpu(cpu);
> + }
> + }
Is this safe against a CPU coming online or going offline?
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -71,13 +71,15 @@
> #undef PREFIX
> #define PREFIX "mwait-idle: "
>
> +#define pr_err(fmt...) printk(KERN_ERR fmt)
> +
> #ifdef DEBUG
> # define pr_debug(fmt...) printk(KERN_DEBUG fmt)
> #else
> # define pr_debug(fmt...)
> #endif
>
> -static __initdata bool opt_mwait_idle = true;
> +static bool opt_mwait_idle = true;
> boolean_param("mwait-idle", opt_mwait_idle);
The variable still isn't written post-boot, is it? If so, you
want to use __ro_after_init rather than dropping the placement
attribute altogether.
> @@ -92,7 +94,7 @@ static unsigned int mwait_substates;
> * exclusive C-states, this parameter has no effect.
> */
> static unsigned int __ro_after_init preferred_states_mask;
> -static char __initdata preferred_states[64];
> +static char preferred_states[64];
> string_param("preferred-cstates", preferred_states);
Same here, I suppose.
> @@ -1151,6 +1153,9 @@ static const struct idle_cpu idle_cpu_snr = {
> .c1e_promotion = C1E_PROMOTION_DISABLE,
> };
>
> +static struct idle_cpu __read_mostly idle_cpu_acpi = {
> +};
No need for an (empty) initializer.
There are also numerous style issues throughout the patch, which would
want correcting in case there's a convincing argument towards going
this route.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |