
Re: [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm



On Wed, Jul 02, 2025 at 03:41:19PM +0100, Andrew Cooper wrote:
> In order to elide IPIs, we must be able to identify whether a target CPU is in
> MWAIT at the point it is woken up.  i.e. the store to wake it up must also
> identify the state.
> 
> Create a new in_mwait variable beside __softirq_pending, so we can use a
> CMPXCHG to set the softirq while also observing the status safely.  Implement
> an x86 version of arch_pend_softirq() which does this.
> 
> In mwait_idle_with_hints(), advertise in_mwait, with an explanation of
> precisely what it means.  X86_BUG_MONITOR can be accounted for simply by not
> advertising in_mwait.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> CC: Michal Orzel <michal.orzel@xxxxxxx>
> CC: Julien Grall <julien@xxxxxxx>
> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> 
> This is modelled after Linux's TIF_NEED_RESCHED (single-bit equivalent of all
> of __softirq_pending), and TIF_POLLING_NRFLAG (arch-neutral "in_mwait").
> 
> In Linux, they're both in the same flags field, which adds complexity.  In
> Xen, __softirq_pending is already unsigned long for everything other than x86,
> so adding an arch-neutral "in_mwait" would need wider changes.
> ---
>  xen/arch/x86/acpi/cpu_idle.c       | 20 +++++++++++++++++-
>  xen/arch/x86/include/asm/hardirq.h | 14 +++++++++++-
>  xen/arch/x86/include/asm/softirq.h | 34 ++++++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> index caa0fef0b3b1..13040ef467a0 100644
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -439,7 +439,21 @@ __initcall(cpu_idle_key_init);
>  void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>  {
>      unsigned int cpu = smp_processor_id();
> -    const unsigned int *this_softirq_pending = &softirq_pending(cpu);
> +    irq_cpustat_t *stat = &irq_stat[cpu];
> +    const unsigned int *this_softirq_pending = &stat->__softirq_pending;
> +
> +    /*
> +     * By setting in_mwait, we promise to other CPUs that we'll notice changes
> +     * to __softirq_pending without being sent an IPI.  We achieve this by
> +     * either not going to sleep, or by having hardware notice on our behalf.
> +     *
> +     * Some errata exist where MONITOR doesn't work properly, and the
> +     * workaround is to force the use of an IPI.  Cause this to happen by
> +     * simply not advertising ourselves as being in_mwait.
> +     */
> +    alternative_io("movb $1, %[in_mwait]",
> +                   "", X86_BUG_MONITOR,
> +                   [in_mwait] "=m" (stat->in_mwait));
>  
>      monitor(this_softirq_pending, 0, 0);
>      smp_mb();
> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>          mwait(eax, ecx);
>          spec_ctrl_exit_idle(info);
>      }
> +
> +    alternative_io("movb $0, %[in_mwait]",
> +                   "", X86_BUG_MONITOR,
> +                   [in_mwait] "=m" (stat->in_mwait));

Isn't it a bit overkill to use alternatives here when you could have a
conditional based on a global variable to decide whether to set/clear
in_mwait?
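
Something like the following would seem to be enough (untested sketch,
just to illustrate the idea):

    if ( !boot_cpu_has(X86_BUG_MONITOR) )
        /* Advertise that hardware will observe __softirq_pending for us. */
        ACCESS_ONCE(stat->in_mwait) = true;

(and the matching store of false on the exit path.)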

>  }
>  
>  static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
> diff --git a/xen/arch/x86/include/asm/hardirq.h b/xen/arch/x86/include/asm/hardirq.h
> index f3e93cc9b507..1647cff04dc8 100644
> --- a/xen/arch/x86/include/asm/hardirq.h
> +++ b/xen/arch/x86/include/asm/hardirq.h
> @@ -5,7 +5,19 @@
>  #include <xen/types.h>
>  
>  typedef struct {
> -    unsigned int __softirq_pending;
> +    /*
> +     * The layout is important.  Any CPU can set bits in __softirq_pending,
> +     * but in_mwait is a status bit owned by the CPU.  softirq_mwait_raw must
> +     * cover both, and must be in a single cacheline.
> +     */
> +    union {
> +        struct {
> +            unsigned int __softirq_pending;
> +            bool in_mwait;

Given the usage in assembly, where you store 0 and 1, wouldn't this be
better as a uint8_t?

> +        };
> +        uint64_t softirq_mwait_raw;
> +    };

This could be a named union type ...
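
E.g. (untested, name made up):

    typedef union {
        struct {
            unsigned int __softirq_pending;
            bool in_mwait;    /* or uint8_t, as per above */
        };
        uint64_t raw;
    } softirq_mwait_t;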

> +
>      unsigned int __local_irq_count;
>      unsigned int nmi_count;
>      unsigned int mce_count;
> diff --git a/xen/arch/x86/include/asm/softirq.h b/xen/arch/x86/include/asm/softirq.h
> index e4b194f069fb..069e5716a68d 100644
> --- a/xen/arch/x86/include/asm/softirq.h
> +++ b/xen/arch/x86/include/asm/softirq.h
> @@ -1,6 +1,8 @@
>  #ifndef __ASM_SOFTIRQ_H__
>  #define __ASM_SOFTIRQ_H__
>  
> +#include <asm/system.h>
> +
>  #define NMI_SOFTIRQ            (NR_COMMON_SOFTIRQS + 0)
>  #define TIME_CALIBRATE_SOFTIRQ (NR_COMMON_SOFTIRQS + 1)
>  #define VCPU_KICK_SOFTIRQ      (NR_COMMON_SOFTIRQS + 2)
> @@ -9,4 +11,36 @@
>  #define HVM_DPCI_SOFTIRQ       (NR_COMMON_SOFTIRQS + 4)
>  #define NR_ARCH_SOFTIRQS       5
>  
> +/*
> + * Ensure softirq @nr is pending on @cpu.  Return true if an IPI can be
> + * skipped, false if the IPI cannot be skipped.
> + *
> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in order to
> + * set softirq @nr while also observing in_mwait in a race-free way.
> + */
> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
> +{
> +    uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
> +    uint64_t old, new;

... so that you also use it here?  And the return check below could become
new.in_mwait instead of having to use a bitmask?
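
I.e. something along these lines (rough, untested sketch using the
softirq_mwait_t suggested above):

    softirq_mwait_t *ptr = &irq_stat[cpu].softirq_mwait;
    softirq_mwait_t old, new;

    old.raw = ACCESS_ONCE(ptr->raw);

    for ( ;; )
    {
        uint64_t prev;

        new = old;
        new.__softirq_pending |= 1u << nr;

        prev = cmpxchg(&ptr->raw, old.raw, new.raw);
        if ( prev == old.raw )
            break;

        old.raw = prev;
    }

    return new.in_mwait;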

Thanks, Roger.
