Re: [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm
On 04/07/2025 8:24 am, Roger Pau Monné wrote:
> On Thu, Jul 03, 2025 at 06:48:23PM +0100, Andrew Cooper wrote:
>> On 03/07/2025 5:36 pm, Roger Pau Monné wrote:
>>> On Wed, Jul 02, 2025 at 03:41:19PM +0100, Andrew Cooper wrote:
>>>> In order to elide IPIs, we must be able to identify whether a target
>>>> CPU is in MWAIT at the point it is woken up.  i.e. the store to wake
>>>> it up must also identify the state.
>>>>
>>>> Create a new in_mwait variable beside __softirq_pending, so we can use
>>>> a CMPXCHG to set the softirq while also observing the status safely.
>>>> Implement an x86 version of arch_pend_softirq() which does this.
>>>>
>>>> In mwait_idle_with_hints(), advertise in_mwait, with an explanation of
>>>> precisely what it means.  X86_BUG_MONITOR can be accounted for simply
>>>> by not advertising in_mwait.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>>>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>> CC: Anthony PERARD <anthony.perard@xxxxxxxxxx>
>>>> CC: Michal Orzel <michal.orzel@xxxxxxx>
>>>> CC: Julien Grall <julien@xxxxxxx>
>>>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>>>
>>>> This is modelled after Linux's TIF_NEED_RESCHED (single bit equivalent
>>>> of all of __softirq_pending), and TIF_POLLING_NRFLAG (arch-neutral
>>>> "in_mwait").
>>>>
>>>> In Linux, they're both in the same flags field, which adds complexity.
>>>> In Xen, __softirq_pending is already unsigned long for everything
>>>> other than x86, so adding an arch-neutral "in_mwait" would need wider
>>>> changes.
>>>> ---
>>>>  xen/arch/x86/acpi/cpu_idle.c       | 20 +++++++++++++++++-
>>>>  xen/arch/x86/include/asm/hardirq.h | 14 +++++++++++-
>>>>  xen/arch/x86/include/asm/softirq.h | 34 ++++++++++++++++++++++++++++++
>>>>  3 files changed, 66 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
>>>> index caa0fef0b3b1..13040ef467a0 100644
>>>> --- a/xen/arch/x86/acpi/cpu_idle.c
>>>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>>>> @@ -439,7 +439,21 @@ __initcall(cpu_idle_key_init);
>>>>  void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>>>>  {
>>>>      unsigned int cpu = smp_processor_id();
>>>> -    const unsigned int *this_softirq_pending = &softirq_pending(cpu);
>>>> +    irq_cpustat_t *stat = &irq_stat[cpu];
>>>> +    const unsigned int *this_softirq_pending = &stat->__softirq_pending;
>>>> +
>>>> +    /*
>>>> +     * By setting in_mwait, we promise to other CPUs that we'll notice
>>>> +     * changes to __softirq_pending without being sent an IPI.  We
>>>> +     * achieve this by either not going to sleep, or by having
>>>> +     * hardware notice on our behalf.
>>>> +     *
>>>> +     * Some errata exist where MONITOR doesn't work properly, and the
>>>> +     * workaround is to force the use of an IPI.  Cause this to happen
>>>> +     * by simply not advertising ourselves as being in_mwait.
>>>> +     */
>>>> +    alternative_io("movb $1, %[in_mwait]",
>>>> +                   "", X86_BUG_MONITOR,
>>>> +                   [in_mwait] "=m" (stat->in_mwait));
>>>>
>>>>      monitor(this_softirq_pending, 0, 0);
>>>>      smp_mb();
>>>> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>>>>          mwait(eax, ecx);
>>>>          spec_ctrl_exit_idle(info);
>>>>      }
>>>> +
>>>> +    alternative_io("movb $0, %[in_mwait]",
>>>> +                   "", X86_BUG_MONITOR,
>>>> +                   [in_mwait] "=m" (stat->in_mwait));
>>> Isn't it a bit overkill to use alternatives here when you could have a
>>> conditional based on a global variable to decide whether to set/clear
>>> in_mwait?
>> I view it differently.  Why should the common case suffer overhead
>> (extra memory read and conditional branch) to work around 3 buggy
>> pieces of hardware in a common path?
> TBH I don't think the extra read and conditional would make any
> difference in this specific path performance wise.  Either the CPU is
> going to sleep and has nothing to do, or the cost of getting back from
> idle will shadow the overhead of an extra read and conditional.
>
> It's all a question of taste I guess.  If it was me I would set/clear
> stat->in_mwait unconditionally in mwait_idle_with_hints(), and then in
> arch_pend_softirq() I would return:
>
>     return new & (1UL << 32) || force_mwait_ipi_wakeup;
>
> Or AFAICT you could possibly skip the cmpxchg() in the
> force_mwait_ipi_wakeup case in arch_pend_softirq(), and just do:
>
>     if ( force_mwait_ipi_wakeup )
>         return test_and_set_bit(nr, &softirq_pending(cpu));
>
> Because in that case Xen doesn't care about the in_mwait status.  It
> would be an optimization for the buggy hardware that already has to
> deal with the extra cost of always sending an IPI.

These will both function, but they're both poor options.

They're in a loop over a cpumask, and because of the full barriers in
the atomic operations, the read cannot be hoisted, nor the loop split,
because the compiler has been told "the value may change on each loop
iteration".

This was a code-gen/perf note I had on your original errata fix, which I
said "let's fix later".  Later is now.

The other part of the fix is that arch_pend_softirq() is static inline,
and not out-of-line in a separate TU.

>>>> +        };
>>>> +        uint64_t softirq_mwait_raw;
>>>> +    };
>>> This could be a named union type ...
>>>
>>>> +
>>>>  unsigned int __local_irq_count;
>>>>  unsigned int nmi_count;
>>>>  unsigned int mce_count;
>>>> diff --git a/xen/arch/x86/include/asm/softirq.h b/xen/arch/x86/include/asm/softirq.h
>>>> index e4b194f069fb..069e5716a68d 100644
>>>> --- a/xen/arch/x86/include/asm/softirq.h
>>>> +++ b/xen/arch/x86/include/asm/softirq.h
>>>> @@ -1,6 +1,8 @@
>>>>  #ifndef __ASM_SOFTIRQ_H__
>>>>  #define __ASM_SOFTIRQ_H__
>>>>
>>>> +#include <asm/system.h>
>>>> +
>>>>  #define NMI_SOFTIRQ            (NR_COMMON_SOFTIRQS + 0)
>>>>  #define TIME_CALIBRATE_SOFTIRQ (NR_COMMON_SOFTIRQS + 1)
>>>>  #define VCPU_KICK_SOFTIRQ      (NR_COMMON_SOFTIRQS + 2)
>>>> @@ -9,4 +11,36 @@
>>>>  #define HVM_DPCI_SOFTIRQ       (NR_COMMON_SOFTIRQS + 4)
>>>>  #define NR_ARCH_SOFTIRQS       5
>>>>
>>>> +/*
>>>> + * Ensure softirq @nr is pending on @cpu.  Return true if an IPI can
>>>> + * be skipped, false if the IPI cannot be skipped.
>>>> + *
>>>> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in
>>>> + * order to set softirq @nr while also observing in_mwait in a
>>>> + * race-free way.
>>>> + */
>>>> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
>>>> +{
>>>> +    uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
>>>> +    uint64_t old, new;
>>> ... so that you also use it here?
>> No, it can't.  The use of softirq_pending() in common code requires no
>> intermediate field names, and I'm not untangling that mess in a series
>> wanting backporting.
> Oh, I see.  Never mind then, something for a later cleanup if
> anything.

Yeah, I've got further cleanup pending, although I don't have a good fix
for this specifically.

~Andrew
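
For reference, the body of arch_pend_softirq() is cut short in the
quoting above.  What follows is a minimal sketch of the shape being
discussed, reconstructed from the quoted comment, the union layout
(in_mwait in the byte directly above the 32-bit __softirq_pending, hence
bit 32 of softirq_mwait_raw), and Roger's "new & (1UL << 32)"
suggestion.  It is a sketch only, not necessarily the code as committed:

    static always_inline bool arch_pend_softirq(unsigned int nr,
                                                unsigned int cpu)
    {
        uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
        uint64_t prev, old, new;
        unsigned int softirq = 1U << nr;

        old = ACCESS_ONCE(*ptr);

        for ( ;; )
        {
            if ( old & softirq )
                /* Softirq already pending; nothing further to do. */
                return true;

            new = old | softirq;

            prev = cmpxchg(ptr, old, new);
            if ( prev == old )
                break;

            old = prev;
        }

        /*
         * The IPI can only be elided if the target had advertised
         * in_mwait at the point the CMPXCHG succeeded, as MONITOR is
         * then watching __softirq_pending on its behalf.
         */
        return new & (1UL << 32);
    }

Note how X86_BUG_MONITOR needs no check here: affected hardware never
advertises in_mwait, so the helper naturally reports that the IPI
cannot be skipped.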
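
The code-gen argument above concerns the wakeup side, which invokes the
helper once per CPU in the destination mask.  Below is a simplified
sketch of such a caller, modelled on the shape of Xen's
cpumask_raise_softirq() with the batching logic omitted; treat it as
illustrative rather than as the exact tree code:

    void cpumask_raise_softirq(const cpumask_t *mask, unsigned int nr)
    {
        unsigned int cpu, this_cpu = smp_processor_id();
        cpumask_t send_mask;

        cpumask_clear(&send_mask);

        for_each_cpu(cpu, mask)
            /* Collect only the CPUs whose IPI cannot be elided. */
            if ( !arch_pend_softirq(nr, cpu) && cpu != this_cpu )
                __cpumask_set_cpu(cpu, &send_mask);

        smp_send_event_check_mask(&send_mask);
    }

With arch_pend_softirq() static inline and free of any global
workaround flag, the compiler can keep loop-invariant values in
registers across iterations, which is the property a
force_mwait_ipi_wakeup read inside the loop would defeat.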