Re: [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm
On 02.07.2025 16:41, Andrew Cooper wrote:
> In order to elide IPIs, we must be able to identify whether a target CPU is in
> MWAIT at the point it is woken up.  i.e. the store to wake it up must also
> identify the state.
>
> Create a new in_mwait variable beside __softirq_pending, so we can use a
> CMPXCHG to set the softirq while also observing the status safely.  Implement
> an x86 version of arch_pend_softirq() which does this.
>
> In mwait_idle_with_hints(), advertise in_mwait, with an explanation of
> precisely what it means.  X86_BUG_MONITOR can be accounted for simply by not
> advertising in_mwait.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> CC: Michal Orzel <michal.orzel@xxxxxxx>
> CC: Julien Grall <julien@xxxxxxx>
> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>
> This is modelled after Linux's TIF_NEED_RESCHED (single bit equivalent of all
> of __softirq_pending), and TIF_POLLING_NRFLAG (arch-neutral "in_mwait").
>
> In Linux, they're both in the same flags field, which adds complexity.  In
> Xen, __softirq_pending is already unsigned long for everything other than x86,
> so adding an arch-neutral "in_mwait" would need wider changes.

Why would the flag need to be arch-neutral? A pretty straightforward way to
achieve what you want would seem to be to define an x86-only MWAIT_SOFTIRQ,
which we'd never raise or open, but which instead we'd advertise to common
code as an always-ignore mask (to be OR-ed in by __do_softirq()). However,
setting and clearing such a bit would of course require LOCK-ed insns, which
clearly is less desirable than the simple MOVB you're using.

> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -439,7 +439,21 @@ __initcall(cpu_idle_key_init);
>  void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>  {
>      unsigned int cpu = smp_processor_id();
> -    const unsigned int *this_softirq_pending = &softirq_pending(cpu);
> +    irq_cpustat_t *stat = &irq_stat[cpu];
> +    const unsigned int *this_softirq_pending = &stat->__softirq_pending;
> +
> +    /*
> +     * By setting in_mwait, we promise to other CPUs that we'll notice changes
> +     * to __softirq_pending without being sent an IPI.  We achieve this by
> +     * either not going to sleep, or by having hardware notice on our behalf.
> +     *
> +     * Some errata exist where MONITOR doesn't work properly, and the
> +     * workaround is to force the use of an IPI.  Cause this to happen by
> +     * simply not advertising outselves as being in_mwait.

Nit: ourselves

> +     */
> +    alternative_io("movb $1, %[in_mwait]",
> +                   "", X86_BUG_MONITOR,
> +                   [in_mwait] "=m" (stat->in_mwait));
>
>      monitor(this_softirq_pending, 0, 0);
>      smp_mb();

Unlike alternative(), alternative_io() has no memory clobber. To the compiler
the two resulting asm()-s therefore have no dependency operand-wise[1]. The
sole ordering criterion then is that they're both volatile asm()-s. It not
being explicitly said (anywhere that I could find) that volatile guarantees
such ordering, I wonder whether we wouldn't better have an operand-based
dependency between the two. Otoh it may well be that we rely on such ordering
elsewhere already, in which case having one more instance probably would be
okay(ish).

[1] It's actually worse than that, I think: monitor() also doesn't specify
the location as a (memory) input.
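As a concrete illustration of such an operand-based dependency (a sketch
only, not the in-tree wrapper; the 64-byte extent of the compiler-visible
read is an arbitrary choice here):

    static always_inline void monitor(const void *addr, unsigned int ecx,
                                      unsigned int edx)
    {
        /*
         * The "m" input covering the monitored line makes the compiler
         * treat MONITOR as reading that memory, so it cannot be reordered
         * against earlier compiler-visible stores to the same line (such
         * as the MOVB setting in_mwait), without having to rely on
         * volatile-asm ordering alone.
         */
        asm volatile ( "monitor"
                       :: "a" (addr), "c" (ecx), "d" (edx),
                          "m" (*(const char (*)[64])addr) );
    }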
> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>          mwait(eax, ecx);
>          spec_ctrl_exit_idle(info);
>      }
> +
> +    alternative_io("movb $0, %[in_mwait]",
> +                   "", X86_BUG_MONITOR,
> +                   [in_mwait] "=m" (stat->in_mwait));

This doesn't strictly need to be an alternative, does it? We can save a
memory write in the buggy case, but that likely makes at most a marginal
difference.

> --- a/xen/arch/x86/include/asm/hardirq.h
> +++ b/xen/arch/x86/include/asm/hardirq.h
> @@ -5,7 +5,19 @@
>  #include <xen/types.h>
>
>  typedef struct {
> -    unsigned int __softirq_pending;
> +    /*
> +     * The layout is important.  Any CPU can set bits in __softirq_pending,
> +     * but in_mwait is a status bit owned by the CPU.  softirq_mwait_raw must
> +     * cover both, and must be in a single cacheline.
> +     */
> +    union {
> +        struct {
> +            unsigned int __softirq_pending;
> +            bool in_mwait;
> +        };
> +        uint64_t softirq_mwait_raw;
> +    };

To guard against someone changing e.g. __softirq_pending to unsigned long
while ignoring the comment, how about adding a BUILD_BUG_ON() checking that
both parts of the union are the same size (which of course would require
naming either the union field within the containing struct or its struct
member)?

> @@ -9,4 +11,36 @@
>  #define HVM_DPCI_SOFTIRQ       (NR_COMMON_SOFTIRQS + 4)
>  #define NR_ARCH_SOFTIRQS       5
>
> +/*
> + * Ensure softirq @nr is pending on @cpu.  Return true if an IPI can be
> + * skipped, false if the IPI cannot be skipped.
> + *
> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in order to
> + * set softirq @nr while also observing in_mwait in a race-free way.
> + */
> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
> +{
> +    uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
> +    uint64_t old, new;
> +    unsigned int softirq = 1U << nr;
> +
> +    old = ACCESS_ONCE(*ptr);
> +
> +    do {
> +        if ( old & softirq )
> +            /* Softirq already pending, nothing to do. */
> +            return true;
> +
> +        new = old | softirq;
> +
> +    } while ( (old = cmpxchg(ptr, old, new)) != new );

Don't you mean

    } while ( (new = cmpxchg(ptr, old, new)) != old );

here (with new latched ahead of the loop and old set from new inside it),
like we have it elsewhere when we avoid the use of yet another variable,
e.g. in inc_linear_{entries,uses}()?

> +
> +    /*
> +     * We have caused the softirq to become pending.  If in_mwait was set, the
> +     * target CPU will notice the modification and act on it.
> +     */
> +    return new & (1UL << 32);

Hmm, this requires the layout to allow for even less re-arrangement than the
comment ahead of the union says: You strictly require in_mwait to follow
__softirq_pending immediately, and the (assembly) write to strictly write 1
into the field. Could I talk you into at least

    return new & (1UL << (offsetof(..., in_mwait) * 8));

? That way the field being inspected would also be named in the access itself
(i.e. become grep-able beyond the mention in the comment). And I sincerely
dislike hard-coded literal numbers when they can (relatively) easily be
expressed another way.
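Putting both suggestions together, a possible shape of the function (a sketch
only; it additionally assumes offsetof() can name the anonymous-struct member
directly, which gcc permits):

    static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
    {
        uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
        uint64_t old, new = ACCESS_ONCE(*ptr);
        unsigned int softirq = 1U << nr;

        do {
            if ( new & softirq )
                /* Softirq already pending, nothing to do. */
                return true;

            old = new;
            new = old | softirq;
        } while ( (new = cmpxchg(ptr, old, new)) != old );

        /*
         * Loop exit means the CMPXCHG succeeded, with new holding the value
         * observed before our update.  The softirq is now pending; the IPI
         * can be elided iff in_mwait was set at that point.
         */
        return new & (1UL << (offsetof(irq_cpustat_t, in_mwait) * 8));
    }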
Jan