[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 4/6] x86/idle: Implement a new MWAIT IPI-elision algorithm


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 3 Jul 2025 16:07:12 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 03 Jul 2025 14:07:33 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.07.2025 13:59, Andrew Cooper wrote:
> On 03/07/2025 10:01 am, Jan Beulich wrote:
>> On 02.07.2025 16:41, Andrew Cooper wrote:
>>> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned 
>>> int ecx)
>>>          mwait(eax, ecx);
>>>          spec_ctrl_exit_idle(info);
>>>      }
>>> +
>>> +    alternative_io("movb $0, %[in_mwait]",
>>> +                   "", X86_BUG_MONITOR,
>>> +                   [in_mwait] "=m" (stat->in_mwait));
>> This doesn't strictly need to be an alternative, does it? We can save a
>> memory write in the buggy case, but that likely makes at most a marginal
>> difference.
> 
> It doesn't *need* to be an alternative.  It could be:
> 
> if ( !cpu_bug_monitor )
>     ACCESS_ONCE(stat->in_mwait) = true;
> 
> but getting rid of the branch is an advantage too.

That's the other alternative blob. What I mean that here it could simply
be

    ACCESS_ONCE(stat->in_mwait) = false;

without any conditional.

>>> --- a/xen/arch/x86/include/asm/hardirq.h
>>> +++ b/xen/arch/x86/include/asm/hardirq.h
>>> @@ -5,7 +5,19 @@
>>>  #include <xen/types.h>
>>>  
>>>  typedef struct {
>>> -    unsigned int __softirq_pending;
>>> +    /*
>>> +     * The layout is important.  Any CPU can set bits in __softirq_pending,
>>> +     * but in_mwait is a status bit owned by the CPU.  softirq_mwait_raw 
>>> must
>>> +     * cover both, and must be in a single cacheline.
>>> +     */
>>> +    union {
>>> +        struct {
>>> +            unsigned int __softirq_pending;
>>> +            bool in_mwait;
>>> +        };
>>> +        uint64_t softirq_mwait_raw;
>>> +    };
>> To guard against someone changing e.g. __softirq_pending to unsigned long
>> while ignoring the comment, how about adding a BUILD_BUG_ON() checking both
>> parts of the union are the same size (which of course would require naming
>> either the union field within the containing struct or its struct member)?
> 
> That turns out to be very hard because of the definition of
> softirq_pending() being common.  More below.

Hmm, yes, I see. Then maybe

    BUILD_BUG_ON(offsetof(irq_cpustat_t, in_mwait) +
                   sizeof(irq_stat[0].in_mwait) >
                 offsetof(irq_cpustat_t, softirq_mwait_raw) +
                   sizeof(irq_stat[0].softirq_mwait_raw));

(or something substantially similar using e.g. typeof())?

>>> @@ -9,4 +11,36 @@
>>>  #define HVM_DPCI_SOFTIRQ       (NR_COMMON_SOFTIRQS + 4)
>>>  #define NR_ARCH_SOFTIRQS       5
>>>  
>>> +/*
>>> + * Ensure softirq @nr is pending on @cpu.  Return true if an IPI can be
>>> + * skipped, false if the IPI cannot be skipped.
>>> + *
>>> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in order 
>>> to
>>> + * set softirq @nr while also observing in_mwait in a race-free way.
>>> + */
>>> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int 
>>> cpu)
>>> +{
>>> +    uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
>>> +    uint64_t old, new;
>>> +    unsigned int softirq = 1U << nr;
>>> +
>>> +    old = ACCESS_ONCE(*ptr);
>>> +
>>> +    do {
>>> +        if ( old & softirq )
>>> +            /* Softirq already pending, nothing to do. */
>>> +            return true;
>>> +
>>> +        new = old | softirq;
>>> +
>>> +    } while ( (old = cmpxchg(ptr, old, new)) != new );
>> Don't you mean
>>
>>     } while ( (new = cmpxchg(ptr, old, new)) != old );
>>
>> here
> 
> No. I'm pretty sure I don't.
> 
>> (with new latched ahead of the loop and old set from new inside it),
>> like we have it elsewhere when we avoid the use of yet another variable,
>> e.g. in inc_linear_{entries,uses}()?
> 
> Eww, no.  Having new and old backwards like that is borderline
> obfucation, and is unnecessary cognitive complexity for the reader.

Why backwards? You want to compare the (original) value read against the
expected old value. The way you wrote it you'll do at least one extra
loop iteration, as you wait for the fetched (original) value to equal
"new".

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.