[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 2/3] xen/smp: Rewrite on_selected_cpus() to be lockless


  • To: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 9 Apr 2026 13:46:59 +0200
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=google header.d=suse.com header.i="@suse.com" header.h="Content-Transfer-Encoding:In-Reply-To:Autocrypt:From:Content-Language:References:Cc:To:Subject:User-Agent:MIME-Version:Date:Message-ID"
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 09 Apr 2026 11:47:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01.04.2026 18:35, Ross Lagerwall wrote:
> --- a/xen/common/smp.c
> +++ b/xen/common/smp.c
> @@ -24,13 +24,15 @@
>  /*
>   * Structure and data for smp_call_function()/on_selected_cpus().
>   */
> -static DEFINE_SPINLOCK(call_lock);
> -static struct call_data_struct {
> +struct call_data_struct {
>      void (*func) (void *info);
>      void *info;
>      int wait;
> -    cpumask_t selected;
> -} call_data;
> +    cpumask_t selected __cacheline_aligned;

The adding of the alignment attribute isn't strictly required here, is it?
Can it, with its own justification, be split out?

However, wrt what I said in the first reply, putting a full cpumask_t in
per-CPU data is even a little worse than having one as a global. Imo this
also wants to become cpumask_var_t. Then the attribute wouldn't be quite
applicable anymore anyway.

> @@ -50,55 +52,84 @@ void on_selected_cpus(
>      void *info,
>      int wait)
>  {
> +    struct call_data_struct *data;
> +    unsigned int cpu = smp_processor_id();
> +
>      ASSERT(local_irq_is_enabled());
>      ASSERT(cpumask_subset(selected, &cpu_online_map));
>  
> -    spin_lock(&call_lock);
> +    if ( cpumask_empty(selected) )
> +        return;
> +
> +    data = &this_cpu(call_data);
>  
> -    cpumask_copy(&call_data.selected, selected);
> +    if ( !data->wait )
> +    {
> +        /* Wait for any previous async call to complete */
> +        while ( !cpumask_empty(&data->selected) )
> +            cpu_relax();
> +
> +        cpumask_clear_cpu(cpu, &tasks);

In the description you say "Track which CPUs are currently running
on_selected_cpus()", yet as per this the bit can remain set after the
function was left. Which isn't just an issue of describing things
correctly; there's also a performance concern: The IPI handler(s) will
need to carry out more work than necessary. That's not a lot of work (only
the subsequent cpumask_test_cpu() there), but also not nothing.

I also think another barrier is needed above here: We may only clear the
bit in tasks when the empty ->selected is globally visible. More generally,
since the bit in tasks is what everything derives from, barriers are
apparently needed around all of its updating / accessing. Which also meant
that ...

> +    }
>  
> -    if ( cpumask_empty(&call_data.selected) )
> -        goto out;
> +    data->func = func;
> +    data->info = info;
> +    data->wait = wait;
>  
> -    call_data.func = func;
> -    call_data.info = info;
> -    call_data.wait = wait;
> +    smp_wmb();

... besides (as already indicated) this barrier needing to move ...

> -    smp_send_call_function_mask(&call_data.selected);
> +    cpumask_copy(&data->selected, selected);

... here, I think that another one is going to be needed ...

> -    while ( !cpumask_empty(&call_data.selected) )
> -        cpu_relax();
> +    cpumask_set_cpu(cpu, &tasks);

... here, such that ...

> -out:
> -    spin_unlock(&call_lock);
> +    smp_send_call_function_mask(&data->selected);

... upon receipt of the IPI the target sees the up-to-date value.

> +    if ( wait )
> +    {
> +        while ( !cpumask_empty(&data->selected) )
> +            cpu_relax();
> +
> +        cpumask_clear_cpu(cpu, &tasks);
> +    }
>  }
>  
>  void smp_call_function_interrupt(void)
>  {
> -    void (*func)(void *info) = call_data.func;
> -    void *info = call_data.info;
>      unsigned int cpu = smp_processor_id();
> -
> -    if ( !cpumask_test_cpu(cpu, &call_data.selected) )
> -        return;
> +    unsigned int i;
> +    struct call_data_struct *data;
> +    void (*func)(void *info);
> +    void *info;
>  
>      irq_enter();
>  
> -    if ( unlikely(!func) )
> -    {
> -        cpumask_clear_cpu(cpu, &call_data.selected);
> -    }
> -    else if ( call_data.wait )
> -    {
> -        (*func)(info);
> -        smp_mb();
> -        cpumask_clear_cpu(cpu, &call_data.selected);
> -    }
> -    else
> +    for_each_cpu ( i, &tasks )
>      {
> -        smp_mb();
> -        cpumask_clear_cpu(cpu, &call_data.selected);
> -        (*func)(info);
> +        data = &per_cpu(call_data, i);
> +
> +        if ( !cpumask_test_cpu(cpu, &data->selected) )
> +            continue;
> +
> +        smp_rmb();
> +        func = data->func;
> +        info = data->info;
> +
> +        if ( unlikely(!func) )
> +        {
> +            cpumask_clear_cpu(cpu, &data->selected);
> +        }
> +        else if ( data->wait )
> +        {
> +            (*func)(info);
> +            smp_mb();
> +            cpumask_clear_cpu(cpu, &data->selected);
> +        }
> +        else
> +        {
> +            smp_mb();
> +            cpumask_clear_cpu(cpu, &data->selected);
> +            (*func)(info);

I understand you only re-indent this code, but I'm struggling with the
purpose of the barrier here. With the smp_rmb() above there are no reads
to isolate (data->func and data->info can't change while the bit in
data->selected is still set). And there are no earlier writes at all,
unless anything done outside of the interrupt handler would matter.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.