[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest

To: Ryan Roberts <ryan.roberts@xxxxxxx>, linux-mm@xxxxxxxxx
From: Kevin Brodsky <kevin.brodsky@xxxxxxx>
Date: Tue, 11 Nov 2025 16:56:44 +0100
Cc: linux-kernel@xxxxxxxxxxxxxxx, Alexander Gordeev <agordeev@xxxxxxxxxxxxx>, Andreas Larsson <andreas@xxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Catalin Marinas <catalin.marinas@xxxxxxx>, Christophe Leroy <christophe.leroy@xxxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, David Hildenbrand <david@xxxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, David Woodhouse <dwmw2@xxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Jann Horn <jannh@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>, Madhavan Srinivasan <maddy@xxxxxxxxxxxxx>, Michael Ellerman <mpe@xxxxxxxxxxxxxx>, Michal Hocko <mhocko@xxxxxxxx>, Mike Rapoport <rppt@xxxxxxxxxx>, Nicholas Piggin <npiggin@xxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Suren Baghdasaryan <surenb@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Vlastimil Babka <vbabka@xxxxxxx>, Will Deacon <will@xxxxxxxxxx>, Yeoreum Yun <yeoreum.yun@xxxxxxx>, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, linuxppc-dev@xxxxxxxxxxxxxxxx, sparclinux@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, x86@xxxxxxxxxx
Delivery-date: Tue, 11 Nov 2025 15:57:12 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11/11/2025 10:24, Ryan Roberts wrote:
> [...]
>
>>>> +          state->active = true;
>>>> +          arch_enter_lazy_mmu_mode();
>>>> +  }
>>>>  }
>>>>  
>>>>  static inline void lazy_mmu_mode_disable(void)
>>>>  {
>>>> -  arch_leave_lazy_mmu_mode();
>>>> +  struct lazy_mmu_state *state = &current->lazy_mmu_state;
>>>> +
>>>> +  VM_WARN_ON_ONCE(state->nesting_level == 0);
>>>> +  VM_WARN_ON(!state->active);
>>>> +
>>>> +  if (--state->nesting_level == 0) {
>>>> +          state->active = false;
>>>> +          arch_leave_lazy_mmu_mode();
>>>> +  } else {
>>>> +          /* Exiting a nested section */
>>>> +          arch_flush_lazy_mmu_mode();
>>>> +  }
>>>>  }
>>>>  
>>>>  static inline void lazy_mmu_mode_pause(void)
>>>>  {
>>>> +  struct lazy_mmu_state *state = &current->lazy_mmu_state;
>>>> +
>>>> +  VM_WARN_ON(state->nesting_level == 0 || !state->active);
>>> nit: do you need the first condition? I think when nesting_level==0, we 
>>> expect
>>> to be !active?
>> I suppose this should never happen indeed - I was just being extra
>> defensive.
>>
>> Either way David suggested allowing pause()/resume() to be called
>> outside of any section so the next version will bail out on
>> nesting_level == 0.
> Ignoring my current opinion that we don't need pause/resume at all for now; 
> Are
> you suggesting that pause/resume will be completely independent of
> enable/disable? I think that would be best. So enable/disable increment and
> decrement the nesting_level counter regardless of whether we are paused.
> nesting_level 0 => 1 enables if not paused. nesting_level 1 => 0 disables if 
> not
> paused. pause disables nesting_level >= 1, resume enables if nesting_level >= 
> 1.

This is something else. Currently the rules are:

[A]

// pausing forbidden
enable()
    pause()
    // pausing/enabling forbidden
    resume()
disable()

David suggested allowing:

[B]

pause()
// pausing/enabling forbidden
resume()

Your suggestion is also allowing:

[C]

pause()
    // pausing forbidden
    enable()
    disable()
resume()

> Perhaps we also need nested pause/resume? Then you just end up with 2 
> counters;
> enable_count and pause_count. Sorry if this has already been discussed.

And finally:

[D]

pause()
    pause()
        enable()
        disable()
    resume()
resume()

I don't really mind either way, but I don't see an immediate use for [C]
and [D] - the idea is that the paused section is short and controlled,
not made up of arbitrary calls. A potential downside of allowing [C] and
[D] is that it makes it harder to detect unintended nesting (fewer
VM_WARN assertions). Happy to implement it if this proves useful though.

OTOH the idea behind [B] is that it allows the caller of
pause()/resume() not to care about whether lazy MMU is actually enabled
or not - i.e. the kasan helpers would keep working even if
apply_to_page_range() didn't use lazy MMU any more.

>>>> +
>>>> +  state->active = false;
>>>>    arch_leave_lazy_mmu_mode();
>>>>  }
>>>>  
>>>>  static inline void lazy_mmu_mode_resume(void)
>>>>  {
>>>> +  struct lazy_mmu_state *state = &current->lazy_mmu_state;
>>>> +
>>>> +  VM_WARN_ON(state->nesting_level == 0 || state->active);
>>> Similar argument?
>>>
>>>> +
>>>> +  state->active = true;
>>>>    arch_enter_lazy_mmu_mode();
>>>>  }
>>>>  #else
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index cbb7340c5866..11566d973f42 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -1441,6 +1441,10 @@ struct task_struct {
>>>>  
>>>>    struct page_frag                task_frag;
>>>>  
>>>> +#ifdef CONFIG_ARCH_HAS_LAZY_MMU_MODE
>>>> +  struct lazy_mmu_state           lazy_mmu_state;
>>>> +#endif
>>>> +
>>>>  #ifdef CONFIG_TASK_DELAY_ACCT
>>>>    struct task_delay_info          *delays;
>>>>  #endif
>>>> @@ -1724,6 +1728,18 @@ static inline char task_state_to_char(struct 
>>>> task_struct *tsk)
>>>>    return task_index_to_char(task_state_index(tsk));
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_ARCH_HAS_LAZY_MMU_MODE
>>>> +static inline bool in_lazy_mmu_mode(void)
>>>> +{
>>>> +  return current->lazy_mmu_state.active;
>>>> +}
>>>> +#else
>>>> +static inline bool in_lazy_mmu_mode(void)
>>>> +{
>>>> +  return false;
>>> Just pointing out that this isn't really a correct implementation:
>>>
>>> lazy_mmu_mode_enable()
>>> ASSERT(in_lazy_mmu_mode()) << triggers for arches without lazy mmu
>>> lazy_mmu_mode_disable()
>>>
>>> Although it probably doesn't matter in practice?
>> I'd say that the expectation is invalid - lazy MMU mode can only be
>> enabled if the architecture supports it. In fact as you pointed out
>> above the API may be called in interrupt context but it will have no
>> effect, so this sequence would always fail in interrupt context.
> Yep, but previously there was no way to query the current state so it didn't
> matter. Now you have a query API so it might matter if/when people come along
> and use it in unexpected ways.

I suppose the best we can do is document it alongside those helpers
(David has already suggested some documentation, see patch 11).

>> Worth nothing that in_lazy_mmu_mode() is only ever called from arch code
>> where lazy MMU is implemented. I added the fallback as a matter of
>> principle, but it isn't actually required.
> Yes, I agree that's the intent. I'm just wondering if it's possible to enforce
> that only arch code uses this. Perhaps add some docs to explain that it's only
> intended for arches that implement lazy_mmu, and don't define it for arches 
> that
> don't, which would catch any generic users?

Yep sounds like the best option - a lot less risk of misuse if it can't
be called from generic code :) The build would still succeed on arch's
that implement it, but the kernel CI should catch such calls sooner or
later.

- Kevin

Follow-Ups:
- Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
  - From: Ryan Roberts

References:
- Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
  - From: Ryan Roberts
- Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
  - From: Kevin Brodsky
- Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
  - From: Ryan Roberts

Prev by Date: Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Next by Date: [PATCH 0/5] configure: reduce number of files created
Previous by thread: Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
Next by thread: Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.