[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()

To: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
From: David Hildenbrand <david@xxxxxxxxxx>
Date: Thu, 17 Jul 2025 22:12:37 +0200
Autocrypt: addr=david@xxxxxxxxxx; keydata= xsFNBFXLn5EBEAC+zYvAFJxCBY9Tr1xZgcESmxVNI/0ffzE/ZQOiHJl6mGkmA1R7/uUpiCjJ dBrn+lhhOYjjNefFQou6478faXE6o2AhmebqT4KiQoUQFV4R7y1KMEKoSyy8hQaK1umALTdL QZLQMzNE74ap+GDK0wnacPQFpcG1AE9RMq3aeErY5tujekBS32jfC/7AnH7I0v1v1TbbK3Gp XNeiN4QroO+5qaSr0ID2sz5jtBLRb15RMre27E1ImpaIv2Jw8NJgW0k/D1RyKCwaTsgRdwuK Kx/Y91XuSBdz0uOyU/S8kM1+ag0wvsGlpBVxRR/xw/E8M7TEwuCZQArqqTCmkG6HGcXFT0V9 PXFNNgV5jXMQRwU0O/ztJIQqsE5LsUomE//bLwzj9IVsaQpKDqW6TAPjcdBDPLHvriq7kGjt WhVhdl0qEYB8lkBEU7V2Yb+SYhmhpDrti9Fq1EsmhiHSkxJcGREoMK/63r9WLZYI3+4W2rAc UucZa4OT27U5ZISjNg3Ev0rxU5UH2/pT4wJCfxwocmqaRr6UYmrtZmND89X0KigoFD/XSeVv jwBRNjPAubK9/k5NoRrYqztM9W6sJqrH8+UWZ1Idd/DdmogJh0gNC0+N42Za9yBRURfIdKSb B3JfpUqcWwE7vUaYrHG1nw54pLUoPG6sAA7Mehl3nd4pZUALHwARAQABzSREYXZpZCBIaWxk ZW5icmFuZCA8ZGF2aWRAcmVkaGF0LmNvbT7CwZgEEwEIAEICGwMGCwkIBwMCBhUIAgkKCwQW AgMBAh4BAheAAhkBFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmgsLPQFCRvGjuMACgkQTd4Q 9wD/g1o0bxAAqYC7gTyGj5rZwvy1VesF6YoQncH0yI79lvXUYOX+Nngko4v4dTlOQvrd/vhb 02e9FtpA1CxgwdgIPFKIuXvdSyXAp0xXuIuRPQYbgNriQFkaBlHe9mSf8O09J3SCVa/5ezKM OLW/OONSV/Fr2VI1wxAYj3/Rb+U6rpzqIQ3Uh/5Rjmla6pTl7Z9/o1zKlVOX1SxVGSrlXhqt kwdbjdj/csSzoAbUF/duDuhyEl11/xStm/lBMzVuf3ZhV5SSgLAflLBo4l6mR5RolpPv5wad GpYS/hm7HsmEA0PBAPNb5DvZQ7vNaX23FlgylSXyv72UVsObHsu6pT4sfoxvJ5nJxvzGi69U s1uryvlAfS6E+D5ULrV35taTwSpcBAh0/RqRbV0mTc57vvAoXofBDcs3Z30IReFS34QSpjvl Hxbe7itHGuuhEVM1qmq2U72ezOQ7MzADbwCtn+yGeISQqeFn9QMAZVAkXsc9Wp0SW/WQKb76 FkSRalBZcc2vXM0VqhFVzTb6iNqYXqVKyuPKwhBunhTt6XnIfhpRgqveCPNIasSX05VQR6/a OBHZX3seTikp7A1z9iZIsdtJxB88dGkpeMj6qJ5RLzUsPUVPodEcz1B5aTEbYK6428H8MeLq NFPwmknOlDzQNC6RND8Ez7YEhzqvw7263MojcmmPcLelYbfOwU0EVcufkQEQAOfX3n0g0fZz Bgm/S2zF/kxQKCEKP8ID+Vz8sy2GpDvveBq4H2Y34XWsT1zLJdvqPI4af4ZSMxuerWjXbVWb T6d4odQIG0fKx4F8NccDqbgHeZRNajXeeJ3R7gAzvWvQNLz4piHrO/B4tf8svmRBL0ZB5P5A 2uhdwLU3NZuK22zpNn4is87BPWF8HhY0L5fafgDMOqnf4guJVJPYNPhUFzXUbPqOKOkL8ojk CXxkOFHAbjstSK5Ca3fKquY3rdX3DNo+EL7FvAiw1mUtS+5GeYE+RMnDCsVFm/C7kY8c2d0G NWkB9pJM5+mnIoFNxy7YBcldYATVeOHoY4LyaUWNnAvFYWp08dHWfZo9WCiJMuTfgtH9tc75 7QanMVdPt6fDK8UUXIBLQ2TWr/sQKE9xtFuEmoQGlE1l6bGaDnnMLcYu+Asp3kDT0w4zYGsx 5r6XQVRH4+5N6eHZiaeYtFOujp5n+pjBaQK7wUUjDilPQ5QMzIuCL4YjVoylWiBNknvQWBXS lQCWmavOT9sttGQXdPCC5ynI+1ymZC1ORZKANLnRAb0NH/UCzcsstw2TAkFnMEbo9Zu9w7Kv AxBQXWeXhJI9XQssfrf4Gusdqx8nPEpfOqCtbbwJMATbHyqLt7/oz/5deGuwxgb65pWIzufa N7eop7uh+6bezi+rugUI+w6DABEBAAHCwXwEGAEIACYCGwwWIQQb2cqtc1xMOkYN/MpN3hD3 AP+DWgUCaCwtJQUJG8aPFAAKCRBN3hD3AP+DWlDnD/4k2TW+HyOOOePVm23F5HOhNNd7nNv3 Vq2cLcW1DteHUdxMO0X+zqrKDHI5hgnE/E2QH9jyV8mB8l/ndElobciaJcbl1cM43vVzPIWn 01vW62oxUNtEvzLLxGLPTrnMxWdZgxr7ACCWKUnMGE2E8eca0cT2pnIJoQRz242xqe/nYxBB /BAK+dsxHIfcQzl88G83oaO7vb7s/cWMYRKOg+WIgp0MJ8DO2IU5JmUtyJB+V3YzzM4cMic3 bNn8nHjTWw/9+QQ5vg3TXHZ5XMu9mtfw2La3bHJ6AybL0DvEkdGxk6YHqJVEukciLMWDWqQQ RtbBhqcprgUxipNvdn9KwNpGciM+hNtM9kf9gt0fjv79l/FiSw6KbCPX9b636GzgNy0Ev2UV m00EtcpRXXMlEpbP4V947ufWVK2Mz7RFUfU4+ETDd1scMQDHzrXItryHLZWhopPI4Z+ps0rB CQHfSpl+wG4XbJJu1D8/Ww3FsO42TMFrNr2/cmqwuUZ0a0uxrpkNYrsGjkEu7a+9MheyTzcm vyU2knz5/stkTN2LKz5REqOe24oRnypjpAfaoxRYXs+F8wml519InWlwCra49IUSxD1hXPxO WBe5lqcozu9LpNDH/brVSzHCSb7vjNGvvSVESDuoiHK8gNlf0v+epy5WYd7CGAgODPvDShGN g3eXuA==
Cc: linux-kernel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, nvdimm@xxxxxxxxxxxxxxx, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, Matthew Wilcox <willy@xxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Christian Brauner <brauner@xxxxxxxxxx>, "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>, Vlastimil Babka <vbabka@xxxxxxx>, Mike Rapoport <rppt@xxxxxxxxxx>, Suren Baghdasaryan <surenb@xxxxxxxxxx>, Michal Hocko <mhocko@xxxxxxxx>, Zi Yan <ziy@xxxxxxxxxx>, Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>, Nico Pache <npache@xxxxxxxxxx>, Ryan Roberts <ryan.roberts@xxxxxxx>, Dev Jain <dev.jain@xxxxxxx>, Barry Song <baohua@xxxxxxxxxx>, Jann Horn <jannh@xxxxxxxxxx>, Pedro Falcato <pfalcato@xxxxxxx>, Hugh Dickins <hughd@xxxxxxxxxx>, Oscar Salvador <osalvador@xxxxxxx>, Lance Yang <lance.yang@xxxxxxxxx>
Delivery-date: Thu, 17 Jul 2025 20:12:58 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


-/*
- * vm_normal_page -- This function gets the "struct page" associated with a 
pte.
+/**
+ * vm_normal_page_pfn() - Get the "struct page" associated with a PFN in a
+ *                       non-special page table entry.


This is a bit nebulous/confusing, I mean you'll get PTE entries with PTE special
bit that'll have a PFN but just no struct page/folio to look at, or should not
be touched.

So the _pfn() bit doesn't really properly describe what it does.

I wonder if it'd be better to just separate out the special handler, have
that return a boolean indicating special of either form, and then separate
other shared code separately from that?

Let me think about that; I played with various approaches and this wasthe best I was come up with before running in circles.

+ * @vma: The VMA mapping the @pfn.
+ * @addr: The address where the @pfn is mapped.
+ * @pfn: The PFN.
+ * @entry: The page table entry value for error reporting purposes.
   *
   * "Special" mappings do not wish to be associated with a "struct page" 
(either
   * it doesn't exist, or it exists but they don't want to touch it). In this
@@ -603,10 +608,10 @@ static void print_bad_page_map(struct vm_area_struct *vma,
   * (such as GUP) can still identify these mappings and work with the
   * underlying "struct page".
   *
- * There are 2 broad cases. Firstly, an architecture may define a pte_special()
- * pte bit, in which case this function is trivial. Secondly, an architecture
- * may not have a spare pte bit, which requires a more complicated scheme,
- * described below.
+ * There are 2 broad cases. Firstly, an architecture may define a "special"
+ * page table entry bit (e.g., pte_special()), in which case this function is
+ * trivial. Secondly, an architecture may not have a spare page table
+ * entry bit, which requires a more complicated scheme, described below.


Strikes me this bit of the comment should be with vm_normal_page(). As this
implies the 2 broad cases are handled here and this isn't the case.

Well, pragmatism. Splitting up the doc doesn't make sense. Having it atvm_normal_page() doesn't make sense.


I'm sure the educated reader will be able to make sense of it :P

But I'm happy to hear suggestions on how to do it differently :)

   *
   * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a
   * special mapping (even if there are underlying and valid "struct pages").
@@ -639,15 +644,72 @@ static void print_bad_page_map(struct vm_area_struct *vma,
   * don't have to follow the strict linearity rule of PFNMAP mappings in
   * order to support COWable mappings.
   *
+ * This function is not expected to be called for obviously special mappings:
+ * when the page table entry has the "special" bit set.


Hmm this is is a bit weird though, saying "obviously" special, because you're
handling "special" mappings here, but only for architectures that don't specify
the PTE special bit.

So it makes it quite nebulous what constitutes 'obviously' here, really you mean
pte_special().


Yes, I can clarify that.

+ *
+ * Return: Returns the "struct page" if this is a "normal" mapping. Returns
+ *        NULL if this is a "special" mapping.
+ */
+static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma,
+               unsigned long addr, unsigned long pfn, unsigned long long entry)
+{
+       /*
+        * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table mappings
+        * (incl. shared zero folios) are marked accordingly and are handled
+        * by the caller.
+        */
+       if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
+               if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) {
+                       if (vma->vm_flags & VM_MIXEDMAP) {
+                               /* If it has a "struct page", it's "normal". */
+                               if (!pfn_valid(pfn))
+                                       return NULL;
+                       } else {
+                               unsigned long off = (addr - vma->vm_start) >> 
PAGE_SHIFT;
+
+                               /* Only CoW'ed anon folios are "normal". */
+                               if (pfn == vma->vm_pgoff + off)
+                                       return NULL;
+                               if (!is_cow_mapping(vma->vm_flags))
+                                       return NULL;
+                       }
+               }
+
+               if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))


This handles zero/zero huge page handling for non-pte_special() case
only. I wonder if we even need to bother having these marked special
generally since you can just check the PFN every time anyway.

Well, that makes (a) pte_special() a bit weird -- not set for somespecial pages and (b) requires additional runtime checks for the case weall really care about -- pte_special().


So I don't think we should change that.

[...]


+/**
+ * vm_normal_folio() - Get the "struct folio" associated with a PTE
+ * @vma: The VMA mapping the @pte.
+ * @addr: The address where the @pte is mapped.
+ * @pte: The PTE.
+ *
+ * Get the "struct folio" associated with a PTE. See vm_normal_page_pfn()
+ * for details.
+ *
+ * Return: Returns the "struct folio" if this is a "normal" mapping. Returns
+ *        NULL if this is a "special" mapping.
+ */


Nice to add a comment, but again feels weird to have the whole explanation in
vm_normal_page_pfn() but then to invoke vm_normal_page()..


You want people to do pointer chasing to find what they are looking for? :)

--
Cheers,

David / dhildenb

Follow-Ups:
- Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()
  - From: Lorenzo Stoakes

References:
- [PATCH v2 0/9] mm: vm_normal_page*() improvements
  - From: David Hildenbrand
- [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()
  - From: David Hildenbrand
- Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()
  - From: Lorenzo Stoakes

Prev by Date: Re: [PATCH v2 9/9] mm: rename vm_ops->find_special_page() to vm_ops->find_normal_page()
Next by Date: Re: [PATCH v2 8/9] mm: introduce and use vm_normal_page_pud()
Previous by thread: Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()
Next by thread: Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.