
Re: [PATCH v2 12/17] xen/riscv: Implement p2m_free_entry() and related helpers




On 7/1/25 4:23 PM, Jan Beulich wrote:
On 10.06.2025 15:05, Oleksii Kurochko wrote:
This patch introduces a working implementation of p2m_free_entry() for RISC-V
based on ARM's implementation of p2m_free_entry(), enabling proper cleanup
of page table entries in the P2M (physical-to-machine) mapping.

Only a few things are changed:
- Use p2m_force_flush_sync() instead of p2m_tlb_flush_sync(), as the latter
  isn't implemented on RISC-V.
- Introduce and use p2m_type_radix_get() to get the type of a p2m entry, as
  RISC-V's PTE doesn't have enough space to store all necessary types, so
  the type is stored in a radix tree.
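For reference, the store side of that radix-tree scheme presumably pairs
radix_tree_insert() with radix_tree_int_to_ptr() (both standard Xen
helpers); a minimal sketch, with p2m_set_type() being an assumed name:

    static int p2m_set_type(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t t)
    {
        /* Types don't fit in the PTE, so keep them keyed by GFN. */
        return radix_tree_insert(&p2m->p2m_type, gfn_x(gfn),
                                 radix_tree_int_to_ptr(t));
    }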

Key additions include:
- p2m_free_entry(): Recursively frees page table entries at all levels. It
  handles both regular and superpage mappings and ensures that TLB entries
  are flushed before freeing intermediate tables.
- p2m_put_page() and helpers:
  - p2m_put_4k_page(): Clears GFN from xenheap pages if applicable.
  - p2m_put_2m_superpage(): Releases foreign page references in a 2MB
    superpage.
  - p2m_type_radix_get(): Extracts the stored p2m_type from the radix tree
    using the PTE.
- p2m_free_page(): Returns a page either to the domain's freelist or to
  the domheap, depending on whether the domain is hardware-backed.
What is "hardware-backed"?
It basically means the hardware domain, i.e. Dom0.
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -345,11 +345,33 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
     return __map_domain_page(p2m->root + root_table_indx);
 }
 
+static p2m_type_t p2m_type_radix_get(struct p2m_domain *p2m, pte_t pte)
Does it matter to callers that ...

+{
+    void *ptr;
+    gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
+
+    ptr = radix_tree_lookup(&p2m->p2m_type, gfn_x(gfn));
+
+    if ( !ptr )
+        return p2m_invalid;
+
+    return radix_tree_ptr_to_int(ptr);
+}
... this is a radix tree lookup? IOW does "radix" need to be part of the
function name? Also "get" may want to move forward in the name, to better
match the naming of other functions.
Agree, it doesn't really matter, so I will rename it.
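
The renamed helper might then read as follows (same body as the hunk
above, only the name changes):

    static p2m_type_t p2m_get_type(struct p2m_domain *p2m, pte_t pte)
    {
        gfn_t gfn = mfn_to_gfn(p2m->domain, mfn_from_pte(pte));
        void *ptr = radix_tree_lookup(&p2m->p2m_type, gfn_x(gfn));

        return ptr ? radix_tree_ptr_to_int(ptr) : p2m_invalid;
    }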


+/*
+ * In the case of the P2M, the valid bit is used for other purposes. Use
+ * the type to check whether an entry is valid.
+ */
 static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte)
 {
-    panic("%s: isn't implemented for now\n", __func__);
+    return p2m_type_radix_get(p2m, pte) != p2m_invalid;
+}
No checking of the valid bit?
As mentioned in the comment, only the P2M type should be checked, since the
valid bit is used for other purposes we discussed earlier, for example, to
track whether pages were accessed by a guest domain, or to support certain
table invalidation optimizations (1) and (2).
So, in this case, we only need to consider whether the entry is invalid
from the P2M perspective.

(1) https://github.com/xen-project/xen/blob/19772b67/xen/arch/arm/mmu/p2m.c#L1245
(2) https://github.com/xen-project/xen/blob/19772b67/xen/arch/arm/mmu/p2m.c#L1386
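
To illustrate: an invalidation along the lines of (1)/(2) clears only the
hardware valid bit, while the type stays behind in the radix tree, so the
type check still classifies the entry correctly. A sketch, assuming
RISC-V's PTE_VALID flag and the pte.pte field:

    static void p2m_invalidate_pte(pte_t *pte)
    {
        /* The P2M type lives in the radix tree, so it survives this. */
        pte->pte &= ~PTE_VALID;
    }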


@@ -404,11 +426,127 @@ static int p2m_next_level(struct p2m_domain *p2m, bool alloc_tbl,
     return GUEST_TABLE_MAP_NONE;
 }
 
+static void p2m_put_foreign_page(struct page_info *pg)
+{
+    /*
+     * It's safe to do the put_page here because page_alloc will
+     * flush the TLBs if the page is reallocated before the end of
+     * this loop.
+     */
+    put_page(pg);
Is the comment really true? The page allocator will flush the normal
TLBs, but not the stage-2 ones. Yet those are what you care about here,
aiui.
In alloc_heap_pages():
    ...
    if ( need_tlbflush )
        filtered_flush_tlb_mask(tlbflush_timestamp);
    ...
 
filtered_flush_tlb_mask() calls arch_flush_tlb_mask(), which, at least on
Arm (I haven't checked x86), is implemented as:
  void arch_flush_tlb_mask(const cpumask_t *mask)  
  {
      /* No need to IPI other processors on ARM, the processor takes care of it. */
      flush_all_guests_tlb();
  }

So it flushes stage-2 TLB.
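
A RISC-V counterpart could not rely on broadcast TLB maintenance the way
Arm does, though: HFENCE.GVMA is hart-local, so remote harts would need an
SBI call (the spec's sbi_remote_hfence_gvma()) or an IPI. A hypothetical
sketch of the local flush:

    static inline void flush_guest_tlb_local(void)
    {
        /* hfence.gvma with zero/zero operands flushes all guest-stage
         * (G-stage) translations for all VMIDs on this hart. */
        asm volatile ( "hfence.gvma zero, zero" ::: "memory" );
    }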

+/* Put any references on the single 4K page referenced by mfn. */
+static void p2m_put_4k_page(mfn_t mfn, p2m_type_t type)
+{
+    /* TODO: Handle other p2m types */
+
+    /* Detect the xenheap page and mark the stored GFN as invalid. */
+    if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
+        page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
Is this a valid thing to do? How do you make sure the respective uses
(in gnttab's shared and status page arrays) are / were also removed?
The grant table frame GFN is stored directly in struct page_info instead
of being kept in standalone status/shared arrays, so there is no need for
such arrays.
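
In code terms, the lifecycle is roughly the following (page_set_xenheap_gfn()
exists on Arm; this assumes RISC-V mirrors it):

    struct page_info *pg = mfn_to_page(mfn);

    page_set_xenheap_gfn(pg, gfn);             /* gnttab maps the frame  */
    /* ... mfn_to_gfn() can now resolve the frame's GFN ... */
    page_set_xenheap_gfn(pg, INVALID_GFN);     /* p2m_put_4k_page() above */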


+}
+
+/* Put any references on the superpage referenced by mfn. */
+static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type)
+{
+    struct page_info *pg;
+    unsigned int i;
+
+    ASSERT(mfn_valid(mfn));
+
+    pg = mfn_to_page(mfn);
+
+    for ( i = 0; i < XEN_PT_ENTRIES; i++, pg++ )
+        p2m_put_foreign_page(pg);
+}
+
+/* Put any references on the page referenced by pte. */
+static void p2m_put_page(struct p2m_domain *p2m, const pte_t pte,
+                         unsigned int level)
+{
+    mfn_t mfn = pte_get_mfn(pte);
+    p2m_type_t p2m_type = p2m_type_radix_get(p2m, pte);
This gives you the type of the 1st page. What guarantees that all other pages
in a superpage are of the exact same type?
Doesn't a superpage mean that all the 4KB pages within it have the same
type and are contiguous in memory?
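
If that guarantee ever needs checking, a purely hypothetical debug aid
could assert it (p2m_get_type_at() is an invented helper, gfn the first
GFN of the superpage, and this presumes one radix-tree entry per 4K GFN):

    for ( i = 0; i < XEN_PT_ENTRIES; i++ )
        ASSERT(p2m_get_type_at(p2m, gfn_add(gfn, i)) == p2m_type);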


+    ASSERT(p2me_is_valid(p2m, pte));
+
+    /*
+ * TODO: Currently we don't handle level-2 superpages. Xen is not
+ * preemptible, so some work is needed to handle them: at some point
+ * Xen might end up freeing the memory behind such a big mapping,
+ * which could turn into a very long operation.
+     */
This is pretty unsatisfactory. Imo, if you don't deal with that right away,
you're setting yourself up for a significant re-write.
Arm has lived with that for a long time, and it doesn't seem to be a big
issue there. And considering that the frametable supports only 4KB page
granularity, such big mappings could lead to long operations during memory
freeing.
And a 1GB mapping isn't used for


+    if ( level == 1 )
+        return p2m_put_2m_superpage(mfn, p2m_type);
+    else if ( level == 0 )
+        return p2m_put_4k_page(mfn, p2m_type);
Use switch() right away?
It could be; I think it makes no big difference at the moment, but I am
okay with reworking it (see the sketch after this hunk).


+}
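
A sketch of that rework, using only names already present in this hunk
(p2m_get_type() reflecting the rename agreed above):

    static void p2m_put_page(struct p2m_domain *p2m, const pte_t pte,
                             unsigned int level)
    {
        mfn_t mfn = pte_get_mfn(pte);
        p2m_type_t p2m_type = p2m_get_type(p2m, pte);

        ASSERT(p2me_is_valid(p2m, pte));

        switch ( level )
        {
        case 1:
            p2m_put_2m_superpage(mfn, p2m_type);
            break;

        case 0:
            p2m_put_4k_page(mfn, p2m_type);
            break;
        }
    }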
+
+static void p2m_free_page(struct domain *d, struct page_info *pg)
+{
+    if ( is_hardware_domain(d) )
+        free_domheap_page(pg);
Why's the hardware domain different here? It should have a pool just like
all other domains have.
The hardware domain (Dom0) should have no limit on the number of pages
that can be allocated, so p2m pages for the hardware domain are allocated
from the heap.

The idea of the p2m pool is to provide a way to put a clear limit on the
amount of p2m allocations.


+    else
+    {
+        spin_lock(&d->arch.paging.lock);
+        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
+        spin_unlock(&d->arch.paging.lock);
+    }
+}
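
For comparison, the uniform variant Jan suggests, with every domain
(including Dom0) drawing from a paging pool, would collapse to:

    static void p2m_free_page(struct domain *d, struct page_info *pg)
    {
        /* All p2m pages, Dom0's included, go back to the pool. */
        spin_lock(&d->arch.paging.lock);
        page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
        spin_unlock(&d->arch.paging.lock);
    }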
+
 /* Free pte sub-tree behind an entry */
 static void p2m_free_entry(struct p2m_domain *p2m,
                            pte_t entry, unsigned int level)
 {
-    panic("%s: hasn't been implemented yet\n", __func__);
+    unsigned int i;
+    pte_t *table;
+    mfn_t mfn;
+    struct page_info *pg;
+
+    /* Nothing to do if the entry is invalid. */
+    if ( !p2me_is_valid(p2m, entry) )
+        return;
Does this actually apply to intermediate page tables (which you handle
later in the function), when that's (only) a P2M type check?
Yes, any PTE should have the V bit set to 1, so from the P2M perspective
it should also, at least, not be equal to p2m_invalid.
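
For completeness, a hedged sketch of how the rest of p2m_free_entry()
presumably proceeds, following the Arm implementation the patch is based
on (the level arithmetic and pte_is_superpage() are assumptions):

    /* Leaf (superpage or 4K page): drop the references and stop. */
    if ( pte_is_superpage(entry, level) || level == 0 )
    {
        p2m_put_page(p2m, entry, level);
        return;
    }

    /* Intermediate table: recurse into each entry... */
    table = map_domain_page(pte_get_mfn(entry));
    for ( i = 0; i < XEN_PT_ENTRIES; i++ )
        p2m_free_entry(p2m, table[i], level - 1);
    unmap_domain_page(table);

    /* ... flush stale translations before freeing the table ... */
    p2m_force_flush_sync(p2m);

    /* ... and give the table page itself back. */
    mfn = pte_get_mfn(entry);
    pg = mfn_to_page(mfn);
    p2m_free_page(p2m->domain, pg);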

~ Oleksii

 

