[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 18/18] xen/riscv: introduce metadata table to store P2M type




On 9/23/25 12:41 AM, Jan Beulich wrote:
On 17.09.2025 23:55, Oleksii Kurochko wrote:

+/*
+ * `pte` – PTE entry for which the type `t` will be stored.
+ *
+ * If `t` is `p2m_ext_storage`, both `ctx` and `p2m` must be provided;
+ * otherwise, they may be NULL.
+ */
+static void p2m_set_type(pte_t *pte, const p2m_type_t t,
+                         struct p2m_pte_ctx *ctx,
+                         struct p2m_domain *p2m)
 {
-    int rc = 0;
+    /*
+    * For the root page table (16 KB in size), we need to select the correct
+    * metadata table, since allocations are 4 KB each. In total, there are
+    * 4 tables of 4 KB each.
+    * For none-root page table index of ->pt_page[] will be always 0 as
+    * index won't be higher then 511. ASSERT() below verifies that.
+    */
+    struct page_info **md_pg =
+        &ctx->pt_page[ctx->index / PAGETABLE_ENTRIES].v.md.metadata;
+    pte_t *metadata = NULL;
+
+     /* Be sure that an index correspondent to page level is passed. */
+    ASSERT(ctx->index <= P2M_PAGETABLE_ENTRIES(ctx->level));
Doesn't this need to be < ?
Yeah, it should be <. 


+    if ( !*md_pg && (t >= p2m_first_external) )
+    {
+        /*
+         * Ensure that when `t` is stored outside the PTE bits
+         * (i.e. `t == p2m_ext_storage` or higher),
+         * both `ctx` and `p2m` are provided.
+         */
+        ASSERT(p2m && ctx);
Imo this would want to be checked whenever t > p2m_first_external, no
matter whether a metadata page was already allocated.
I think that ctx should be checked before this if condition, since it is
used to obtain the proper metadata page.
The check for p2m can remain inside the if condition, as it is essentially
only needed for allocating a metadata page.


    

-    if ( t > p2m_first_external )
-        panic("unimplemeted\n");
-    else
+        if ( ctx->level <= P2M_SUPPORTED_LEVEL_MAPPING )
+        {
+            struct domain *d = p2m->domain;
+
+            *md_pg = p2m_alloc_table(p2m);
+            if ( !*md_pg )
+            {
+                printk("%s: can't allocate extra memory for dom%d\n",
+                        __func__, d->domain_id);
+                domain_crash(d);
+            }
+        }
+        else
+            /*
+             * It is not legal to set a type for an entry which shouldn't
+             * be mapped.
+             */
+            ASSERT_UNREACHABLE();
Something not being legal doesn't mean it can't happen. Imo in this case
BUG_ON() (in place of the if() above) would be better.

+    }
+
+    if ( *md_pg )
+        metadata = __map_domain_page(*md_pg);
+
+    if ( t < p2m_first_external )
+    {
         pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
 
-    return rc;
+        if ( metadata )
+            metadata[ctx->index].pte = p2m_invalid;
+    }
+    else
+    {
+        pte->pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
+
+        metadata[ctx->index].pte = t;
Afaict metadata can still be NULL when you get here.
It shouldn't be, because when this line is executed, the metadata page already
exists or was allocated at the start of p2m_set_type().

+    }
+
+    if ( metadata )
+        unmap_domain_page(metadata);
According to the x86 implementation, passing NULL here ought to be fine,
so no if() needed.
With the current one implementation for RISC-V (CONFIG_ARCH_MAP_DOMAIN_PAGE=n so
unmap_domain_page() does nothing), it is fine too.


 }
 
-static p2m_type_t p2m_get_type(const pte_t pte)
+/*
+ * `pte` -> PTE entry that stores the PTE's type.
+ *
+ * If the PTE's type is `p2m_ext_storage`, `ctx` should be provided;
+ * otherwise it could be NULL.
+ */
+static p2m_type_t p2m_get_type(const pte_t pte, const struct p2m_pte_ctx *ctx)
 {
     p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
 
     if ( type == p2m_ext_storage )
-        panic("unimplemented\n");
+    {
+        pte_t *md = __map_domain_page(ctx->pt_page->v.md.metadata);
Pointer-to-const?

+        type = md[ctx->index].pte;
+        unmap_domain_page(ctx->pt_page->v.md.metadata);
I'm pretty sure you want to pass md here, not the pointer you passed
into __map_domain_page().
Oh, right. It should be `md`.


@@ -381,7 +465,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
     }
 }
 
-static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
+static pte_t p2m_pte_from_mfn(const mfn_t mfn, const p2m_type_t t,
+                              struct p2m_pte_ctx *p2m_pte_ctx,
+                              const bool is_table,
Do you really need both "is_table" and the context pointer? Couldn't
the "is intermediate page table" case be identified by a NULL context
and/or p2m pointer?
Good point. I will drop is_table.


Also why "const" all of the sudden?
Because nothing of that is going to be changed in p2m_pte_from_mfn(). To have
diff clearer, I can revert these changes.


@@ -435,22 +527,47 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
     return pg;
 }
 
+ * Link this metadata page to page table page's list field.
+ */
+static struct page_info *p2m_alloc_table(struct p2m_domain *p2m)
+{
+    struct page_info *page_tbl = p2m_alloc_page(p2m);
+
+    if ( !page_tbl )
+        return NULL;
+
+    clear_and_clean_page(page_tbl, p2m->clean_dcache);
+
+    return page_tbl;
+}
... the function is needed in the first place.
On one hand, it may not seem strictly necessary, but on the
other hand, without it we would need to repeat the pattern of
allocating, clearing, and cleaning a page each time a page table
is allocated. At the moment, I prefer to keep it.
But considering another your comment below ...

+/*
+ * Free page table's page and metadata page linked to page table's page.
+ */
+static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg)
+{
+    ASSERT(tbl_pg->v.md.metadata);
Why, when you no longer unconditionally alloc that page?
Agree, there is no need for this ASSERT() as "lazy allocation" is used for
metadata.

 static int p2m_create_table(struct p2m_domain *p2m, pte_t *entry)
 {
-    struct page_info *page;
+    struct page_info *page = p2m_alloc_table(p2m);
 
     ASSERT(!pte_is_valid(*entry));
 
-    page = p2m_alloc_page(p2m);
-    if ( page == NULL )
-        return -ENOMEM;
-
-    clear_and_clean_page(page, p2m->clean_dcache);
-
     p2m_write_pte(entry, page_to_p2m_table(page), p2m->clean_dcache);
 
     return 0;
As per above I don't think any change is needed here.
There are some places in the code where it isn’t necessary to immediately
write the address of a newly allocated page table page into a PTE:
- During superpage splitting: a new page is first allocated for the new
  page table, then it is filled, and only afterwards is the PTE updated
  with the new page table address.
- In p2m_set_type(): when a table is allocated for storing metadata
  (although I think p2m_alloc_page() would work fine here as well),
  there is no need to update any PTE correspondingly.

...
So, I think I can agree that p2m_alloc_table() isn’t really needed.
It should be sufficient to move the clear_and_clean_page(page_tbl, p2m->clean_dcache)
call from p2m_alloc_table() into p2m_alloc_page(), and then just use
p2m_alloc_page() everywhere.

Does the last paragraph make sense?

@@ -707,6 +834,22 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
         pte = *entry;
         pte_set_mfn(&pte, mfn_add(mfn, i << level_order));
 
+        if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
+        {
+            struct p2m_pte_ctx p2m_pte_ctx = {
+                .pt_page = tbl_pg,
+                .index = offsets[level],
+            };
Assuming using "level" is correct here (which it looks like it is), ...

+            p2m_type_t old_type = p2m_get_type(pte, &p2m_pte_ctx);
... can't this move ahead of the loop?
Considering that old_type is expected to be the same for all new PTEs, I think
we can move that ahead of the loop. I'll do that.


+            p2m_pte_ctx.pt_page = page;
+            p2m_pte_ctx.index = i;
+            p2m_pte_ctx.level = level;
Whereas - doesn't this need to be "next_level"?
Yes, it should be next_level.


@@ -718,7 +861,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
      */
     if ( next_level != target )
         rv = p2m_split_superpage(p2m, table + offsets[next_level],
-                                 level - 1, target, offsets);
+                                 level - 1, target, offsets, page);
And btw (alredy in the earlier patch introducing this code) - why isn't
it "next_level" here, instead of "level - 1" (if already you have that
variable)?
Missed to update that part. It should next_level used instead of level - 1.


@@ -812,13 +955,21 @@ static int p2m_set_entry(struct p2m_domain *p2m,
     {
         /* We need to split the original page. */
         pte_t split_pte = *entry;
+        struct page_info *tbl_pg = virt_to_page(table);
This isn't valid on a pointer obtained from map_domain_page().
Oh, sure — virt_to_page() and page_to_virt() should be used only for Xen
heap addresses.

By the way, do we have any documentation, comments, or notes describing
what should be allocated and from where?

Since map_domain_page() returns an address from the direct map region,
should we instead use maddr_to_page(virt_to_maddr(table))?

Thanks for review.

~ Oleksii

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.