
Re: [PATCH v2 15/17] xen/riscv: Implement superpage splitting for p2m mappings


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 2 Jul 2025 11:25:14 +0200
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 02 Jul 2025 09:25:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.06.2025 15:05, Oleksii Kurochko wrote:
> Add support for breaking down large memory mappings ("superpages") in the
> RISC-V p2m mapping so that smaller, more precise mappings ("finer-grained
> entries") can be inserted into lower levels of the page table hierarchy.
> 
> To implement that the following is done:
> - Introduce p2m_split_superpage(): Recursively shatters a superpage into
>   smaller page table entries down to the target level, preserving original
>   permissions and attributes.
> - __p2m_set_entry() updated to invoke superpage splitting when inserting
>   entries at lower levels within a superpage-mapped region.
> 
> This implementation is based on the ARM code, with modifications to the part
> that follows the BBM (break-before-make) approach. Unlike ARM, RISC-V does
> not require BBM, so there is no need to invalidate the PTE and flush the
> TLB before updating it with the newly created, split page table.

But some flushing is going to be necessary. As long as you only ever do
global flushes, the one after the individual PTE modification (within the
split table) will do (if BBM isn't required, see below), but once you move
to more fine-grained flushing, that's not going to be enough anymore. Not
sure it's a good idea to leave such a pitfall.
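
To make the pitfall concrete: once flushing becomes fine-grained, the
flush after the one PTE modified inside the split table won't cover the
rest of the range the superpage used to translate. A sketch (both
p2m_flush_range() and the gfn variable are assumptions of mine, not
patch code):

    p2m_write_pte(entry, split_pte, p2m->clean_pte);
    /* All translations cached for the former superpage must go, not
     * just the one GFN being remapped. */
    p2m_flush_range(p2m, gfn, BIT(XEN_PT_LEVEL_ORDER(level), UL));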

As to (no need for) BBM: I couldn't find anything to that effect in the
privileged spec. Can you provide some pointer? What I found instead is e.g.
this sentence: "To ensure that implicit reads observe writes to the same
memory locations, an SFENCE.VMA instruction must be executed after the
writes to flush the relevant cached translations." And this: "Accessing the
same location using different cacheability attributes may cause loss of
coherence." (This may not only occur when the same physical address is
mapped twice at different VAs, but also after the shattering of a superpage
when the new entry differs in cacheability.)
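
In instruction terms the first quoted sentence means the split sequence
needs to end in something like (bare sketch, not taken from the patch):

    /* Populate the new intermediate table, then hook it up. */
    write_pte(entry, page_to_p2m_table(p2m, page));
    /* Implicit reads are only guaranteed to observe the new translation
     * structure after an SFENCE.VMA on the relevant hart(s). */
    asm volatile ( "sfence.vma" ::: "memory" );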

> Additionally, the page table walk logic has been adjusted, as ARM uses the
> opposite walk order compared to RISC-V.

I think you used some similar wording already in an earlier patch. I find
this confusing: Walk order is, aiui, the same. It's merely the numbering
of levels that is the opposite way round, isn't it?

> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
> ---
> Changes in V2:
>  - New patch. It was part of the big patch "xen/riscv: implement p2m mapping
>    functionality", which was split into smaller ones.
>  - Update the comment above the loop which creates the new page table, as
>    RISC-V traverses page tables in the opposite order to ARM.
>  - RISC-V doesn't require BBM, so there is no need to invalidate the PTE
>    and flush the TLB before updating it.
> ---
>  xen/arch/riscv/p2m.c | 102 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 101 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
> index 87dd636b80..79c4473f1f 100644
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -743,6 +743,77 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>      p2m_free_page(p2m->domain, pg);
>  }
>  
> +static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
> +                                unsigned int level, unsigned int target,
> +                                const unsigned int *offsets)
> +{
> +    struct page_info *page;
> +    unsigned int i;
> +    pte_t pte, *table;
> +    bool rv = true;
> +
> +    /* Convenience aliases */
> +    mfn_t mfn = pte_get_mfn(*entry);
> +    unsigned int next_level = level - 1;
> +    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
> +
> +    /*
> +     * This should only be called with target != level and with an entry
> +     * that is a superpage.
> +     */
> +    ASSERT(level > target);
> +    ASSERT(p2me_is_superpage(p2m, *entry, level));
> +
> +    page = p2m_alloc_page(p2m->domain);
> +    if ( !page )
> +        return false;
> +
> +    page_list_add(page, &p2m->pages);

Is there a reason this list maintenance isn't done in p2m_alloc_page()?
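
I.e. something along the lines of (sketch only; taking the p2m rather
than the domain, and using plain domheap allocation for illustration,
are both assumptions of mine):

    static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
    {
        struct page_info *pg = alloc_domheap_page(p2m->domain, 0);

        /* Do the list maintenance here, so no caller can forget it. */
        if ( pg )
            page_list_add(pg, &p2m->pages);

        return pg;
    }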

> +    table = __map_domain_page(page);
> +
> +    /*
> +     * We are either splitting a second level 1G page into 512 first level
> +     * 2M pages, or a first level 2M page into 512 zero level 4K pages.
> +     */
> +    for ( i = 0; i < XEN_PT_ENTRIES; i++ )
> +    {
> +        pte_t *new_entry = table + i;
> +
> +        /*
> +         * Use the content of the superpage entry and override
> +         * the necessary fields. So the correct permissions are kept.
> +         */
> +        pte = *entry;
> +        pte_set_mfn(&pte, mfn_add(mfn, i << level_order));

While okay as long as you only permit superpages up to 1G, this is another
trap for someone to fall into: Imo i would better be unsigned long right
away, considering that RISC-V permits large pages at all levels.
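
I.e. (minimal sketch of the suggested change):

    unsigned long i;    /* wide enough for any level's order */
    ...
    for ( i = 0; i < XEN_PT_ENTRIES; i++ )
    {
        ...
        /* With i being unsigned long, the shift can't truncate even
         * for superpages larger than 1G. */
        pte_set_mfn(&pte, mfn_add(mfn, i << level_order));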

> +        write_pte(new_entry, pte);
> +    }
> +
> +    /*
> +     * Shatter superpage in the page to the level we want to make the
> +     * changes.
> +     * This is done outside the loop to avoid checking the offset to
> +     * know whether the entry should be shattered for every entry.
> +     */
> +    if ( next_level != target )
> +        rv = p2m_split_superpage(p2m, table + offsets[next_level],
> +                                 level - 1, target, offsets);

I don't understand the comment: Under what conditions would every entry
need (further) shattering? And where's that happening? Or is this merely
a word ordering issue in the sentence, and "for every entry" wants
moving ahead? (In that case I'm unconvinced this is in need of commenting
upon.)

> +    /* TODO: why it is necessary to have clean here? Not somewhere in the caller */
> +    if ( p2m->clean_pte )
> +        clean_dcache_va_range(table, PAGE_SIZE);
> +
> +    unmap_domain_page(table);

Again likely not something that wants taking care of right away, but there
again is an inefficiency here: The caller almost certainly wants to map
the same page again, to update the one entry that caused the request to
shatter the page.
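
One way to avoid the re-mapping would be to hand the still-mapped table
back to the caller (interface sketch only; the extra parameter is a
hypothetical change of mine, with the recursion details elided):

    static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
                                    unsigned int level, unsigned int target,
                                    const unsigned int *offsets,
                                    pte_t **tablep)
    {
        ...
        if ( next_level == target )
            *tablep = table;         /* caller unmaps when done */
        else
            unmap_domain_page(table);
        ...
    }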

> +    /*
> +     * Even if we failed, we should install the newly allocated PTE
> +     * entry. The caller will be in charge to free the sub-tree.
> +     */
> +    p2m_write_pte(entry, page_to_p2m_table(p2m, page), p2m->clean_pte);

Why would it be wrong to free the page right here, vacating the entry at
the same time (or leaving just that to the caller)? (IOW - if this is an
implementation decision of yours, I think the word "should" would want
dropping.) After all, the caller invoking p2m_free_entry() on the thus
split PTE is less efficient (needs to iterate over all entries) than on
the original one (where it's just a single superpage).
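
I.e. something like (sketch; error handling at the call site would then
simplify accordingly):

    if ( !rv )
    {
        /* Free the partially built sub-tree right here and leave the
         * still-intact superpage entry alone. */
        p2m_free_entry(p2m, page_to_p2m_table(p2m, page), level);
        return false;
    }

    p2m_write_pte(entry, page_to_p2m_table(p2m, page), p2m->clean_pte);

    return true;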

> @@ -806,7 +877,36 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
>       */
>      if ( level > target )

This condition is likely too strong, unless you actually mean to also
split a superpage if it really wouldn't need splitting (new entry written
still fitting with the superpage mapping, i.e. suitable MFN and same
attributes).
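
I.e. something like (sketch; p2m_pte_fits_superpage() is a hypothetical
helper checking MFN alignment/contiguity and attribute equality against
the entry to be written):

    if ( level > target && !p2m_pte_fits_superpage(*entry, new_pte, level) )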

>      {
> -        panic("Shattering isn't implemented\n");
> +        /* We need to split the original page. */
> +        pte_t split_pte = *entry;
> +
> +        ASSERT(p2me_is_superpage(p2m, *entry, level));
> +
> +        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
> +        {
> +            /* Free the allocated sub-tree */
> +            p2m_free_entry(p2m, split_pte, level);
> +
> +            rc = -ENOMEM;
> +            goto out;
> +        }
> +
> +        p2m_write_pte(entry, split_pte, p2m->clean_pte);
> +
> +        /* Then move to the level we want to make real changes */
> +        for ( ; level < target; level++ )

Don't you mean to move downwards here? At which point I wonder: Did you test
this code?
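
Presumably this wants to be (untested):

    /* Level numbers decrease towards leaf tables on RISC-V. */
    for ( ; level > target; level-- )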

> +        {
> +            rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
> +
> +            /*
> +             * The entry should be found and either be a table
> +             * or a superpage if level 0 is not targeted
> +             */
> +            ASSERT(rc == GUEST_TABLE_NORMAL ||
> +                   (rc == GUEST_TABLE_SUPER_PAGE && target > 0));
> +        }

This, too, is inefficient (but likely good enough as a starting point): You walk
tables twice - first when splitting, and then again when finding the target 
level.

Considering the enclosing if(), this also again is a do/while() candidate.
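
E.g. (sketch, building on the corrected loop direction above; the
enclosing if() already guarantees level > target on entry):

    do {
        rc = p2m_next_level(p2m, true, level, &table, offsets[level]);
        ASSERT(rc == GUEST_TABLE_NORMAL ||
               (rc == GUEST_TABLE_SUPER_PAGE && target > 0));
    } while ( --level > target );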

Jan