[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 11/17] xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
On 7/7/25 2:53 PM, Jan Beulich wrote:
On 07.07.2025 13:46, Oleksii Kurochko wrote:On 7/7/25 9:20 AM, Jan Beulich wrote:On 04.07.2025 17:01, Oleksii Kurochko wrote:On 7/1/25 3:49 PM, Jan Beulich wrote:On 10.06.2025 15:05, Oleksii Kurochko wrote:This patch introduces p2m_set_entry() and its core helper __p2m_set_entry() for RISC-V, based loosely on the Arm implementation, with several RISC-V-specific modifications. Key differences include: - TLB Flushing: RISC-V allows caching of invalid PTEs and does not require break-before-make (BBM). As a result, the flushing logic is simplified. TLB invalidation can be deferred until p2m_write_unlock() is called. Consequently, the p2m->need_flush flag is always considered true and is removed. - Page Table Traversal: The order of walking the page tables differs from Arm, and this implementation reflects that reversed traversal. - Macro Adjustments: The macros P2M_ROOT_LEVEL, P2M_ROOT_ORDER, and P2M_ROOT_PAGES are updated to align with the new RISC-V implementation. The main functionality is in __p2m_set_entry(), which handles mappings aligned to page table block entries (e.g., 1GB, 2MB, or 4KB with 4KB granularity). p2m_set_entry() breaks a region down into block-aligned mappings and calls __p2m_set_entry() accordingly. Stub implementations (to be completed later) include: - p2m_free_entry()What would a function of this name do?Recursively visiting all leaf PTE's for sub-tree behind an entry, then calls put_page() (which will free if there is no any reference to this page), freeing intermediate page table (after all entries were freed) by removing it from d->arch.paging.freelist, and removes correspondent page of intermediate page table from p2m->pages list.You can clear entries, but you can't free them, can you?Is is a question regarding terminology?Yes. If one sees a call to a function, it should be possible to at least roughly know what it does without needing to go look at the implementation.I can't free entry itself, but a page table or a page (if it is a leaf entry) on which it points could free.Then e.g. pte_free_subtree() or some such?It sounds fine to me. I'll use suggested name. Just want to notice that other arches also have the same function for the same purpose with the same name.As to x86, it's not general P2M code which uses this odd (for the purpose) name, but only p2m-pt.c.Does it make sense then to change a name for all arches?I think so.+{ + panic("%s: isn't implemented for now\n", __func__); + + return false; +}For this function in particular, though: Besides the "p2me" in the name being somewhat odd (supposedly page table entries here are simply pte_t), how is this going to be different from pte_is_valid()?pte_is_valid() is checking a real bit of PTE, but p2me_is_valid() is checking what is a type stored in the radix tree (p2m->p2m_types): /* * In the case of the P2M, the valid bit is used for other purpose. Use * the type to check whether an entry is valid. */ static inline bool p2me_is_valid(struct p2m_domain *p2m, pte_t pte) { return p2m_type_radix_get(p2m, pte) != p2m_invalid; } It is done to track which page was modified by a guest.But then (again) the name doesn't convey what the function does.Then probably p2me_type_is_valid(struct p2m_domain *p2m, pte_t pte) would better.For P2M type checks please don't invent new naming, but use what both x86 and Arm are already using. Note how we already have p2m_is_valid() in that set. Just that it's not doing what you want here. Hm, why not doing what I want? p2m_is_valid() verifies if P2M entry is valid. And in here it is checked if P2M pte is valid from P2M point of view by checking the type in radix tree and/or in reserved PTEs bits (just to remind we have only 2 free bits for type). Plus can't a guest also arrange for an entry's type to move to p2m_invalid? That's then still an entry that was modified by the guest.I am not really sure that I fully understand the question. Do you ask if a guest can do something which will lead to a call of p2m_set_entry() with p2m_invalid argument?That I'm not asking, but rather stating. I.e. I expect such is possible.If yes, then it seems like it will be done only in case of p2m_remove_mapping() what will mean that alongside with p2m_invalid INVALID_MFN will be also passed, what means this entry isn't expected to be used anymore.Right. But such an entry would still have been "modified" by the guest. Yes, but nothing then is needed to do with it. For example, if it is already invalid there is not any sense to flush page to RAM (as in this case PTE's bit will be checked), something like Arm does: https://elixir.bootlin.com/xen/v4.20.0/source/xen/arch/arm/p2m.c#L375 + /* + * Don't take into account the MFN when removing mapping (i.e + * MFN_INVALID) to calculate the correct target order. + * + * XXX: Support superpage mappings if nr is not aligned to a + * superpage size. + */Does this really need leaving as a to-do?I think so, yes. It won’t break the current workflow if|nr| isn’t aligned, a smaller order will simply be chosen.Well, my question was more like "Isn't it simple enough to cover the case right away?"+ mask = !mfn_eq(smfn, INVALID_MFN) ? mfn_x(smfn) : 0; + mask |= gfn_x(sgfn) | nr; + + for ( ; i != 0; i-- ) + { + if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) ) + { + order = XEN_PT_LEVEL_ORDER(i); + break;Nit: Style.+ } + } + + rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a); + if ( rc ) + break; + + sgfn = gfn_add(sgfn, (1 << order)); + if ( !mfn_eq(smfn, INVALID_MFN) ) + smfn = mfn_add(smfn, (1 << order)); + + nr -= (1 << order);Throughout maybe better be safe right away and use 1UL?+ } + + return rc; }How's the caller going to know how much of the range was successfully mapped?There is no such option. Do other arches do that? I mean returns somehow the number of successfully mapped (sgfn,smfn).On x86 we had to introduce some not very nice code to cover for the absence of proper handling there. For a new port I think it wants at least seriously considering not to repeat such a potentially unhelpful pattern.That part may need undoing (if not here, then in the caller), or a caller may want to retry.So the caller in the case if rc != 0, can just undoing the full range (by using the same sgfn, nr, smfn).Can it? How would it know what the original state was?You're right — blindly unmapping the range assumes that no entries were valid beforehand and I missed that it could be that something valid was mapped before p2m_set_entry(sgfn,...,smfn) was called. But then I am not really understand why it won't be an issue if will know how many GFNs were successfully mapped.The caller may know what that range's state was. But what I really wanted to convey is: Updating multiple entries in one go is complicated in some of the corner cases. You will want to think this through now, in order to avoid the need to re-write everything later again. I can add one more argument to return the number of successfully mapped GFNs. Fortunately, that's very easy to do. The problem for me is that I don’t really understand what the caller is supposed
to do with that information. The only use case I can think of is that the caller
might try to map the remaining GFNs again. But that doesn’t seem very useful,
if
The same applies to rolling back the state. It wouldn’t be difficult to add a local
array to track all modified PTEs and then use it to revert the state if needed.
But again, what would the caller do after the rollback? At this point, it still seems
like the best option is simply to
Basically, I don’t see or understand the cases where knowing how many GFNs were
successfully mapped, or whether a rollback was performed, would really help — because
in most cases, I don’t have a better option than just calling
For example, if I call
Are there any realistic use cases where knowing the number of mapped GFNs or having rollback support would actually allow us to avoid a panic? Even more so, how would that information be used in the current call chain? We have the following chain: Thanks in advance for clarifications.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |