[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 6/7] xen/riscv: page table handling
On 28.08.2024 18:11, oleksii.kurochko@xxxxxxxxx wrote: > On Tue, 2024-08-27 at 17:00 +0200, Jan Beulich wrote: >> On 21.08.2024 18:06, Oleksii Kurochko wrote: >>> Implement map_pages_to_xen() which requires several >>> functions to manage page tables and entries: >>> - pt_update() >>> - pt_mapping_level() >>> - pt_update_entry() >>> - pt_next_level() >>> - pt_check_entry() >>> >>> To support these operations, add functions for creating, >>> mapping, and unmapping Xen tables: >>> - create_table() >>> - map_table() >>> - unmap_table() >>> >>> Introduce internal macros starting with PTE_* for convenience. >>> These macros closely resemble PTE bits, with the exception of >>> PTE_SMALL, which indicates that 4KB is needed. >> >> What macros are you talking about here? Is this partially stale, as >> only PTE_SMALL and PTE_POPULATE (and a couple of masks) are being >> added? > I am speaking about macros connected to masks: > #define PTE_R_MASK(x) ((x) & PTE_READABLE) > #define PTE_W_MASK(x) ((x) & PTE_WRITABLE) > #define PTE_X_MASK(x) ((x) & PTE_EXECUTABLE) > > #define PTE_RWX_MASK(x) ((x) & (PTE_READABLE | PTE_WRITABLE | > PTE_EXECUTABLE)) Some of which is did question further down in my reply. But what's worse - by saying "closely resemble PTE bits, with the exception of PTE_SMALL" you pretty clearly _do not_ refer to the macros above, but to PTE_VALID etc. >>> @@ -68,6 +111,20 @@ static inline bool pte_is_valid(pte_t p) >>> return p.pte & PTE_VALID; >>> } >>> >>> +inline bool pte_is_table(const pte_t p) >>> +{ >>> + return ((p.pte & (PTE_VALID | >>> + PTE_READABLE | >>> + PTE_WRITABLE | >>> + PTE_EXECUTABLE)) == PTE_VALID); >>> +} >> >> In how far is the READABLE check valid here? You (imo correctly) ... Oh, I wrongly picked on READABLE when it should have been the WRITABLE bit. >>> +static inline bool pte_is_mapping(const pte_t p) >>> +{ >>> + return (p.pte & PTE_VALID) && >>> + (p.pte & (PTE_WRITABLE | PTE_EXECUTABLE)); >>> +} >> >> ... don't consider this bit here. > pte_is_mapping() seems to me is correct as according to the RISC-V > privileged spec: > 4. Otherwise, the PTE is valid. If pte.r = 1 or pte.x = 1, go to step > 5. Otherwise, this PTE is a pointer to the next level of the page > table. > 5. A leaf PTE has been found. ... Right. And then why do you check all three of r, x, and w, when the doc mentions only r and x? There may be reasons, but such reasons then need clearly stating in a code comment, for people to understand why the code is not following the spec. > and regarding pte_is_table() READABLE check is valid as we have to > check only that pte.r = pte.x = 0. WRITABLE check should be dropped. Or > just use define pte_is_table() as: > inline bool pte_is_table(const pte_t p) > { > return !pte_is_mapping(p); > } You had it like this earlier on, didn't you? That's wrong, because for a PTE to describe another page table level PTE_VALID needs to be set. >>> --- /dev/null >>> +++ b/xen/arch/riscv/pt.c >>> @@ -0,0 +1,420 @@ >>> +#include <xen/bug.h> >>> +#include <xen/domain_page.h> >>> +#include <xen/errno.h> >>> +#include <xen/mm.h> >>> +#include <xen/mm-frame.h> >>> +#include <xen/pmap.h> >>> +#include <xen/spinlock.h> >>> + >>> +#include <asm/flushtlb.h> >>> +#include <asm/page.h> >>> + >>> +static inline const mfn_t get_root_page(void) >>> +{ >>> + paddr_t root_maddr = (csr_read(CSR_SATP) & SATP_PPN_MASK) << >>> PAGE_SHIFT; >>> + >>> + return maddr_to_mfn(root_maddr); >>> +} >>> + >>> +/* Sanity check of the entry. */ >>> +static bool pt_check_entry(pte_t entry, mfn_t mfn, unsigned int >>> flags) >>> +{ >>> + /* >>> + * See the comment about the possible combination of (mfn, >>> flags) in >>> + * the comment above pt_update(). >>> + */ >>> + >>> + /* Sanity check when modifying an entry. */ >>> + if ( (flags & PTE_VALID) && mfn_eq(mfn, INVALID_MFN) ) >>> + { >>> + /* We don't allow modifying an invalid entry. */ >>> + if ( !pte_is_valid(entry) ) >>> + { >>> + printk("Modifying invalid entry is not allowed.\n"); >> >> Perhaps all of these printk()s should be dprintk()? > It could be dprintk() but at the same time I don't see any issue if it > will be printed once. What guarantees that it wouldn't be logged over and over? It's simply bad practice to accompany all error returns with log messages, even in release builds. Even if right now you're only in the bring-up phase, you still want to have security in mind. If any such log message ended up reachable from a guest-invoked path, an XSA would be needed. >> And not have a full >> stop? > By "full stop," do you mean something like panic() or BUG_ON()? No. "Full stop" is the period at the end of a sentence (which shouldn't normally be there at the end of log messages). >>> + /* >>> + * We don't allow replacing any valid entry. >>> + * >>> + * Note that the function pt_update() relies on this >>> + * assumption and will skip the TLB flush (when Svvptc >>> + * extension will be ratified). The function will need >>> + * to be updated if the check is relaxed. >>> + */ >>> + if ( pte_is_valid(entry) ) >>> + { >>> + if ( pte_is_mapping(entry) ) >>> + printk("Changing MFN for a valid entry is not >>> allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n", >>> + mfn_x(mfn_from_pte(entry)), mfn_x(mfn)); >>> + else >>> + printk("Trying to replace a table with a >>> mapping.\n"); >>> + return false; >>> + } >>> + } >>> + /* Sanity check when removing a mapping. */ >>> + else if ( (flags & (PTE_VALID | PTE_POPULATE)) == 0 ) >> >> The PTE_VALID part of the check is pointless considering the earlier >> if(). I guess you may want to have it for doc purposes ... > Yes, it just helps to read the code and understand "confusing" if's() > above. Well, since you mention "confusing": I for one consider such redundant checks confusing. It raises the question whether this check is wrong or the earlier one is. Therefore, if you want to keep the redundancy, it may help if you extend the comment to mention it's actually redundant (e.g. by saying "for completeness" or some such). >>> +#define XEN_TABLE_MAP_FAILED 0 >>> +#define XEN_TABLE_SUPER_PAGE 1 >>> +#define XEN_TABLE_NORMAL 2 >>> + >>> +/* >>> + * Take the currently mapped table, find the corresponding entry, >>> + * and map the next table, if available. >>> + * >>> + * The alloc_tbl parameters indicates whether intermediate tables >>> should >>> + * be allocated when not present. >>> + * >>> + * Return values: >>> + * XEN_TABLE_MAP_FAILED: Either alloc_only was set and the entry >>> + * was empty, or allocating a new page failed. >>> + * XEN_TABLE_NORMAL: next level or leaf mapped normally >>> + * XEN_TABLE_SUPER_PAGE: The next entry points to a superpage. >>> + */ >>> +static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned >>> int offset) >> >> Having the boolean first is unusual, but well - it's your choice. >> >>> +{ >>> + pte_t *entry; >>> + int ret; >>> + mfn_t mfn; >>> + >>> + entry = *table + offset; >>> + >>> + if ( !pte_is_valid(*entry) ) >>> + { >>> + if ( alloc_tbl ) >>> + return XEN_TABLE_MAP_FAILED; >> >> Is this condition meant to be inverted? > if alloc_tbl = true we shouldn't allocatetable as: > * The intermediate page table shouldn't be allocated when MFN > isn't > * valid and we are not populating page table. > ... > */ Well, no. The variable name really shouldn't be the opposite of what is meant. "alloc_tbl" can only possibly mean "allocate a table if none is there". I can't think of a sensible interpretation in the inverted sense. I'm curious how you mean to interpret that variable name. > bool alloc_tbl = mfn_eq(mfn, INVALID_MFN) && !(flags & > PTE_POPULATE); > > So if mfn = INVALID_MFN and flags.PTE_POPULATE=0 it means that this > table shouldn't be allocated and thereby pt_next_level() should return > XEN_TABLE_MAP_FAILED. > > Or to invert if ( alloc_tbl )it will be needed to invert defintion of > alloc_tbl: > bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE); Yes, as I did comment further down. >>> + if ( level != target ) >>> + { >>> + printk("%s: Shattering superpage is not supported\n", >>> __func__); >>> + rc = -EOPNOTSUPP; >>> + goto out; >>> + } >>> + >>> + entry = table + offsets[level]; >>> + >>> + rc = -EINVAL; >>> + if ( !pt_check_entry(*entry, mfn, flags) ) >>> + goto out; >>> + >>> + /* We are removing the page */ >>> + if ( !(flags & PTE_VALID) ) >>> + memset(&pte, 0x00, sizeof(pte)); >>> + else >>> + { >>> + /* We are inserting a mapping => Create new pte. */ >>> + if ( !mfn_eq(mfn, INVALID_MFN) ) >>> + pte = pte_from_mfn(mfn, PTE_VALID); >>> + else /* We are updating the permission => Copy the current >>> pte. */ >>> + pte = *entry; >>> + >>> + /* update permission according to the flags */ >>> + pte.pte |= PTE_RWX_MASK(flags) | PTE_ACCESSED | PTE_DIRTY; >> >> When updating an entry, don't you also need to clear (some of) the >> flags? > I am not sure why some flags should be cleared. Here we are taking only > necessary for pte flags such as R, W, X or other bits in flags are > ignored. Consider what happens to a PTE with R and X set when a request comes in to change to R/W. You'll end up with R, X, and W all set if you don't first clear the bits that are meant to be changeable in a "modify" operation. >>> +/* Return the level where mapping should be done */ >>> +static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned >>> long nr, >>> + unsigned int flags) >>> +{ >>> + unsigned int level = 0; >>> + unsigned long mask; >>> + unsigned int i; >>> + >>> + /* Use blocking mapping unless the caller requests 4K mapping >>> */ >>> + if ( unlikely(flags & PTE_SMALL) ) >>> + return level; >>> + >>> + /* >>> + * Don't take into account the MFN when removing mapping (i.e >>> + * MFN_INVALID) to calculate the correct target order. >>> + * >>> + * `vfn` and `mfn` must be both superpage aligned. >>> + * They are or-ed together and then checked against the size >>> of >>> + * each level. >>> + * >>> + * `left` is not included and checked separately to allow >>> + * superpage mapping even if it is not properly aligned (the >>> + * user may have asked to map 2MB + 4k). >> >> What is this about? There's nothing named "left" here. > It refer to "remaining" pages or "leftover" space after trying to align > a mapping to a superpage boundary. What what is the quoted "left" here? Such a variable appears to exist in the caller, but using the name here is lacking context. >>> +/* >>> + * If `mfn` equals `INVALID_MFN`, it indicates that the following >>> page table >>> + * update operation might be related to either populating the >>> table ( >>> + * PTE_POPULATE will be set additionaly), destroying a mapping, or >>> modifying >>> + * an existing mapping. >> >> And the latter two are distinguished by? PTE_VALID? > inserting -> (PTE_VALID=1 + (mfn=something valid)) > destroying-> ( PTE_VALID=0 ) Which then needs saying in the comment. >>> + * If `mfn` is valid and flags has PTE_VALID bit set then it means >>> that >>> + * inserting will be done. >>> + */ >> >> What about mfn != INVALID_MFN and PTE_VALID clear? > PTE_VALID=0 will be always considered as destroying and no matter what > is mfn value as in this case the removing is done in the way where mfn > isn't used: Right, yet elsewhere you're restrictive as to MFN values valid to use. Not requiring INVALID_MFN here looks inconsistent to me. > memset(&pte, 0x00, sizeof(pte)); Just to mention it: I don't think memset() is a very good way of clearing a PTE, even if right here it's not a live one. >> Also note that "`mfn` is >> valid" isn't the same as "mfn != INVALID_MFN". You want to be precise >> here, >> to avoid confusion later on. (I say that knowing that we're still >> fighting >> especially shadow paging code on x86 not having those properly >> separated.) > If it is needed to be precise and mfn is valid isn't the same as "mfn > != INVALID_MFN" only for the case of shadow paging? No, I used shadow paging only as an example of where we have similar issues. I'd like to avoid that a new port starts out with introducing more instances of that. You want to properly separate INVALID_MFN from "invalid MFN", where the latter means any MFN where either nothing exists at all, or (see mfn_valid()) where no struct page_info exists. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |