Re: [for 4.22 v5 10/18] xen/riscv: implement p2m_set_range()
On 20.10.2025 17:57, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/p2m.h
> +++ b/xen/arch/riscv/include/asm/p2m.h
> @@ -8,12 +8,45 @@
> #include <xen/rwlock.h>
> #include <xen/types.h>
>
> +#include <asm/page.h>
> #include <asm/page-bits.h>
>
> extern unsigned char gstage_mode;
> +extern unsigned int gstage_root_level;
>
> #define P2M_ROOT_ORDER (ilog2(GSTAGE_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT)
> #define P2M_ROOT_PAGES BIT(P2M_ROOT_ORDER, U)
> +#define P2M_ROOT_LEVEL gstage_root_level
> +
> +/*
> + * According to the RISC-V spec:
> + * When hgatp.MODE specifies a translation scheme of Sv32x4, Sv39x4, Sv48x4,
> + * or Sv57x4, G-stage address translation is a variation on the usual
> + * page-based virtual address translation scheme of Sv32, Sv39, Sv48, or
> + * Sv57, respectively. In each case, the size of the incoming address is
> + * widened by 2 bits (to 34, 41, 50, or 59 bits).
> + *
> + * P2M_LEVEL_ORDER(lvl) defines the bit position in the GFN from which
> + * the index for this level of the P2M page table starts. The extra 2
> + * bits added by the "x4" schemes only affect the root page table width.
> + *
> + * Therefore, this macro can safely reuse XEN_PT_LEVEL_ORDER() for all
> + * levels: the extra 2 bits do not change the indices of lower levels.
> + *
> + * The extra 2 bits are only relevant if one tried to address beyond the
> + * root level (i.e., P2M_LEVEL_ORDER(P2M_ROOT_LEVEL + 1)), which is
> + * invalid.
> + */
> +#define P2M_LEVEL_ORDER(lvl) XEN_PT_LEVEL_ORDER(lvl)
Is the last paragraph of the comment really needed? It talks about something
absurd / impossible only.
> +#define P2M_ROOT_EXTRA_BITS(lvl) (2 * ((lvl) == P2M_ROOT_LEVEL))
> +
> +#define P2M_PAGETABLE_ENTRIES(lvl) \
> + (BIT(PAGETABLE_ORDER + P2M_ROOT_EXTRA_BITS(lvl), UL))
> +
> +#define GFN_MASK(lvl) (P2M_PAGETABLE_ENTRIES(lvl) - 1UL)
If I'm not mistaken, this is a mask with the low 10 or 12 bits set.
That's not really something you can apply to a GFN, unlike the name
suggests.
> +#define P2M_LEVEL_SHIFT(lvl) (P2M_LEVEL_ORDER(lvl) + PAGE_SHIFT)
Whereas here the macro name doesn't make clear what is shifted: An
address or a GFN. (It's the former, aiui.)
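For concreteness, the macro family under discussion can be exercised standalone. The sketch below assumes Sv39x4 numbers (PAGE_SHIFT 12, PAGETABLE_ORDER 9, root level 2) and uses stand-in definitions, not the real Xen headers; it shows that GFN_MASK(lvl) is indeed just a low-9-bit (or, at the root, low-11-bit) index mask:

```c
#include <assert.h>

#define PAGE_SHIFT        12
#define PAGETABLE_ORDER   9U
#define P2M_ROOT_LEVEL    2U /* assumed: Sv39x4 */

/* Stand-in: XEN_PT_LEVEL_ORDER(lvl) is lvl * PAGETABLE_ORDER on RISC-V. */
#define P2M_LEVEL_ORDER(lvl)       ((lvl) * PAGETABLE_ORDER)
#define P2M_ROOT_EXTRA_BITS(lvl)   (2 * ((lvl) == P2M_ROOT_LEVEL))
#define P2M_PAGETABLE_ENTRIES(lvl) \
    (1UL << (PAGETABLE_ORDER + P2M_ROOT_EXTRA_BITS(lvl)))
/* Low 9 bits per level, low 11 bits at the root - not a full-GFN mask. */
#define GFN_MASK(lvl)              (P2M_PAGETABLE_ENTRIES(lvl) - 1UL)
/* Shift applied to an address (not a GFN): level order plus page shift. */
#define P2M_LEVEL_SHIFT(lvl)       (P2M_LEVEL_ORDER(lvl) + PAGE_SHIFT)
```

With these assumed values, GFN_MASK(0) is 0x1ff and GFN_MASK(P2M_ROOT_LEVEL) is 0x7ff, matching the "low 10 or 12 bits" remark up to the mode in use.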
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -9,6 +9,7 @@
> #include <xen/rwlock.h>
> #include <xen/sched.h>
> #include <xen/sections.h>
> +#include <xen/xvmalloc.h>
>
> #include <asm/csr.h>
> #include <asm/flushtlb.h>
> @@ -17,6 +18,43 @@
> #include <asm/vmid.h>
>
> unsigned char __ro_after_init gstage_mode;
> +unsigned int __ro_after_init gstage_root_level;
Like for mode, I'm unconvinced of this being a global (and not per-P2M /
per-domain).
> +/*
> + * The P2M root page table is extended by 2 bits, making its size 16KB
> + * (instead of 4KB for non-root page tables). Therefore, P2M root page
> + * is allocated as four consecutive 4KB pages (since alloc_domheap_pages()
> + * only allocates 4KB pages).
> + */
> +#define ENTRIES_PER_ROOT_PAGE \
> + (P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL) / P2M_ROOT_ORDER)
> +
> +static inline unsigned int calc_offset(unsigned int lvl, vaddr_t va)
Where would a vaddr_t come from here? Your input are guest-physical addresses,
if I'm not mistaken.
> +{
> + unsigned int offset = (va >> P2M_LEVEL_SHIFT(lvl)) & GFN_MASK(lvl);
> +
> + /*
> + * For P2M_ROOT_LEVEL, `offset` ranges from 0 to 2047, since the root
> + * page table spans 4 consecutive 4KB pages.
> + * We want to return an index within one of these 4 pages.
> + * The specific page to use is determined by `p2m_get_root_pointer()`.
> + *
> + * Example: if `offset == 512`:
> + * - A single 4KB page holds 512 entries.
> + * - Therefore, entry 512 corresponds to index 0 of the second page.
> + *
> + * At all other levels, only one page is allocated, and `offset` is
> + * always in the range 0 to 511, since the VPN is 9 bits long.
> + */
> + return offset % ENTRIES_PER_ROOT_PAGE;
Seeing something "root" used here (when this is for all levels) is pretty odd,
despite all the commentary. Given all the commentary, why not simply
return offset & ((1U << PAGETABLE_ORDER) - 1);
?
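The suggested masking can be checked in isolation; a minimal sketch, assuming PAGETABLE_ORDER 9 (512 PTEs per 4KB page), with page_local_index() as a stand-in name:

```c
#include <assert.h>

#define PAGETABLE_ORDER 9U

/* Index within a single 4KB page (512 eight-byte entries), as the
 * suggested "offset & ((1U << PAGETABLE_ORDER) - 1)" computes. */
static unsigned int page_local_index(unsigned int offset)
{
    return offset & ((1U << PAGETABLE_ORDER) - 1);
}
```

E.g. entry 512 of the root table maps to index 0 of the second 4KB page, matching the example in the quoted comment.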
> +}
> +
> +#define P2M_MAX_ROOT_LEVEL 4
> +
> +#define P2M_DECLARE_OFFSETS(var, addr) \
> + unsigned int var[P2M_MAX_ROOT_LEVEL] = {-1};\
> + for ( unsigned int i = 0; i <= gstage_root_level; i++ ) \
> + var[i] = calc_offset(i, addr);
This surely is more than just "declare", and it's dealing with all levels no
matter whether you actually will use all offsets.
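For reference, the per-level extraction the macro performs can be illustrated for the non-root levels (Sv39x4 numbers assumed; level_offset() is a stand-in and ignores the root-page split handled by calc_offset()):

```c
#include <assert.h>

#define PAGE_SHIFT      12
#define PAGETABLE_ORDER 9U

/* 9-bit page-table index of guest-physical address gpa at level lvl. */
static unsigned int level_offset(unsigned int lvl, unsigned long gpa)
{
    return (gpa >> (lvl * PAGETABLE_ORDER + PAGE_SHIFT)) &
           ((1U << PAGETABLE_ORDER) - 1);
}
```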
> @@ -259,13 +308,293 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
> return rc;
> }
>
> +/*
> + * Map one of the four root pages of the P2M root page table.
> + *
> + * The P2M root page table is larger than normal (16KB instead of 4KB),
> + * so it is allocated as four consecutive 4KB pages. This function selects
> + * the appropriate 4KB page based on the given GFN and returns a mapping
> + * to it.
> + *
> + * The caller is responsible for unmapping the page after use.
> + *
> + * Returns NULL if the calculated offset into the root table is invalid.
> + */
> +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
> +{
> + unsigned long root_table_indx;
> +
> + root_table_indx = gfn_x(gfn) >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);
With the variable name shortened (to e.g. idx) this could be its initializer
without ending up with too long a line. The root_table_ prefix isn't really
adding much value in the context of this function.
> + if ( root_table_indx >= P2M_ROOT_PAGES )
> + return NULL;
> +
> + /*
> + * The P2M root page table is extended by 2 bits, making its size 16KB
> + * (instead of 4KB for non-root page tables). Therefore, p2m->root is
> + * allocated as four consecutive 4KB pages (since alloc_domheap_pages()
> + * only allocates 4KB pages).
> + *
> + * Initially, `root_table_indx` is derived directly from `va`.
There's no 'va' here.
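The root-page selection being described can be sketched with plain GFN arithmetic (Sv39x4 numbers assumed; root_page_index() is a stand-in that returns ~0u for an out-of-range GFN, mirroring the NULL return):

```c
#include <assert.h>

#define PAGETABLE_ORDER 9U
#define P2M_ROOT_LEVEL  2U  /* assumed: Sv39x4 */
#define P2M_ROOT_PAGES  4UL
#define P2M_LEVEL_ORDER(lvl) ((lvl) * PAGETABLE_ORDER)

/* Which of the four consecutive 4KB root pages covers this GFN. */
static unsigned int root_page_index(unsigned long gfn)
{
    unsigned long idx = gfn >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);

    return idx < P2M_ROOT_PAGES ? (unsigned int)idx : ~0u;
}
```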
> +static inline void p2m_clean_pte(pte_t *p, bool clean_pte)
"clean_pte" as a parameter name of a function of this name is, well, odd.
> +/* Insert an entry in the p2m */
> +static int p2m_set_entry(struct p2m_domain *p2m,
> + gfn_t gfn,
> + unsigned long page_order,
> + mfn_t mfn,
> + p2m_type_t t)
> +{
> + unsigned int level;
> + unsigned int target = page_order / PAGETABLE_ORDER;
> + pte_t *entry, *table, orig_pte;
> + int rc;
> + /*
> + * A mapping is removed only if the MFN is explicitly set to INVALID_MFN.
> + * Other MFNs that are considered invalid by mfn_valid() (e.g., MMIO)
> + * are still allowed.
> + */
> + bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
> + P2M_DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
> +
> + ASSERT(p2m_is_write_locked(p2m));
> +
> + /*
> + * Check if the level target is valid: we only support
> + * 4K - 2M - 1G mapping.
> + */
> + ASSERT(target <= 2);
> +
> + table = p2m_get_root_pointer(p2m, gfn);
> + if ( !table )
> + return -EINVAL;
> +
> + for ( level = P2M_ROOT_LEVEL; level > target; level-- )
> + {
> + /*
> + * Don't try to allocate intermediate page table if the mapping
> + * is about to be removed.
> + */
> + rc = p2m_next_level(p2m, !removing_mapping,
> + level, &table, offsets[level]);
> + if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
> + {
> + rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
> + /*
> + * We are here because p2m_next_level has failed to map
> + * the intermediate page table (e.g the table does not exist
> + * and none should be allocated). It is a valid case
> + * when removing a mapping as it may not exist in the
> + * page table. In this case, just ignore lookup failure.
> + */
> + rc = removing_mapping ? 0 : rc;
> + goto out;
> + }
> +
> + if ( rc != P2M_TABLE_NORMAL )
> + break;
> + }
> +
> + entry = table + offsets[level];
> +
> + /*
> + * If we are here with level > target, we must be at a leaf node,
> + * and we need to break up the superpage.
> + */
> + if ( level > target )
> + {
> + panic("Shattering isn't implemented\n");
> + }
> +
> + /*
> + * We should always be there with the correct level because all the
> + * intermediate tables have been installed if necessary.
> + */
> + ASSERT(level == target);
> +
> + orig_pte = *entry;
> +
> + if ( removing_mapping )
> + p2m_clean_pte(entry, p2m->clean_dcache);
> + else
> + {
> + pte_t pte = p2m_pte_from_mfn(mfn, t);
> +
> + p2m_write_pte(entry, pte, p2m->clean_dcache);
> +
> + p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
> + gfn_add(gfn, BIT(page_order, UL) - 1));
> + p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, gfn);
> + }
> +
> + p2m->need_flush = true;
> +
> + /*
> + * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
> + * is not ready for RISC-V support.
> + *
> + * When CONFIG_HAS_PASSTHROUGH=y, iommu_iotlb_flush() should be done
> + * here.
> + */
> +#ifdef CONFIG_HAS_PASSTHROUGH
> +# error "add code to flush IOMMU TLB"
> +#endif
> +
> + rc = 0;
> +
> + /*
> + * In case of a VALID -> INVALID transition, the original PTE should
> + * always be freed.
> + *
> + * In case of a VALID -> VALID transition, the original PTE should be
> + * freed only if the MFNs are different. If the MFNs are the same
> + * (i.e., only permissions differ), there is no need to free the
> + * original PTE.
> + */
> + if ( pte_is_valid(orig_pte) &&
> + (!pte_is_valid(*entry) ||
> + !mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte))) )
Besides my continued impression of this condition being more complex than it
ought to be expected, indentation is off by one on the last of the three lines.
(Since, otoh, I can't suggest any simpler expression (for now), this isn't a
request to further change it.)
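For what it's worth, the condition reduces to a small predicate (stand-in types below; the real code reads the PTE fields, so this is only a sketch of the logic):

```c
#include <assert.h>
#include <stdbool.h>

/* Free the old page after the write on a VALID -> INVALID transition,
 * or on VALID -> VALID when the MFN changed; keep it when only the
 * permissions differ. */
static bool should_free_orig(bool old_valid, unsigned long old_mfn,
                             bool new_valid, unsigned long new_mfn)
{
    return old_valid && (!new_valid || new_mfn != old_mfn);
}
```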
> +/* Return mapping order for given gfn, mfn and nr */
> +static unsigned long p2m_mapping_order(gfn_t gfn, mfn_t mfn, unsigned long nr)
> +{
> + unsigned long mask;
> + /* 1gb, 2mb, 4k mappings are supported */
> + unsigned int level = min(P2M_ROOT_LEVEL, _AC(2, U));
Further up you have such a literal 2 already - please make it a constant, so all
instances can easily be associated with one another.
> + unsigned long order = 0;
> +
> + mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
> + mask |= gfn_x(gfn);
> +
> + for ( ; level != 0; level-- )
> + {
> + if ( !(mask & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)) &&
> + (nr >= BIT(P2M_LEVEL_ORDER(level), UL)) )
> + {
> + order = P2M_LEVEL_ORDER(level);
> + break;
I'm pretty sure I did complain about the too deep indentation here already.
> + }
> + }
> +
> + return order;
> +}
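The order-selection logic quoted above can be sketched with plain integers (Sv39-style orders 0/9/18 assumed; mapping_order() is a stand-in, not the real helper):

```c
#include <assert.h>

#define PAGETABLE_ORDER 9U
#define P2M_LEVEL_ORDER(lvl) ((lvl) * PAGETABLE_ORDER)

/* Largest supported order (4K/2M/1G -> 0/9/18) such that both gfn and
 * mfn are aligned to it and nr covers at least one block of that size. */
static unsigned long mapping_order(unsigned long gfn, unsigned long mfn,
                                   unsigned long nr)
{
    unsigned long mask = gfn | mfn;

    for ( unsigned int level = 2; level != 0; level-- )
    {
        unsigned long size = 1UL << P2M_LEVEL_ORDER(level);

        if ( !(mask & (size - 1)) && nr >= size )
            return P2M_LEVEL_ORDER(level);
    }

    return 0;
}
```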
> +
> static int p2m_set_range(struct p2m_domain *p2m,
> gfn_t sgfn,
> unsigned long nr,
> mfn_t smfn,
> p2m_type_t t)
> {
> - return -EOPNOTSUPP;
> + int rc = 0;
> + unsigned long left = nr;
> +
> + /*
> + * Any reference taken by the P2M mappings (e.g. foreign mapping) will
> + * be dropped in relinquish_p2m_mapping(). As the P2M will still
> + * be accessible after, we need to prevent mapping to be added when the
> + * domain is dying.
> + */
> + if ( unlikely(p2m->domain->is_dying) )
> + return -EACCES;
> +
> + while ( left )
> + {
> + unsigned long order = p2m_mapping_order(sgfn, smfn, left);
> +
> + rc = p2m_set_entry(p2m, sgfn, order, smfn, t);
> + if ( rc )
> + break;
> +
> + sgfn = gfn_add(sgfn, BIT(order, UL));
> + if ( !mfn_eq(smfn, INVALID_MFN) )
> + smfn = mfn_add(smfn, BIT(order, UL));
Off-by-1 indentation again.
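As an aside, the chunked walk this loop implements can be sketched standalone (simplified stand-in: only 4K and 2M chunks, order picked as in p2m_mapping_order()):

```c
#include <assert.h>

/* Count how many set_entry() calls a range of nr pages starting at gfn
 * would take, advancing by the order chosen for each iteration. */
static unsigned long count_chunks(unsigned long gfn, unsigned long nr)
{
    unsigned long chunks = 0;

    while ( nr )
    {
        /* Stand-in order choice: 2M chunk when aligned and large enough. */
        unsigned long order = (!(gfn & 511) && nr >= 512) ? 9 : 0;

        gfn += 1UL << order;
        nr  -= 1UL << order;
        chunks++;
    }

    return chunks;
}
```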
Jan