[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 6/7] xen/riscv: page table handling



Missed to add revision log:

---
Changes in V8:
 - drop PTE_LEAF_MASK.
 - update the comment above pte_is_table(): drop table number and use
   just "the encoding of the permission bits".
 - declare pte_is_table() as static.
 - drop const from the argument of pte_is_table
 - drop the "const" comment before the argument of pte_is_mapping().
 - update the comment above ASSERT() in pte_is_mapping() to : See
pte_is_table().
 - drop "const" from the return type of get_root_page().
 - update the comment above "pt_check_entry()".
 - start the comment with capital letter.
 - update the way how PTE_ACCESS_MASK bits are cleared before being
updated by
   the value in flags.
 - use dprintk() instead of printk() in riscv/pt.c
 - introduce XEN_TABLE_MAP_NONE and XEN_TABLE_MAP_NOMEM instead of
XEN_TABLE_MAP_FAILED
   and correspondingly update part of the code of pt_next_level()'s
return value
   handling in pt_update_entry.
 - update type of virt to vaddr_t for pt_update_entry()
---
Changes in V7:
 - rename PTE_XWV_BITS to PTE_LEAF_MASK.
 - drop PTE_XWV_MASK, PTE_RWX_MASK.
 - introduce PTE_ACCESS_MASK.
 - update the ASSERT and the comment about it in pte_is_mapping().
 - add the same ASSERT as in pte_is_mapping() to pte_is_table().
 - update the comment above pte_is_table().
 - use PTE_ACCESS_MASK inside pte_is_{table,mapping} instead of
encoding
   access bit explicitly.
 - define SATP_PPN_MASK using SATP{32,64}_PPN.
 - drop inclusion of #include <xen/mm-frame.h> in riscv/pt.c as
xen/mm.h is
   included.
 - use pfn_to_paddr() in get_root_page() instead of open-coding of
pfn_to_paddr().
 - update if the comment and the if (...) in pt_update_entry() above
the check
   in case of pt_next_level() returns XEN_TABLE_MAP_FAILED.
 - update the the comment above pt_update(): drop unecessary mentioning
of INVALID_MFN
   and blanks inside parentheses.
 - drop "full stops" in printk().
 - correct the condition in ASSERT() in map_pages_to_xen().
 - clear permission bits before updating the permissions in
pt_update_entry().
---
Changes in V6:
 - update the commit message.
 - correct the comment above flush_tlb_range_va().
 - add PTE_READABLE to the check of pte.rwx permissions in
   pte_is_mapping().
 - s/printk/dprintk in pt_check_entry().
 - drop unnecessary ASSERTS() in pt_check_entry().
 - drop checking of PTE_VALID flags in /* Sanity check when removing
   a mapping */ because of the earlier check.
 - drop ASSERT(flags & PTE_POPULATE) in /* Sanity check when populating
the page-table */
   section as in the earlier if it is checked.
 - pt_next_level() changes:
   - invert if ( alloc_tbl ) condition.
   - drop local variable ret.
 - pt_update_entry() changes:
   - invert definition of alloc_tbl.
   - update the comment inside "if ( rc == XEN_TABLE_MAP_FAILED )".
   - drop else for mentioned above if (...).
   - clear some PTE flags before update.
 - s/xen_pt_lock/pt_lock
 - use PFN_DOWN() for vfn variable definition in pt_update().
 - drop definition of PTE_{R,W,X}_MASK.
 - introduce PTE_XWV_BITS and PTE_XWV_MASK() for convenience and use
them in if (...)
   in pt_update().
 - update the comment above pt_update().
 - change memset(&pte, 0x00, sizeof(pte)) to pte.pte = 0.
 - add the comment above pte_is_table().
 - add ASSERT in pte_is_mapping() to check the cases which are reserved
for future
   use.
---
Changes in V5:
 - s/xen_{un}map/{un}map
 - introduce PTE_SMALL instead of PTE_BLOCK.
 - update the comment above defintion of PTE_4K_PAGES.
 - code style fixes.
 - s/RV_STAGE1_MODE > SATP_MODE_SV48/RV_STAGE1_MODE > SATP_MODE_SV39
around
   DECLARE_OFFSETS macros.
 - change type of root_maddr from unsgined long to maddr_t.
 - drop duplicated check ( if (rc) break ) in pt_update() inside while
cycle.
 - s/1U/1UL
 - put 'spin_unlock(&xen_pt_lock);' ahead of TLB flush in pt_update().
 - update the commit message.
 - update the comment above ASSERT() in map_pages_to_xen() and also
update
   the check within ASSERT() to check that flags has PTE_VALID bit set.
 - update the comment above pt_update() function.
 - add the comment inside pt_check_entry().
 - update the TLB flushing region in pt_update().
 - s/alloc_only/alloc_tbl
---
Changes in V4:
 - update the commit message.
 - drop xen_ prefix for functions: xen_pt_update(),
xen_pt_mapping_level(),
   xen_pt_update_entry(), xen_pt_next_level(), xen_pt_check_entry().
 - drop 'select GENERIC_PT' for CONFIG_RISCV. There is no GENERIC_PT
anymore.
 - update implementation of flush_xen_tlb_range_va and
s/flush_xen_tlb_range_va/flush_tlb_range_va
 - s/pte_get_mfn/mfn_from_pte. Others similar definitions I decided not
to touch as
   they were introduced before and this patter of naming such type of
macros will be applied
   for newly introduced macros.
 - drop _PAGE_* definitions and use analogues of PTE_*.
 - introduce PTE_{W,X,R}_MASK and drop PAGE_{XN,W,X}_MASK. Also drop
_PAGE_{*}_BIT
 - introduce PAGE_HYPERVISOR_RX.
 - drop unused now l3_table_offset.
 - drop struct pt_t as it was used only for one function. If it will be
needed in the future
   pt_t will be re-introduced.
 - code styles fixes in pte_is_table(). drop level argument from t.
 - update implementation and prototype of pte_is_mapping().
 - drop level argument from pt_next_level().
 - introduce definition of SATP_PPN_MASK.
 - isolate PPN of CSR_SATP before shift by PAGE_SHIFT.
 - drop set_permission() functions as it is not used more then once.
 - update prototype of pt_check_entry(): drop level argument as it is
not used.
 - pt_check_entry():
   - code style fixes
   - update the sanity check when modifying an entry
   - update the sanity check when when removing a mapping.
 - s/read_only/alloc_only.
 - code style fixes for pt_next_level().
 - pt_update_entry() changes:
   - drop arch_level variable inisde pt_update_entry()
   - drop convertion near virt to paddr_t in DECLARE_OFFSETS(offsets,
virt);
   - pull out "goto out inside first 'for' cycle.
   - drop braces for 'if' cases which has only one line.
   - ident 'out' label with one blank.
   - update the comment above alloc_only and also definition to take
into
     account  that if pte population was requested or not.
   - drop target variable and rename arch_target argument of the
function to
     target.
 - pt_mapping_level() changes:
   - move the check if PTE_BLOCK should be mapped on the top of the
function.
   - change int i to unsigned int and update 'for' cycle
correspondingly.
 - update prototye of pt_update():
   - drop the comment  above nr_mfns and drop const to be consistent
with other
     arguments.
   - always flush TLB at the end of the function as non-present entries
can be put
     in the TLB.
   - add fence before TLB flush to ensure that PTEs are all updated
before flushing.
 - s/XEN_TABLE_NORMAL_PAGE/XEN_TABLE_NORMAL
 - add a check in map_pages_to_xen() the mfn is not INVALID_MFN.
 - add the comment on top of pt_update() how mfn = INVALID_MFN is
considered.
 - s/_PAGE_BLOCK/PTE_BLOCK.
 - add the comment with additional explanation for PTE_BLOCK.
 - drop defintion of FIRST_SIZE as it isn't used.
---
Changes in V3:
 - new patch. ( Technically it is reworked version of the generic
approach
   which I tried to suggest in the previous version )
---

~ Oleksii

On Fri, 2024-09-27 at 18:33 +0200, Oleksii Kurochko wrote:
> Implement map_pages_to_xen() which requires several
> functions to manage page tables and entries:
> - pt_update()
> - pt_mapping_level()
> - pt_update_entry()
> - pt_next_level()
> - pt_check_entry()
> 
> To support these operations, add functions for creating,
> mapping, and unmapping Xen tables:
> - create_table()
> - map_table()
> - unmap_table()
> 
> Introduce PTE_SMALL to indicate that 4KB mapping is needed
> and PTE_POPULATE.
> 
> In addition introduce flush_tlb_range_va() for TLB flushing across
> CPUs after updating the PTE for the requested mapping.
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
> ---
>  xen/arch/riscv/Makefile                     |   1 +
>  xen/arch/riscv/include/asm/flushtlb.h       |   9 +
>  xen/arch/riscv/include/asm/mm.h             |   2 +
>  xen/arch/riscv/include/asm/page.h           |  80 ++++
>  xen/arch/riscv/include/asm/riscv_encoding.h |   2 +
>  xen/arch/riscv/mm.c                         |   9 -
>  xen/arch/riscv/pt.c                         | 421
> ++++++++++++++++++++
>  7 files changed, 515 insertions(+), 9 deletions(-)
>  create mode 100644 xen/arch/riscv/pt.c
> 
> diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
> index 6832549133..a5eb2aed4b 100644
> --- a/xen/arch/riscv/Makefile
> +++ b/xen/arch/riscv/Makefile
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
>  obj-y += entry.o
>  obj-y += mm.o
> +obj-y += pt.o
>  obj-$(CONFIG_RISCV_64) += riscv64/
>  obj-y += sbi.o
>  obj-y += setup.o
> diff --git a/xen/arch/riscv/include/asm/flushtlb.h
> b/xen/arch/riscv/include/asm/flushtlb.h
> index f4a735fd6c..43214f5e95 100644
> --- a/xen/arch/riscv/include/asm/flushtlb.h
> +++ b/xen/arch/riscv/include/asm/flushtlb.h
> @@ -5,12 +5,21 @@
>  #include <xen/bug.h>
>  #include <xen/cpumask.h>
>  
> +#include <asm/sbi.h>
> +
>  /* Flush TLB of local processor for address va. */
>  static inline void flush_tlb_one_local(vaddr_t va)
>  {
>      asm volatile ( "sfence.vma %0" :: "r" (va) : "memory" );
>  }
>  
> +/* Flush a range of VA's hypervisor mappings from the TLB of all
> processors. */
> +static inline void flush_tlb_range_va(vaddr_t va, size_t size)
> +{
> +    BUG_ON(!sbi_has_rfence());
> +    sbi_remote_sfence_vma(NULL, va, size);
> +}
> +
>  /*
>   * Filter the given set of CPUs, removing those that definitely
> flushed their
>   * TLB since @page_timestamp.
> diff --git a/xen/arch/riscv/include/asm/mm.h
> b/xen/arch/riscv/include/asm/mm.h
> index a0bdc2bc3a..ce1557bb27 100644
> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -42,6 +42,8 @@ static inline void *maddr_to_virt(paddr_t ma)
>  #define virt_to_mfn(va)     __virt_to_mfn(va)
>  #define mfn_to_virt(mfn)    __mfn_to_virt(mfn)
>  
> +#define mfn_from_pte(pte) maddr_to_mfn(pte_to_paddr(pte))
> +
>  struct page_info
>  {
>      /* Each frame can be threaded onto a doubly-linked list. */
> diff --git a/xen/arch/riscv/include/asm/page.h
> b/xen/arch/riscv/include/asm/page.h
> index eb79cb9409..89fa290697 100644
> --- a/xen/arch/riscv/include/asm/page.h
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -21,6 +21,11 @@
>  #define XEN_PT_LEVEL_MAP_MASK(lvl)  (~(XEN_PT_LEVEL_SIZE(lvl) - 1))
>  #define XEN_PT_LEVEL_MASK(lvl)      (VPN_MASK <<
> XEN_PT_LEVEL_SHIFT(lvl))
>  
> +/*
> + * PTE format:
> + * | XLEN-1  10 | 9             8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
> + *       PFN      reserved for SW   D   A   G   U   X   W   R   V
> + */
>  #define PTE_VALID                   BIT(0, UL)
>  #define PTE_READABLE                BIT(1, UL)
>  #define PTE_WRITABLE                BIT(2, UL)
> @@ -34,15 +39,49 @@
>  #define PTE_LEAF_DEFAULT            (PTE_VALID | PTE_READABLE |
> PTE_WRITABLE)
>  #define PTE_TABLE                   (PTE_VALID)
>  
> +#define PAGE_HYPERVISOR_RO          (PTE_VALID | PTE_READABLE)
>  #define PAGE_HYPERVISOR_RW          (PTE_VALID | PTE_READABLE |
> PTE_WRITABLE)
> +#define PAGE_HYPERVISOR_RX          (PTE_VALID | PTE_READABLE |
> PTE_EXECUTABLE)
>  
>  #define PAGE_HYPERVISOR             PAGE_HYPERVISOR_RW
>  
> +/*
> + * The PTE format does not contain the following bits within itself;
> + * they are created artificially to inform the Xen page table
> + * handling algorithm. These bits should not be explicitly written
> + * to the PTE entry.
> + */
> +#define PTE_SMALL       BIT(10, UL)
> +#define PTE_POPULATE    BIT(11, UL)
> +
> +#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE |
> PTE_EXECUTABLE)
> +
>  /* Calculate the offsets into the pagetables for a given VA */
>  #define pt_linear_offset(lvl, va)   ((va) >>
> XEN_PT_LEVEL_SHIFT(lvl))
>  
>  #define pt_index(lvl, va) (pt_linear_offset((lvl), (va)) & VPN_MASK)
>  
> +#define PAGETABLE_ORDER_MASK ((_AC(1, U) << PAGETABLE_ORDER) - 1)
> +#define TABLE_OFFSET(offs) (_AT(unsigned int, offs) &
> PAGETABLE_ORDER_MASK)
> +
> +#if RV_STAGE1_MODE > SATP_MODE_SV39
> +#error "need to to update DECLARE_OFFSETS macros"
> +#else
> +
> +#define l0_table_offset(va) TABLE_OFFSET(pt_linear_offset(0, va))
> +#define l1_table_offset(va) TABLE_OFFSET(pt_linear_offset(1, va))
> +#define l2_table_offset(va) TABLE_OFFSET(pt_linear_offset(2, va))
> +
> +/* Generate an array @var containing the offset for each level from
> @addr */
> +#define DECLARE_OFFSETS(var, addr)          \
> +    const unsigned int var[] = {            \
> +        l0_table_offset(addr),              \
> +        l1_table_offset(addr),              \
> +        l2_table_offset(addr),              \
> +    }
> +
> +#endif
> +
>  /* Page Table entry */
>  typedef struct {
>  #ifdef CONFIG_RISCV_64
> @@ -68,6 +107,47 @@ static inline bool pte_is_valid(pte_t p)
>      return p.pte & PTE_VALID;
>  }
>  
> +/*
> + * From the RISC-V spec:
> + *   The V bit indicates whether the PTE is valid; if it is 0, all
> other bits
> + *   in the PTE are don’t-cares and may be used freely by software.
> + *
> + *   If V=1 the encoding of PTE R/W/X bits could be find in "the
> encoding
> + *   of the permission bits" table.
> + *
> + *   The encoding of the permission bits table:
> + *      X W R Meaning
> + *      0 0 0 Pointer to next level of page table.
> + *      0 0 1 Read-only page.
> + *      0 1 0 Reserved for future use.
> + *      0 1 1 Read-write page.
> + *      1 0 0 Execute-only page.
> + *      1 0 1 Read-execute page.
> + *      1 1 0 Reserved for future use.
> + *      1 1 1 Read-write-execute page.
> + */
> +static inline bool pte_is_table(pte_t p)
> +{
> +    /*
> +     * According to the spec if V=1 and W=1 then R also needs to be
> 1 as
> +     * R = 0 is reserved for future use ( look at the Table 4.5 ) so
> check
> +     * in ASSERT that if (V==1 && W==1) then R isn't 0.
> +     *
> +     * PAGE_HYPERVISOR_RW contains PTE_VALID too.
> +     */
> +    ASSERT(((p.pte & PAGE_HYPERVISOR_RW) != (PTE_VALID |
> PTE_WRITABLE)));
> +
> +    return ((p.pte & (PTE_VALID | PTE_ACCESS_MASK)) == PTE_VALID);
> +}
> +
> +static inline bool pte_is_mapping(pte_t p)
> +{
> +    /* See pte_is_table() */
> +    ASSERT(((p.pte & PAGE_HYPERVISOR_RW) != (PTE_VALID |
> PTE_WRITABLE)));
> +
> +    return (p.pte & PTE_VALID) && (p.pte & PTE_ACCESS_MASK);
> +}
> +
>  static inline void invalidate_icache(void)
>  {
>      BUG_ON("unimplemented");
> diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h
> b/xen/arch/riscv/include/asm/riscv_encoding.h
> index 58abe5eccc..e31e94e77e 100644
> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> @@ -164,6 +164,7 @@
>  #define SSTATUS_SD                   SSTATUS64_SD
>  #define SATP_MODE                    SATP64_MODE
>  #define SATP_MODE_SHIFT                      SATP64_MODE_SHIFT
> +#define SATP_PPN_MASK                        SATP64_PPN
>  
>  #define HGATP_PPN                    HGATP64_PPN
>  #define HGATP_VMID_SHIFT             HGATP64_VMID_SHIFT
> @@ -174,6 +175,7 @@
>  #define SSTATUS_SD                   SSTATUS32_SD
>  #define SATP_MODE                    SATP32_MODE
>  #define SATP_MODE_SHIFT                      SATP32_MODE_SHIFT
> +#define SATP_PPN_MASK                        SATP32_PPN
>  
>  #define HGATP_PPN                    HGATP32_PPN
>  #define HGATP_VMID_SHIFT             HGATP32_VMID_SHIFT
> diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
> index b8ff91cf4e..e8430def14 100644
> --- a/xen/arch/riscv/mm.c
> +++ b/xen/arch/riscv/mm.c
> @@ -369,12 +369,3 @@ int destroy_xen_mappings(unsigned long s,
> unsigned long e)
>      BUG_ON("unimplemented");
>      return -1;
>  }
> -
> -int map_pages_to_xen(unsigned long virt,
> -                     mfn_t mfn,
> -                     unsigned long nr_mfns,
> -                     unsigned int flags)
> -{
> -    BUG_ON("unimplemented");
> -    return -1;
> -}
> diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
> new file mode 100644
> index 0000000000..a5552a4871
> --- /dev/null
> +++ b/xen/arch/riscv/pt.c
> @@ -0,0 +1,421 @@
> +#include <xen/bug.h>
> +#include <xen/domain_page.h>
> +#include <xen/errno.h>
> +#include <xen/lib.h>
> +#include <xen/mm.h>
> +#include <xen/pfn.h>
> +#include <xen/pmap.h>
> +#include <xen/spinlock.h>
> +
> +#include <asm/flushtlb.h>
> +#include <asm/page.h>
> +
> +static inline mfn_t get_root_page(void)
> +{
> +    paddr_t root_maddr = pfn_to_paddr(csr_read(CSR_SATP) &
> SATP_PPN_MASK);
> +
> +    return maddr_to_mfn(root_maddr);
> +}
> +
> +/*
> + * Sanity check a page table entry about to be updated as per an
> (MFN,flags)
> + * tuple.
> + * See the comment about the possible combination of (mfn, flags) in
> + * the comment above pt_update().
> + */
> +static bool pt_check_entry(pte_t entry, mfn_t mfn, unsigned int
> flags)
> +{
> +    /* Sanity check when modifying an entry. */
> +    if ( (flags & PTE_VALID) && mfn_eq(mfn, INVALID_MFN) )
> +    {
> +        /* We don't allow modifying an invalid entry. */
> +        if ( !pte_is_valid(entry) )
> +        {
> +            dprintk(XENLOG_ERR, "Modifying invalid entry is not
> allowed\n");
> +            return false;
> +        }
> +
> +        /* We don't allow modifying a table entry */
> +        if ( pte_is_table(entry) )
> +        {
> +            dprintk(XENLOG_ERR, "Modifying a table entry is not
> allowed\n");
> +            return false;
> +        }
> +    }
> +    /* Sanity check when inserting a mapping */
> +    else if ( flags & PTE_VALID )
> +    {
> +        /*
> +         * We don't allow replacing any valid entry.
> +         *
> +         * Note that the function pt_update() relies on this
> +         * assumption and will skip the TLB flush (when Svvptc
> +         * extension will be ratified). The function will need
> +         * to be updated if the check is relaxed.
> +         */
> +        if ( pte_is_valid(entry) )
> +        {
> +            if ( pte_is_mapping(entry) )
> +                dprintk(XENLOG_ERR, "Changing MFN for valid PTE is
> not allowed (%#"PRI_mfn" -> %#"PRI_mfn")\n",
> +                       mfn_x(mfn_from_pte(entry)), mfn_x(mfn));
> +            else
> +                dprintk(XENLOG_ERR, "Trying to replace table with
> mapping\n");
> +            return false;
> +        }
> +    }
> +    /* Sanity check when removing a mapping. */
> +    else if ( !(flags & PTE_POPULATE) )
> +    {
> +        /* We should be here with an invalid MFN. */
> +        ASSERT(mfn_eq(mfn, INVALID_MFN));
> +
> +        /* We don't allow removing a table */
> +        if ( pte_is_table(entry) )
> +        {
> +            dprintk(XENLOG_ERR, "Removing a table is not
> allowed\n");
> +            return false;
> +        }
> +    }
> +    /* Sanity check when populating the page-table. No check so far.
> */
> +    else
> +    {
> +        /* We should be here with an invalid MFN */
> +        ASSERT(mfn_eq(mfn, INVALID_MFN));
> +    }
> +
> +    return true;
> +}
> +
> +static pte_t *map_table(mfn_t mfn)
> +{
> +    /*
> +     * During early boot, map_domain_page() may be unusable. Use the
> +     * PMAP to map temporarily a page-table.
> +     */
> +    if ( system_state == SYS_STATE_early_boot )
> +        return pmap_map(mfn);
> +
> +    return map_domain_page(mfn);
> +}
> +
> +static void unmap_table(const pte_t *table)
> +{
> +    /*
> +     * During early boot, map_table() will not use map_domain_page()
> +     * but the PMAP.
> +     */
> +    if ( system_state == SYS_STATE_early_boot )
> +        pmap_unmap(table);
> +    else
> +        unmap_domain_page(table);
> +}
> +
> +static int create_table(pte_t *entry)
> +{
> +    mfn_t mfn;
> +    void *p;
> +    pte_t pte;
> +
> +    if ( system_state != SYS_STATE_early_boot )
> +    {
> +        struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +        if ( pg == NULL )
> +            return -ENOMEM;
> +
> +        mfn = page_to_mfn(pg);
> +    }
> +    else
> +        mfn = alloc_boot_pages(1, 1);
> +
> +    p = map_table(mfn);
> +    clear_page(p);
> +    unmap_table(p);
> +
> +    pte = pte_from_mfn(mfn, PTE_TABLE);
> +    write_pte(entry, pte);
> +
> +    return 0;
> +}
> +
> +#define XEN_TABLE_MAP_NONE 0
> +#define XEN_TABLE_MAP_NOMEM 1
> +#define XEN_TABLE_SUPER_PAGE 2
> +#define XEN_TABLE_NORMAL 3
> +
> +/*
> + * Take the currently mapped table, find the corresponding entry,
> + * and map the next table, if available.
> + *
> + * The alloc_tbl parameters indicates whether intermediate tables
> should
> + * be allocated when not present.
> + *
> + * Return values:
> + *  XEN_TABLE_MAP_FAILED: Either alloc_only was set and the entry
> + *  was empty, or allocating a new page failed.
> + *  XEN_TABLE_NORMAL: next level or leaf mapped normally
> + *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
> + */
> +static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int
> offset)
> +{
> +    pte_t *entry;
> +    mfn_t mfn;
> +
> +    entry = *table + offset;
> +
> +    if ( !pte_is_valid(*entry) )
> +    {
> +        if ( !alloc_tbl )
> +            return XEN_TABLE_MAP_NONE;
> +
> +        if ( create_table(entry) )
> +            return XEN_TABLE_MAP_NOMEM;
> +    }
> +
> +    if ( pte_is_mapping(*entry) )
> +        return XEN_TABLE_SUPER_PAGE;
> +
> +    mfn = mfn_from_pte(*entry);
> +
> +    unmap_table(*table);
> +    *table = map_table(mfn);
> +
> +    return XEN_TABLE_NORMAL;
> +}
> +
> +/* Update an entry at the level @target. */
> +static int pt_update_entry(mfn_t root, vaddr_t virt,
> +                           mfn_t mfn, unsigned int target,
> +                           unsigned int flags)
> +{
> +    int rc;
> +    unsigned int level = HYP_PT_ROOT_LEVEL;
> +    pte_t *table;
> +    /*
> +     * The intermediate page table shouldn't be allocated when MFN
> isn't
> +     * valid and we are not populating page table.
> +     * This means we either modify permissions or remove an entry,
> or
> +     * inserting brand new entry.
> +     *
> +     * See the comment above pt_update() for an additional
> explanation about
> +     * combinations of (mfn, flags).
> +    */
> +    bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags &
> PTE_POPULATE);
> +    pte_t pte, *entry;
> +
> +    /* convenience aliases */
> +    DECLARE_OFFSETS(offsets, virt);
> +
> +    table = map_table(root);
> +    for ( ; level > target; level-- )
> +    {
> +        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> +        if ( rc == XEN_TABLE_MAP_NOMEM )
> +        {
> +            rc = -ENOMEM;
> +            goto out;
> +        }
> +
> +        if ( rc == XEN_TABLE_MAP_NONE )
> +        {
> +            rc = 0;
> +            goto out;
> +        }
> +
> +        if ( rc != XEN_TABLE_NORMAL )
> +            break;
> +    }
> +
> +    if ( level != target )
> +    {
> +        dprintk(XENLOG_ERR,
> +                "%s: Shattering superpage is not supported\n",
> __func__);
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    entry = table + offsets[level];
> +
> +    rc = -EINVAL;
> +    if ( !pt_check_entry(*entry, mfn, flags) )
> +        goto out;
> +
> +    /* We are removing the page */
> +    if ( !(flags & PTE_VALID) )
> +        /*
> +         * There is also a check in pt_check_entry() which check
> that
> +         * mfn=INVALID_MFN
> +         */
> +        pte.pte = 0;
> +    else
> +    {
> +        /* We are inserting a mapping => Create new pte. */
> +        if ( !mfn_eq(mfn, INVALID_MFN) )
> +            pte = pte_from_mfn(mfn, PTE_VALID);
> +        else /* We are updating the permission => Copy the current
> pte. */
> +        {
> +            pte = *entry;
> +            pte.pte &= ~PTE_ACCESS_MASK;
> +        }
> +
> +        /* update permission according to the flags */
> +        pte.pte |= (flags & PTE_ACCESS_MASK) | PTE_ACCESSED |
> PTE_DIRTY;
> +    }
> +
> +    write_pte(entry, pte);
> +
> +    rc = 0;
> +
> + out:
> +    unmap_table(table);
> +
> +    return rc;
> +}
> +
> +/* Return the level where mapping should be done */
> +static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned
> long nr,
> +                            unsigned int flags)
> +{
> +    unsigned int level = 0;
> +    unsigned long mask;
> +    unsigned int i;
> +
> +    /*
> +     * Use a larger mapping than 4K unless the caller specifically
> requests
> +     * 4K mapping
> +     */
> +    if ( unlikely(flags & PTE_SMALL) )
> +        return level;
> +
> +    /*
> +     * Don't take into account the MFN when removing mapping (i.e
> +     * MFN_INVALID) to calculate the correct target order.
> +     *
> +     * `vfn` and `mfn` must be both superpage aligned.
> +     * They are or-ed together and then checked against the size of
> +     * each level.
> +     *
> +     * `left` ( variable declared in pt_update() ) is not included
> +     * and checked separately to allow superpage mapping even if it
> +     * is not properly aligned (the user may have asked to map 2MB +
> 4k).
> +     */
> +    mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
> +    mask |= vfn;
> +
> +    for ( i = HYP_PT_ROOT_LEVEL; i != 0; i-- )
> +    {
> +        if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) &&
> +             (nr >= BIT(XEN_PT_LEVEL_ORDER(i), UL)) )
> +        {
> +            level = i;
> +            break;
> +        }
> +    }
> +
> +    return level;
> +}
> +
> +static DEFINE_SPINLOCK(pt_lock);
> +
> +/*
> + * If `mfn` equals `INVALID_MFN`, it indicates that the following
> page table
> + * update operation might be related to either:
> + *   - populating the table (PTE_POPULATE will be set additionaly),
> + *   - destroying a mapping (PTE_VALID=0),
> + *   - modifying an existing mapping (PTE_VALID=1).
> + *
> + * If `mfn` != INVALID_MFN and flags has PTE_VALID bit set then it
> means that
> + * inserting will be done.
> + */
> +static int pt_update(vaddr_t virt, mfn_t mfn,
> +                     unsigned long nr_mfns, unsigned int flags)
> +{
> +    int rc = 0;
> +    unsigned long vfn = PFN_DOWN(virt);
> +    unsigned long left = nr_mfns;
> +    const mfn_t root = get_root_page();
> +
> +    /*
> +     * It is bad idea to have mapping both writeable and
> +     * executable.
> +     * When modifying/creating mapping (i.e PTE_VALID is set),
> +     * prevent any update if this happen.
> +     */
> +    if ( (flags & PTE_VALID) && (flags & PTE_WRITABLE) &&
> +         (flags & PTE_EXECUTABLE) )
> +    {
> +        dprintk(XENLOG_ERR,
> +                "Mappings should not be both Writeable and
> Executable\n");
> +        return -EINVAL;
> +    }
> +
> +    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
> +    {
> +        dprintk(XENLOG_ERR,
> +                "The virtual address is not aligned to the page-
> size\n");
> +        return -EINVAL;
> +    }
> +
> +    spin_lock(&pt_lock);
> +
> +    while ( left )
> +    {
> +        unsigned int order, level;
> +
> +        level = pt_mapping_level(vfn, mfn, left, flags);
> +        order = XEN_PT_LEVEL_ORDER(level);
> +
> +        ASSERT(left >= BIT(order, UL));
> +
> +        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level,
> flags);
> +        if ( rc )
> +            break;
> +
> +        vfn += 1UL << order;
> +        if ( !mfn_eq(mfn, INVALID_MFN) )
> +            mfn = mfn_add(mfn, 1UL << order);
> +
> +        left -= (1UL << order);
> +    }
> +
> +    /* Ensure that PTEs are all updated before flushing */
> +    RISCV_FENCE(rw, rw);
> +
> +    spin_unlock(&pt_lock);
> +
> +    /*
> +     * Always flush TLB at the end of the function as non-present
> entries
> +     * can be put in the TLB.
> +     *
> +     * The remote fence operation applies to the entire address
> space if
> +     * either:
> +     *  - start and size are both 0, or
> +     *  - size is equal to 2^XLEN-1.
> +     *
> +     * TODO: come up with something which will allow not to flash
> the entire
> +     *       address space.
> +     */
> +    flush_tlb_range_va(0, 0);
> +
> +    return rc;
> +}
> +
> +int map_pages_to_xen(unsigned long virt,
> +                     mfn_t mfn,
> +                     unsigned long nr_mfns,
> +                     unsigned int flags)
> +{
> +    /*
> +     * Ensure that flags has PTE_VALID bit as map_pages_to_xen() is
> supposed
> +     * to create a mapping.
> +     *
> +     * Ensure that we have a valid MFN before proceeding.
> +     *
> +     * If the MFN is invalid, pt_update() might misinterpret the
> operation,
> +     * treating it as either a population, a mapping destruction,
> +     * or a mapping modification.
> +     */
> +    ASSERT(!mfn_eq(mfn, INVALID_MFN) && (flags & PTE_VALID));
> +
> +    return pt_update(virt, mfn, nr_mfns, flags);
> +}




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.