
Re: [PATCH v2 1/3] xen/riscv: introduce setup_initial_pages



On Mon, 2023-03-20 at 17:41 +0100, Jan Beulich wrote:
> On 16.03.2023 17:43, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/mm.h
> > @@ -0,0 +1,8 @@
> > +#ifndef _ASM_RISCV_MM_H
> > +#define _ASM_RISCV_MM_H
> > +
> > +void setup_initial_pagetables(void);
> > +
> > +extern void enable_mmu(void);
> 
> Nit: At least within a single header you probably want to be
> consistent
> about the use of "extern". Personally I think it would better be
> omitted
> from function declarations.
Agreed. It would be better to remove "extern".

> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/page.h
> > @@ -0,0 +1,67 @@
> > +#ifndef _ASM_RISCV_PAGE_H
> > +#define _ASM_RISCV_PAGE_H
> > +
> > +#include <xen/const.h>
> > +#include <xen/types.h>
> > +
> > +#define PAGE_ENTRIES                (1 << PAGETABLE_ORDER)
> > +#define VPN_MASK                    ((unsigned long)(PAGE_ENTRIES
> > - 1))
> > +
> > +#define PAGE_ORDER                  (12)
> 
> DYM PAGE_SHIFT here, as used elsewhere in Xen?
Yes, PAGE_SHIFT is meant here.

> 
> Also are you aware of page-bits.h, where I think some of these
> constants
> should go?
I'll move some of the constants to page-bits.h.

> 
> > +#ifdef CONFIG_RISCV_64
> > +#define PAGETABLE_ORDER             (9)
> > +#else /* CONFIG_RISCV_32 */
> > +#define PAGETABLE_ORDER             (10)
> > +#endif
> > +
> > +#define LEVEL_ORDER(lvl)            (lvl * PAGETABLE_ORDER)
> > +#define LEVEL_SHIFT(lvl)            (LEVEL_ORDER(lvl) +
> > PAGE_ORDER)
> > +#define LEVEL_SIZE(lvl)             (_AT(paddr_t, 1) <<
> > LEVEL_SHIFT(lvl))
> > +
> > +#define XEN_PT_LEVEL_SHIFT(lvl)     LEVEL_SHIFT(lvl)
> > +#define XEN_PT_LEVEL_ORDER(lvl)     LEVEL_ORDER(lvl)
> > +#define XEN_PT_LEVEL_SIZE(lvl)      LEVEL_SIZE(lvl)
> 
> Mind me asking what these are good for? Doesn't one set of macros
> suffice?
Do you mean the XEN_PT_LEVEL_{SHIFT,ORDER,SIZE} set? Yes, one set of
macros should suffice. I'll check how they are used on Arm (I copied
these macros from there).
> 
> > +#define XEN_PT_LEVEL_MAP_MASK(lvl)  (~(XEN_PT_LEVEL_SIZE(lvl) -
> > 1))
> > +#define XEN_PT_LEVEL_MASK(lvl)      (VPN_MASK <<
> > XEN_PT_LEVEL_SHIFT(lvl))
> > +
> > +#define PTE_SHIFT                   10
> 
> What does this describe? According to its single use here it may
> simply require a better name.
If you look at the Sv39 page table entry format in the RISC-V
privileged spec, it has the following layout:
  63 62-61 60-54    53-28  27-19  18-10  9-8 7 6 5 4 3 2 1 0
  N  PBMT  Reserved PPN[2] PPN[1] PPN[0] RSW D A G U X W R V
So 10 means that bits 0-9 are the flag bits below the PPN field.
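To illustrate (a minimal sketch only, not the patch's code; the names
PTE_PPN_SHIFT, paddr_to_ppn and ppn_to_paddr are made up for this
example): 10 is the bit position where the PPN field starts, so
converting between a physical address and the PPN field of a PTE looks
like:

```c
#include <stdint.h>

#define PAGE_SHIFT      12   /* 4K pages */
#define PTE_PPN_SHIFT   10   /* PPN starts at bit 10; bits 0-9 are flags */

/* Encode a physical address into the PPN field of a PTE. */
static inline uint64_t paddr_to_ppn(uint64_t pa)
{
    return (pa >> PAGE_SHIFT) << PTE_PPN_SHIFT;
}

/* Recover the physical address from a PTE's PPN field. */
static inline uint64_t ppn_to_paddr(uint64_t pte)
{
    return (pte >> PTE_PPN_SHIFT) << PAGE_SHIFT;
}
```

Which also suggests Jan's point: a name like PTE_PPN_SHIFT would say
more than the bare PTE_SHIFT.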

> 
> > +#define PTE_VALID                   BIT(0, UL)
> > +#define PTE_READABLE                BIT(1, UL)
> > +#define PTE_WRITABLE                BIT(2, UL)
> > +#define PTE_EXECUTABLE              BIT(3, UL)
> > +#define PTE_USER                    BIT(4, UL)
> > +#define PTE_GLOBAL                  BIT(5, UL)
> > +#define PTE_ACCESSED                BIT(6, UL)
> > +#define PTE_DIRTY                   BIT(7, UL)
> > +#define PTE_RSW                     (BIT(8, UL) | BIT(9, UL))
> > +
> > +#define PTE_LEAF_DEFAULT            (PTE_VALID | PTE_READABLE |
> > PTE_EXECUTABLE)
> > +#define PTE_TABLE                   (PTE_VALID)
> > +
> > +/* Calculate the offsets into the pagetables for a given VA */
> > +#define pt_linear_offset(lvl, va)   ((va) >>
> > XEN_PT_LEVEL_SHIFT(lvl))
> > +
> > +#define pt_index(lvl, va)   pt_linear_offset(lvl, (va) &
> > XEN_PT_LEVEL_MASK(lvl))
> > +
> > +/* Page Table entry */
> > +typedef struct {
> > +    uint64_t pte;
> > +} pte_t;
> 
> Not having read the respective spec (yet) I'm wondering if this
> really
> is this way also for RV32 (despite the different PAGETABLE_ORDER).
The RISC-V architecture supports several MMU modes to translate VA to
PA: Sv32, Sv39, Sv48 and Sv57. The PTE size is 8 bytes in all of them
except Sv32, where it is 4 bytes. But I looked at several big projects
with RISC-V support and none of them supports Sv32.

So this should be fine for RV32 too, as long as Sv32 mode isn't
supported.
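If Sv32 ever were supported, the pte width could be made mode-dependent
(just a sketch under that assumption; CONFIG_RISCV_64 comes from
Kconfig):

```c
#include <stdint.h>

/*
 * Sketch: RV64 translation modes (Sv39/Sv48/Sv57) use 8-byte PTEs,
 * while Sv32 -- the only RV32 mode -- uses 4-byte PTEs.
 */
#ifdef CONFIG_RISCV_64
typedef uint64_t pte_val_t;
#else
typedef uint32_t pte_val_t;
#endif

typedef struct {
    pte_val_t pte;
} pte_t;
```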
> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/mm.c
> > @@ -0,0 +1,121 @@
> > +#include <xen/compiler.h>
> > +#include <xen/init.h>
> > +#include <xen/kernel.h>
> > +#include <xen/lib.h>
> > +#include <xen/page-size.h>
> > +
> > +#include <asm/boot-info.h>
> > +#include <asm/config.h>
> > +#include <asm/csr.h>
> > +#include <asm/mm.h>
> > +#include <asm/page.h>
> > +#include <asm/traps.h>
> > +
> > +/*
> > + * xen_second_pagetable is indexed with the VPN[2] page table
> > entry field
> > + * xen_first_pagetable is accessed from the VPN[1] page table
> > entry field
> > + * xen_zeroeth_pagetable is accessed from the VPN[0] page table
> > entry field
> > + */
> > +pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > +    xen_second_pagetable[PAGE_ENTRIES];
> > +static pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > +    xen_first_pagetable[PAGE_ENTRIES];
> > +static pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > +    xen_zeroeth_pagetable[PAGE_ENTRIES];
> 
> I would assume Andrew's comment on the earlier version also extended
> to
> the names used here (and then also to various local variables or
> function
> parameters further down).
Thanks for the clarification. It looks like I misunderstood him.

> 
> > +extern unsigned long __init_begin[];
> > +extern unsigned long __init_end[];
> > +extern unsigned char cpu0_boot_stack[STACK_SIZE];
> > +
> > +static void __init
> > +setup_initial_mapping(pte_t *second, pte_t *first, pte_t *zeroeth,
> > +                      unsigned long map_start,
> > +                      unsigned long map_end,
> > +                      unsigned long pa_start,
> > +                      unsigned long flags)
> > +{
> > +    unsigned long page_addr;
> > +
> > +    // /* align start addresses */
> > +    // map_start &= XEN_PT_LEVEL_MAP_MASK(0);
> > +    // pa_start &= XEN_PT_LEVEL_MAP_MASK(0);
> 
> It's not clear what this is about, but in any event the comment is
> malformed.
I forgot to remove it; it wasn't supposed to be in the final code after
the comments on patch v1.
> 
> > +    page_addr = map_start;
> > +    while ( page_addr < map_end )
> > +    {
> > +        unsigned long index2 = pt_index(2, page_addr);
> > +        unsigned long index1 = pt_index(1, page_addr);
> > +        unsigned long index0 = pt_index(0, page_addr);
> > +
> > +        /* Setup level2 table */
> > +        second[index2] = paddr_to_pte((unsigned long)first);
> > +        second[index2].pte |= PTE_TABLE;
> > +
> > +        /* Setup level1 table */
> > +        first[index1] = paddr_to_pte((unsigned long)zeroeth);
> > +        first[index1].pte |= PTE_TABLE;
> 
> Whould it make sense to have paddr_to_pte() take attributes right
> away as 2nd argument?
Yes, that makes sense: paddr_to_pte() is always followed by
".pte |= flags". I'll update the macro.
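Something along these lines (a sketch of the updated helper, assuming
the shift constants from above; not the final patch code):

```c
#include <stdint.h>

typedef struct {
    uint64_t pte;
} pte_t;

#define PAGE_SHIFT      12
#define PTE_PPN_SHIFT   10          /* PPN field starts at bit 10 */
#define PTE_VALID       (1UL << 0)
#define PTE_TABLE       (PTE_VALID)

/*
 * paddr_to_pte() taking the attributes as a 2nd argument, so callers
 * no longer need a separate ".pte |= flags" step.
 */
static inline pte_t paddr_to_pte(uint64_t paddr, uint64_t flags)
{
    return (pte_t){ .pte = ((paddr >> PAGE_SHIFT) << PTE_PPN_SHIFT) | flags };
}
```

Then the call sites would become e.g.
second[index2] = paddr_to_pte((unsigned long)first, PTE_TABLE);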
> 
> > +
> > +
> 
> Nit: No double blank lines please.
Sure. Thanks.

> 
> > +        /* Setup level0 table */
> > +        if ( !pte_is_valid(&zeroeth[index0]) )
> > +        {
> > +            /* Update level0 table */
> > +            zeroeth[index0] = paddr_to_pte((page_addr - map_start)
> > + pa_start);
> > +            zeroeth[index0].pte |= flags;
> > +        }
> > +
> > +        /* Point to next page */
> > +        page_addr += XEN_PT_LEVEL_SIZE(0);
> > +    }
> > +}
> > +
> > +/*
> > + * setup_initial_pagetables:
> > + *
> > + * Build the page tables for Xen that map the following:
> > + *   load addresses to linker addresses
> > + */
> > +void __init setup_initial_pagetables(void)
> > +{
> > +    pte_t *second;
> > +    pte_t *first;
> > +    pte_t *zeroeth;
> > +
> > +    unsigned long load_addr_start   = boot_info.load_start;
> > +    unsigned long load_addr_end     = boot_info.load_end;
> > +    unsigned long linker_addr_start = boot_info.linker_start;
> > +    unsigned long linker_addr_end   = boot_info.linker_end;
> > +
> > +    BUG_ON(load_addr_start % (PAGE_ENTRIES * PAGE_SIZE) != 0);
> > +    if (load_addr_start != linker_addr_start)
> 
> Nit: Style (missing blanks).
Thanks for noticing.

> 
> > +        BUG_ON((linker_addr_end > load_addr_start && load_addr_end
> > > linker_addr_start));
> > +
> > +    /* Get the addresses where the page tables were loaded */
> > +    second  = (pte_t *)(&xen_second_pagetable);
> > +    first   = (pte_t *)(&xen_first_pagetable);
> > +    zeroeth = (pte_t *)(&xen_zeroeth_pagetable);
> 
> Please avoid casts whenever possible.
I'll take that into account while working on the new patch version.
Thanks.
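For example (a sketch, not the actual patch code): since the arrays are
already declared with type pte_t[], they decay to pte_t * without any
cast:

```c
typedef struct {
    unsigned long pte;
} pte_t;

#define PAGE_ENTRIES 512

static pte_t xen_second_pagetable[PAGE_ENTRIES];

/* The array decays to pte_t *; no (pte_t *) cast is needed. */
static pte_t *get_second(void)
{
    return xen_second_pagetable;
}
```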
> 
> > @@ -32,3 +33,67 @@ ENTRY(start)
> >          add     sp, sp, t0
> >  
> >          tail    start_xen
> > +
> > +ENTRY(enable_mmu)
> > +        /* Calculate physical offset between linker and load
> > addresses */
> > +        la      t0, boot_info
> > +        REG_L   t1, BI_LINKER_START(t0)
> > +        REG_L   t2, BI_LOAD_START(t0)
> > +        sub     t1, t1, t2
> > +
> > +        /*
> > +         * Calculate and update a linker time address of the
> > .L_mmu_is_enabled
> > +         * label and update CSR_STVEC with it.
> > +         * MMU is configured in a way where linker addresses are
> > mapped
> > +         * on load addresses so case when linker addresses are not
> > equal to
> > +         * load addresses, and thereby, after MMU is enabled, it
> > will cause
> > +         * an exception and jump to linker time addresses
> > +         */
> > +        la      t3, .L_mmu_is_enabled
> > +        add     t3, t3, t1
> > +        csrw    CSR_STVEC, t3
> > +
> > +        /* Calculate a value for SATP register */
> > +        li      t5, SATP_MODE_SV39
> > +        li      t6, SATP_MODE_SHIFT
> > +        sll     t5, t5, t6
> > +
> > +        la      t4, xen_second_pagetable
> > +        srl     t4, t4, PAGE_SHIFT
> > +        or      t4, t4, t5
> > +        sfence.vma
> > +        csrw    CSR_SATP, t4
> > +
> > +        .align 2
> > +.L_mmu_is_enabled:
> > +        /*
> > +         * Stack should be re-inited as:
> > +         * 1. Right now an address of the stack is relative to
> > load time
> > +         *    addresses what will cause an issue in case of load
> > start address
> > +         *    isn't equal to linker start address.
> > +         * 2. Addresses in stack are all load time relative which
> > can be an
> > +         *    issue in case when load start address isn't equal to
> > linker
> > +         *    start address.
> > +         */
> > +        la      sp, cpu0_boot_stack
> > +        li      t0, STACK_SIZE
> > +        add     sp, sp, t0
> > +
> > +        /*
> > +         * Re-init an address of exception handler as it was
> > overwritten  with
> > +         * the address of the .L_mmu_is_enabled label.
> > +         * Before jump to trap_init save return address of
> > enable_mmu() to
> > +         * know where we should back after enable_mmu() will be
> > finished.
> > +         */
> > +        mv      s0, ra
> 
> Don't you need to preserve s0 for your caller?
Yes, it needs to be preserved according to the RISC-V ABI spec (s0 is
callee-saved). Thanks.
> 
> > +        lla     t0, trap_init
> 
> Any reason for lla here when elsewhere above you use la? And aren't
There is no reason for that anymore, after the patches resolving an
issue with the la pseudo-instruction were introduced:
https://lore.kernel.org/xen-devel/cover.1678970065.git.oleksii.kurochko@xxxxxxxxx/
> ...
> 
> > +        jalr    ra, t0
> 
> ... these two together "call" anyway?
I think it can be replaced with "call".
> 
> > +        /*
> > +         * Re-calculate the return address of enable_mmu()
> > function for case
> > +         * when linker start address isn't equal to load start
> > address
> > +         */
> > +        add     s0, s0, t1
> > +        mv      ra, s0
> 
> "add ra, s0, t1"?
> 
> But then - can't t1 be clobbered by trap_init()?
Yeah, it can be clobbered; I didn't think about that.
Thanks, I'll update that in the new version of the patch series.
> 
> > --- a/xen/arch/riscv/xen.lds.S
> > +++ b/xen/arch/riscv/xen.lds.S
> > @@ -179,3 +179,5 @@ SECTIONS
> >  
> >  ASSERT(!SIZEOF(.got),      ".got non-empty")
> >  ASSERT(!SIZEOF(.got.plt),  ".got.plt non-empty")
> > +
> > +ASSERT(_end - _start <= MB(2), "Xen too large for early-boot
> > assumptions")
> 
> Again the question whether this is also applicable to RV32.
As mentioned before, it is applicable to RV32 only if Sv32 isn't used
as the addressing mode. For Sv32 the limit should be 2^10 * 2^12 =
4 MB.
But, as I mentioned before, I am not sure that we will support Sv32.
Probably it would be good to add an #ifdef...
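Something like this, perhaps (only a sketch, assuming CONFIG_RISCV_64
distinguishes the Sv39 2 MB megapage case from the Sv32 4 MB one;
xen.lds.S is run through the preprocessor, so #ifdef works there):

```
#ifdef CONFIG_RISCV_64
ASSERT(_end - _start <= MB(2), "Xen too large for early-boot assumptions")
#else /* Sv32: 2^10 entries * 4K pages = 4 MB megapage */
ASSERT(_end - _start <= MB(4), "Xen too large for early-boot assumptions")
#endif
```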

~ Oleksii

 

