Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages
On Sat, 2023-02-25 at 17:53 +0000, Julien Grall wrote:
> Hi Oleksii,
>
> On 24/02/2023 15:06, Oleksii Kurochko wrote:
> > Mostly the code for setup_initial_pages was taken from Bobby's
> > repo except for the following changes:
> >  * Use only a minimal part of the code enough to enable MMU
> >  * rename {_}setup_initial_pagetables functions
> >  * add writable argument for _setup_initial_pagetables to have
> >    an opportunity to make some sections read-only
> >  * update setup_initial_pagetables function to make some sections
> >    read-only
> >  * change the order of _setup_inital_pagetables()
> >    in setup_initial_pagetable():
> >     * first it is called for text, init, rodata sections
> >     * after call it for ranges [link_addr_start : link_addr_end] and
> >       [load_addr_start : load_addr_end]
> >    Before it was done first for the ranges and after for sections but
> >    in that case read-only status will be equal to 'true' and
> >    as sections' addresses can/are inside the ranges the read-only
> >    status won't be updated for them as it was set up before.
> >
> > Origin:
> > https://gitlab.com/xen-on-risc-v/xen/-/tree/riscv-rebase 4af165b468af
> > Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
> > ---
> >   xen/arch/riscv/Makefile           |   1 +
> >   xen/arch/riscv/include/asm/mm.h   |   9 ++
> >   xen/arch/riscv/include/asm/page.h |  90 ++++++++++++
> >   xen/arch/riscv/mm.c               | 223 ++++++++++++++++++++++++++++++
> >   4 files changed, 323 insertions(+)
> >   create mode 100644 xen/arch/riscv/include/asm/mm.h
> >   create mode 100644 xen/arch/riscv/include/asm/page.h
> >   create mode 100644 xen/arch/riscv/mm.c
> >
> > diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
> > index 443f6bf15f..956ceb02df 100644
> > --- a/xen/arch/riscv/Makefile
> > +++ b/xen/arch/riscv/Makefile
> > @@ -1,5 +1,6 @@
> >   obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> >   obj-y += entry.o
> > +obj-y += mm.o
> >   obj-$(CONFIG_RISCV_64) += riscv64/
> >   obj-y += sbi.o
> >   obj-y += setup.o
> > diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
> > new file mode 100644
> > index 0000000000..fc1866b1d8
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/mm.h
> > @@ -0,0 +1,9 @@
> > +#ifndef _ASM_RISCV_MM_H
> > +#define _ASM_RISCV_MM_H
> > +
> > +void setup_initial_pagetables(unsigned long load_addr_start,
> > +                              unsigned long load_addr_end,
> > +                              unsigned long linker_addr_start,
> > +                              unsigned long linker_addr_end);
> > +
> > +#endif /* _ASM_RISCV_MM_H */
> > diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
> > new file mode 100644
> > index 0000000000..fabbe1305f
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/page.h
> > @@ -0,0 +1,90 @@
> > +#ifndef _ASM_RISCV_PAGE_H
> > +#define _ASM_RISCV_PAGE_H
> > +
> > +#include <xen/const.h>
> > +#include <xen/types.h>
> > +
> > +#define PAGE_ENTRIES  512
>
> NIT: AFAIU, the number here is based on ...
>
> > +#define VPN_BITS      (9)
>
> ... this. So I would suggest to define PAGE_ENTRIES using VPN_BITS.
Sure. It should be defined using VPN_BITS. Thanks.

> > +#define VPN_MASK      ((unsigned long)((1 << VPN_BITS) - 1))
>
> NIT: Use 1UL and you can avoid the cast.
Thanks. I'll update that in the next version of the patch series.
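For the next version I have something like this in mind (untested
sketch):

    #define VPN_BITS      (9)
    #define VPN_MASK      ((1UL << VPN_BITS) - 1)
    #define PAGE_ENTRIES  (1UL << VPN_BITS)

i.e. PAGE_ENTRIES derived from VPN_BITS, and the cast dropped in favour
of 1UL.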
> > +
> > +#ifdef CONFIG_RISCV_64
> > +/* L3 index Bit[47:39] */
> > +#define THIRD_SHIFT    (39)
> > +#define THIRD_MASK     (VPN_MASK << THIRD_SHIFT)
> > +/* L2 index Bit[38:30] */
> > +#define SECOND_SHIFT   (30)
> > +#define SECOND_MASK    (VPN_MASK << SECOND_SHIFT)
> > +/* L1 index Bit[29:21] */
> > +#define FIRST_SHIFT    (21)
> > +#define FIRST_MASK     (VPN_MASK << FIRST_SHIFT)
> > +/* L0 index Bit[20:12] */
> > +#define ZEROETH_SHIFT  (12)
> > +#define ZEROETH_MASK   (VPN_MASK << ZEROETH_SHIFT)
>
> On Arm, we are trying to phase out ZEROETH_* and co because the name is
> too generic. Instead, we now introduce a generic macro that takes a
> level and then computes the mask/shift (see XEN_PT_LEVEL_*).
>
> You should be able to do the same on RISC-V and reduce the amount of
> defines introduced.
Thanks. I'll look at XEN_PT_LEVEL_*. I'll re-read Andrew's comment, but
as far as I understand after a quick read, most of these defines can be
removed.

> > +
> > +#else // CONFIG_RISCV_32
>
> Coding style: comments in Xen are using /* ... */
>
> > +
> > +/* L1 index Bit[31:22] */
> > +#define FIRST_SHIFT    (22)
> > +#define FIRST_MASK     (VPN_MASK << FIRST_SHIFT)
> > +
> > +/* L0 index Bit[21:12] */
> > +#define ZEROETH_SHIFT  (12)
> > +#define ZEROETH_MASK   (VPN_MASK << ZEROETH_SHIFT)
> > +#endif
> > +
> > +#define THIRD_SIZE       (1 << THIRD_SHIFT)
> > +#define THIRD_MAP_MASK   (~(THIRD_SIZE - 1))
> > +#define SECOND_SIZE      (1 << SECOND_SHIFT)
> > +#define SECOND_MAP_MASK  (~(SECOND_SIZE - 1))
> > +#define FIRST_SIZE       (1 << FIRST_SHIFT)
> > +#define FIRST_MAP_MASK   (~(FIRST_SIZE - 1))
> > +#define ZEROETH_SIZE     (1 << ZEROETH_SHIFT)
> > +#define ZEROETH_MAP_MASK (~(ZEROETH_SIZE - 1))
> > +
> > +#define PTE_SHIFT 10
> > +
> > +#define PTE_VALID       BIT(0, UL)
> > +#define PTE_READABLE    BIT(1, UL)
> > +#define PTE_WRITABLE    BIT(2, UL)
> > +#define PTE_EXECUTABLE  BIT(3, UL)
> > +#define PTE_USER        BIT(4, UL)
> > +#define PTE_GLOBAL      BIT(5, UL)
> > +#define PTE_ACCESSED    BIT(6, UL)
> > +#define PTE_DIRTY       BIT(7, UL)
> > +#define PTE_RSW         (BIT(8, UL) | BIT(9, UL))
> > +
> > +#define PTE_LEAF_DEFAULT (PTE_VALID | PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
>
> We should avoid vulnerable default flags. So this should either be RW
> or RX.
Thanks. I'll take it into account.
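Concretely, I would expect something along these lines in the next
version (just a sketch, not final):

    /* Default to RW; executable mappings must ask for PTE_EXECUTABLE. */
    #define PTE_LEAF_DEFAULT (PTE_VALID | PTE_READABLE | PTE_WRITABLE)

and, mirroring Arm's XEN_PT_LEVEL_* approach, the per-level defines
could collapse into a single set of level-parameterised macros:

    #define XEN_PT_LEVEL_SHIFT(lvl)    (PAGE_SHIFT + VPN_BITS * (lvl))
    #define XEN_PT_LEVEL_SIZE(lvl)     (1UL << XEN_PT_LEVEL_SHIFT(lvl))
    #define XEN_PT_LEVEL_MAP_MASK(lvl) (~(XEN_PT_LEVEL_SIZE(lvl) - 1))
    #define XEN_PT_LEVEL_MASK(lvl)     (VPN_MASK << XEN_PT_LEVEL_SHIFT(lvl))

with level 0 replacing ZEROETH_*, level 1 replacing FIRST_*, and so on.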
> > +#define PTE_TABLE (PTE_VALID)
> > +
> > +/* Calculate the offsets into the pagetables for a given VA */
> > +#define zeroeth_linear_offset(va)  ((va) >> ZEROETH_SHIFT)
> > +#define first_linear_offset(va)    ((va) >> FIRST_SHIFT)
> > +#define second_linear_offset(va)   ((va) >> SECOND_SHIFT)
> > +#define third_linear_offset(va)    ((va) >> THIRD_SHIFT)
> > +
> > +#define pagetable_zeroeth_index(va) zeroeth_linear_offset((va) & ZEROETH_MASK)
> > +#define pagetable_first_index(va)   first_linear_offset((va) & FIRST_MASK)
> > +#define pagetable_second_index(va)  second_linear_offset((va) & SECOND_MASK)
> > +#define pagetable_third_index(va)   third_linear_offset((va) & THIRD_MASK)
> > +
> > +/* Page Table entry */
> > +typedef struct {
> > +    uint64_t pte;
> > +} pte_t;
> > +
> > +/* Shift the VPN[x] or PPN[x] fields of a virtual or physical address
> > + * to become the shifted PPN[x] fields of a page table entry */
> > +#define addr_to_ppn(x) (((x) >> PAGE_SHIFT) << PTE_SHIFT)
> > +
> > +static inline pte_t paddr_to_pte(unsigned long paddr)
> > +{
> > +    return (pte_t) { .pte = addr_to_ppn(paddr) };
> > +}
> > +
> > +static inline bool pte_is_valid(pte_t *p)
> > +{
> > +    return p->pte & PTE_VALID;
> > +}
> > +
> > +#endif /* _ASM_RISCV_PAGE_H */
> > diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
> > new file mode 100644
> > index 0000000000..6e172376eb
> > --- /dev/null
> > +++ b/xen/arch/riscv/mm.c
> > @@ -0,0 +1,223 @@
> > +#include <xen/init.h>
> > +#include <xen/lib.h>
> > +
> > +#include <asm/csr.h>
> > +#include <asm/mm.h>
> > +#include <asm/page.h>
> > +
> > +/*
> > + * xen_second_pagetable is indexed with the VPN[2] page table entry field
> > + * xen_first_pagetable is accessed from the VPN[1] page table entry field
> > + * xen_zeroeth_pagetable is accessed from the VPN[0] page table entry field
> > + */
> > +pte_t xen_second_pagetable[PAGE_ENTRIES] __attribute__((__aligned__(PAGE_SIZE)));
> > +static pte_t xen_first_pagetable[PAGE_ENTRIES]
> > +    __attribute__((__aligned__(PAGE_SIZE)));
> > +static pte_t xen_zeroeth_pagetable[PAGE_ENTRIES]
> > +    __attribute__((__aligned__(PAGE_SIZE)));
> > +
> > +extern unsigned long _stext;
> > +extern unsigned long _etext;
> > +extern unsigned long __init_begin;
> > +extern unsigned long __init_end;
> > +extern unsigned long _srodata;
> > +extern unsigned long _erodata;
> > +
> > +paddr_t phys_offset;
>
> This is defined, set but not used.
>
> > +
> > +#define resolve_early_addr(x) \
>
> This helper seems to behave the same way as linker_addr(). So any
> reason to not use it?
linker_addr() can indeed be used instead. It looks like I missed this
earlier and the same logic ended up split into two identical macros.

> I will make this assumption this can be used and not comment on the
> implementation of resolve_early_addr().
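As a side note on addr_to_ppn() in the page.h hunk above, the shifts may
be easier to follow with a concrete value (assuming Sv39, PAGE_SHIFT =
12, PTE_SHIFT = 10):

    /*
     * paddr               = 0x80200000
     * paddr >> 12         = 0x80200     (physical page number)
     * (paddr >> 12) << 10 = 0x20080000  (PPN placed at bit 10 of the PTE)
     */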
> > +    ({                                                                  \
> > +        unsigned long *__##x;                                           \
> > +        if ( load_addr_start <= x && x < load_addr_end )                \
> > +            __##x = (unsigned long *)x;                                 \
> > +        else                                                            \
> > +            __##x = (unsigned long *)(x + load_addr_start - linker_addr_start); \
> > +        __##x;                                                          \
> > +    })
> > +
> > +static void __init clear_pagetables(unsigned long load_addr_start,
> > +                                    unsigned long load_addr_end,
> > +                                    unsigned long linker_addr_start,
> > +                                    unsigned long linker_addr_end)
> > +{
> > +    unsigned long *p;
> > +    unsigned long page;
> > +    unsigned long i;
> > +
> > +    page = (unsigned long)&xen_second_pagetable[0];
> > +
> > +    p = resolve_early_addr(page);
> > +    for ( i = 0; i < ARRAY_SIZE(xen_second_pagetable); i++ )
>
> The entries in xen_second_pagetable are a pte_t (uint64_t). But ...
>
> > +    {
> > +        p[i] = 0ULL;
>
> ... the type here will be unsigned long. So you may not fully zero the
> page-table on a 32-bit architecture. Therefore you want to define it as
> pte_t.
>
> That said, given the page tables will be part of BSS, you should not
> need to zero them again assuming you clear BSS beforehand.
>
> If you clear afterwards, then you *must* move them out of BSS.
>
> The same applies for xen_{first, zeroeth}_pagetable below.
I didn't have initialized page tables, which is why I needed
clear_pagetables(), but I think you are right: clear_pagetables() can be
removed entirely, as the page tables will be zeroed during BSS
initialization.

> > +    }
> > +
> > +    page = (unsigned long)&xen_first_pagetable[0];
> > +    p = resolve_early_addr(page);
> > +    for ( i = 0; i < ARRAY_SIZE(xen_first_pagetable); i++ )
> > +    {
> > +        p[i] = 0ULL;
> > +    }
> > +
> > +    page = (unsigned long)&xen_zeroeth_pagetable[0];
> > +    p = resolve_early_addr(page);
> > +    for ( i = 0; i < ARRAY_SIZE(xen_zeroeth_pagetable); i++ )
> > +    {
> > +        p[i] = 0ULL;
> > +    }
> > +}
> > +
> > +/*
> > + * WARNING: load_addr() and linker_addr() are to be called only when the MMU is
> > + * disabled and only when executed by the primary CPU.  They cannot refer to
> > + * any global variable or functions.
>
> I find it interesting that you are saying this when
> _setup_initial_pagetables() is called from setup_initial_pagetables().
> Would you be able to explain how this is different?
I am not sure that I understand your question correctly, but
_setup_initial_pagetables() was introduced to map some addresses with
write/read flags. Probably I have to rename it to something clearer.

> > + */
> > +
> > +/*
> > + * Convert an addressed layed out at link time to the address where it was loaded
>
> Typo: s/addressed/address/ ?
Yes, it should be "address". And "layed out" should be changed to "laid
out"...

> > + * by the bootloader.
> > + */
>
> Looking at the implementation, you seem to consider that any address
> not in the range [linker_addr_start, linker_addr_end[ will have a 1:1
> mapping.
>
> I am not sure this is what you want. So I would consider to throw an
> error if such an address is passed.
I thought that at this stage, if no relocation was done, it is 1:1,
except for the case when load_addr_start != linker_addr_start.
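If we decide to be strict instead, the fallback branch could turn into
an error; a rough sketch (early_panic() is a hypothetical placeholder,
as there is little more we can do than hang this early in boot):

    #define load_addr(linker_address)                                     \
        ({                                                                \
            unsigned long __addr = (unsigned long)(linker_address);       \
            if ( __addr < linker_addr_start ||                            \
                 __addr >= linker_addr_end )                              \
                early_panic(); /* hypothetical: reject non-Xen address */ \
            __addr - linker_addr_start + load_addr_start;                 \
        })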
> > +#define load_addr(linker_address)                                       \
> > +    ({                                                                  \
> > +        unsigned long __linker_address = (unsigned long)(linker_address); \
> > +        if ( linker_addr_start <= __linker_address &&                   \
> > +             __linker_address < linker_addr_end )                       \
> > +        {                                                               \
> > +            __linker_address =                                          \
> > +                __linker_address - linker_addr_start + load_addr_start; \
> > +        }                                                               \
> > +        __linker_address;                                               \
> > +    })
> > +
> > +/* Convert boot-time Xen address from where it was loaded by the boot loader to the address it was layed out
> > + * at link-time.
> > + */
>
> Coding style: The first line is too long and multi-line comments look
> like:
>
> /*
>  * Foo
>  * Bar
>  */
>
> > +#define linker_addr(load_address)                                       \
>
> Same remark as for load_addr() above.
>
> > +    ({                                                                  \
> > +        unsigned long __load_address = (unsigned long)(load_address);   \
> > +        if ( load_addr_start <= __load_address &&                       \
> > +             __load_address < load_addr_end )                           \
> > +        {                                                               \
> > +            __load_address =                                            \
> > +                __load_address - load_addr_start + linker_addr_start;   \
> > +        }                                                               \
> > +        __load_address;                                                 \
> > +    })
> > +
> > +static void __attribute__((section(".entry")))
> > +_setup_initial_pagetables(pte_t *second, pte_t *first, pte_t *zeroeth,
>
> Can this be renamed to setup_initial_mapping() so this is clearer and
> avoids the one-'_' difference with the function below?
Sure. That will be better.

> > +                          unsigned long map_start,
> > +                          unsigned long map_end,
> > +                          unsigned long pa_start,
> > +                          bool writable)
>
> What about the executable bit?
It's always executable... But as you mentioned above, PTE_LEAF_DEFAULT
should be either RX or RW. I think it makes sense to pass flags instead
of 'writable'.

> > +{
> > +    unsigned long page_addr;
> > +    unsigned long index2;
> > +    unsigned long index1;
> > +    unsigned long index0;
>
> index* could be defined in the loop below.
It could. But I am curious why that would be better?

> > +
> > +    /* align start addresses */
> > +    map_start &= ZEROETH_MAP_MASK;
> > +    pa_start &= ZEROETH_MAP_MASK;
>
> Hmmm... I would actually expect the addresses to be properly aligned
> and therefore not require an alignment here.
>
> Otherwise, this raises the question of what happens if you have regions
> sharing the same page?
The 'map_start &= ZEROETH_MAP_MASK' is needed to get the page number of
the address without the page offset.

> > +
> > +    page_addr = map_start;
> > +    while ( page_addr < map_end )
>
> Looking at the loop, it looks like you are assuming that the region
> will never cross a boundary of a page-table (either L0, L1, L2). I am
> not convinced you can make such an assumption (see below).
>
> But if you really want to make such an assumption then you should add
> some guard (either BUILD_BUG_ON(), ASSERT(), proper check) in your code
> to avoid any surprise in the future.
I am not sure that I fully understand what the problem is here. The
address is aligned on a (1<<12) boundary and each iteration maps one
(1<<12) page, so all looks fine, or have I misunderstood you?
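If such a guard is indeed wanted, I suppose it could be something as
simple as this at the top of setup_initial_mapping() (untested sketch,
assuming we keep a single statically-allocated L1 and L0 table):

    /*
     * Only one L1 and one L0 table are allocated, so the whole of
     * [map_start, map_end) must share a single L1 entry (the same 2M
     * region for Sv39) and a single L2 entry (the same 1G region).
     */
    if ( pagetable_first_index(map_start) !=
         pagetable_first_index(map_end - 1) ||
         pagetable_second_index(map_start) !=
         pagetable_second_index(map_end - 1) )
        return; /* no proper error path exists this early in boot */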
> > +    {
> > +        index2 = pagetable_second_index(page_addr);
> > +        index1 = pagetable_first_index(page_addr);
> > +        index0 = pagetable_zeroeth_index(page_addr);
> > +
> > +        /* Setup level2 table */
> > +        second[index2] = paddr_to_pte((unsigned long)first);
> > +        second[index2].pte |= PTE_TABLE;
> > +
> > +        /* Setup level1 table */
> > +        first[index1] = paddr_to_pte((unsigned long)zeroeth);
> > +        first[index1].pte |= PTE_TABLE;
> > +
> > +        /* Setup level0 table */
> > +        if ( !pte_is_valid(&zeroeth[index0]) )
>
> Can you explain why you are checking !pte_is_valid() for the L0 entry
> but not the others?
My mistake. It should be checked for each level.

> > +        {
> > +            /* Update level0 table */
> > +            zeroeth[index0] = paddr_to_pte((page_addr - map_start) + pa_start);
> > +            zeroeth[index0].pte |= PTE_LEAF_DEFAULT;
> > +            zeroeth[index0].pte &= ~((!writable) ? PTE_WRITABLE : 0);
>
> Looking at the default value, it would mean that a non-writable mapping
> is automatically executable. This seems wrong for sections that are not
> meant to be executable (like rodata).
Yes, you are right. I'll rework setup_initial_mapping() to pass flags
instead of a write/read-only flag.

> > +        }
> > +
> > +        /* Point to next page */
> > +        page_addr += ZEROETH_SIZE;
> > +    }
> > +}
> > +
> > +/*
> > + * setup_initial_pagetables:
> > + *
> > + * 1) Build the page tables for Xen that map the following:
> > + *   1.1)  The physical location of Xen (where the bootloader loaded it)
> > + *   1.2)  The link-time location of Xen (where the linker expected Xen's
> > + *         addresses to be)
> > + * 2) Load the page table into the SATP and enable the MMU
> > + */
> > +void __attribute__((section(".entry")))
>
> I couldn't find a section ".entry" in the linker script.
>
> > +setup_initial_pagetables(unsigned long load_addr_start,
> > +                         unsigned long load_addr_end,
> > +                         unsigned long linker_addr_start,
> > +                         unsigned long linker_addr_end)
> > +{
> > +    pte_t *second;
> > +    pte_t *first;
> > +    pte_t *zeroeth;
> > +
> > +    clear_pagetables(load_addr_start, load_addr_end,
> > +                     linker_addr_start, linker_addr_end);
> > +
> > +    /* Get the addresses where the page tables were loaded */
> > +    second  = (pte_t *)load_addr(&xen_second_pagetable);
> > +    first   = (pte_t *)load_addr(&xen_first_pagetable);
> > +    zeroeth = (pte_t *)load_addr(&xen_zeroeth_pagetable);
>
> I would consider to embed the type cast in load_addr() so you are
> adding some type safety within your code.
Definitely that would be better, but setup_initial_mapping() uses
'unsigned long' for passing the addresses that should be mapped.

> > +
> > +    /*
> > +     * Create a mapping from Xen's link-time addresses to where they were actually loaded.
>
> This line is way longer than 80 characters. Please make sure to wrap it
> at 80 characters.
>
> > +     */
> > +    _setup_initial_pagetables(second, first, zeroeth,
> > +                              linker_addr(&_stext),
> > +                              linker_addr(&_etext),
> > +                              load_addr(&_stext),
> > +                              false);
> > +    _setup_initial_pagetables(second, first, zeroeth,
> > +                              linker_addr(&__init_begin),
> > +                              linker_addr(&__init_end),
> > +                              load_addr(&__init_begin),
> > +                              true);
> > +    _setup_initial_pagetables(second, first, zeroeth,
> > +                              linker_addr(&_srodata),
> > +                              linker_addr(&_erodata),
> > +                              load_addr(&_srodata),
> > +                              false);
> > +    _setup_initial_pagetables(second, first, zeroeth,
> > +                              linker_addr_start,
> > +                              linker_addr_end,
> > +                              load_addr_start,
> > +                              true);
>
> Where do you guarantee that Xen will always fit in an L0 table and the
> start address is aligned to the size of an L0 table?
I don't guarantee that it fits in an L0 table, but the start address is
aligned to the size of the L0 table at the start.

> > +
> > +    /*
> > +     * Create a mapping of the load time address range to... the load time address range.
>
> Same remark about the line length here.
>
> > +     * This mapping is used at boot time only.
> > +     */
> > +    _setup_initial_pagetables(second, first, zeroeth,
>
> This can only work if Xen is loaded at its linked address. So you need
> a separate set of L0, L1 tables for the identity mapping.
>
> That said, this would not be sufficient because:
>   1) Xen may not be loaded at a 2M boundary (you can control that with
>      U-boot, but not with EFI). So this may cross a boundary and
>      therefore need multiple pages.
>   2) The load region may overlap the link address.
>
> While I think it would be good to handle those cases from the start, I
> would understand why they are not easy to solve. So I think the minimum
> is to throw some errors if you are in a case you can't support.
Do you mean to throw some error in load_addr()/linker_addr()?

> > +                              load_addr_start,
> > +                              load_addr_end,
> > +                              load_addr_start,
> > +                              true);
> > +
> > +    /* Ensure page table writes precede loading the SATP */
> > +    asm volatile("sfence.vma");
> > +
> > +    /* Enable the MMU and load the new pagetable for Xen */
> > +    csr_write(CSR_SATP,
> > +              (load_addr(xen_second_pagetable) >> PAGE_SHIFT) |
> > +              SATP_MODE_SV39 << SATP_MODE_SHIFT);
>
> IMHO, it would make sense to introduce within the series the code to
> jump off the identity mapping and then remove it.
Probably I have to spend more time to understand the identity mapping,
but it looks like the current RISC-V port from Bobby's repo doesn't jump
off and remove the identity mapping.

> > +
> > +    phys_offset = load_addr_start - linker_addr_start;
> > +
> > +    return;
> > +}
>
> Cheers,

~ Oleksii