[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/3] xen/riscv: introduce setup_mm()
On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote: > On 23.10.2024 17:50, Oleksii Kurochko wrote: > > Introduce the implementation of setup_mm(), which includes: > > 1. Adding all free regions to the boot allocator, as memory is > > needed > > to allocate page tables used for frame table mapping. > > 2. Calculating RAM size and the RAM end address. > > 3. Setting up direct map mappings from each RAM bank and initialize > > directmap_virt_start (also introduce XENHEAP_VIRT_START which is > > defined as directmap_virt_start) to be properly aligned with RAM > > start to use more superpages to reduce pressure on the TLB. > > 4. Setting up frame table mappings from physical address 0 to > > ram_end > > to simplify mfn_to_page() and page_to_mfn() conversions. > > 5. Setting up total_pages and max_page. > > > > Update virt_to_maddr() to use introduced XENHEAP_VIRT_START. > > > > Implement maddr_to_virt() function to convert a machine address > > to a virtual address. This function is specifically designed to be > > used > > only for the DIRECTMAP region, so a check has been added to ensure > > that > > the address does not exceed DIRECTMAP_SIZE. > > I'm unconvinced by this. Conceivably the function could be used on > "imaginary" addresses, just to calculate abstract positions or e.g. > deltas. At the same time I'm also not going to insist on the removal > of > that assertion, so long as it doesn't trigger. > > > After the introduction of maddr_to_virt() the following linkage > > error starts > > to occur and to avoid it share_xen_page_with_guest() stub is added: > > riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill': > > /build/xen/common/tasklet.c:176: undefined reference to > > `share_xen_page_with_guest' > > riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol > > `share_xen_page_with_guest' > > isn't defined riscv64-linux-gnu-ld: final link failed: bad > > value > > > > Despite the linkger fingering tasklet.c, it's trace.o which has the > > undefined > > refenrece: > > $ find . -name \*.o | while read F; do nm $F | grep > > share_xen_page_with_guest && > > echo $F; done > > U share_xen_page_with_guest > > ./xen/common/built_in.o > > U share_xen_page_with_guest > > ./xen/common/trace.o > > U share_xen_page_with_guest > > ./xen/prelink.o > > > > Looking at trace.i, there is call of share_xen_page_with_guest() > > but in case of > > when maddr_to_virt() is defined as "return NULL" compiler optimizes > > the part of > > common/trace.c code where share_xen_page_with_priviliged_guest() is > > called > > ( there is no any code in dissambled common/trace.o ) so there is > > no real call > > of share_xen_page_with_priviliged_guest(). > > I don't think it's the "return NULL", but rather BUG_ON()'s (really > BUG()'s) > unreachable(). Not the least because the function can't validly > return NULL, > and hence callers have no need to check for NULL. > > > @@ -25,8 +27,11 @@ > > > > static inline void *maddr_to_virt(paddr_t ma) > > { > > - BUG_ON("unimplemented"); > > - return NULL; > > + unsigned long va_offset = maddr_to_directmapoff(ma); > > + > > + ASSERT(va_offset < DIRECTMAP_SIZE); > > + > > + return (void *)(XENHEAP_VIRT_START + va_offset); > > } > > I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when > it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START > in the first place: You don't have a separate "heap" virtual address > range, do you? The name may not be ideal for RISC-V. I borrowed it from Arm, intending to account for cases where the directmap virtual start might not align with DIRECTMAP_VIRT_START due to potential adjustments for superpage mapping. And my understanding is that XENHEAP == DIRECTMAP in case of Arm64. Let's discuss below whether XENHEAP_VIRT_START is necessary, as there are related questions connected to it. > > > @@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma) > > */ > > static inline unsigned long virt_to_maddr(unsigned long va) > > { > > - if ((va >= DIRECTMAP_VIRT_START) && > > + if ((va >= XENHEAP_VIRT_START) && > > (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE))) > > - return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START); > > + return directmapoff_to_maddr(va - XENHEAP_VIRT_START); > > Same concern here then. > > > @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t > > fdt_paddr) > > > > return fdt_virt; > > } > > + > > +#ifndef CONFIG_RISCV_32 > > I'd like to ask that you be more selective with this #ifdef (or omit > it > altogether here). setup_mm() itself, for example, looks good for any > mode. Regarding setup_mm() as they have pretty different implementations for 32 and 64 bit versions. > Like does ... > > > +#define ROUNDDOWN(addr, size) ((addr) & ~((size) - 1)) > > ... this #define. Then again this macro may better be placed in > xen/macros.h anyway, next to ROUNDUP(). I will put it there. It was put in arch specific code as for such long existence of Xen project no one introduce that so I decided that it is only one specific case thereby no real need to go to common. > > > + frametable_size = ROUNDUP(frametable_size, MB(2)); > > + base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, > > PFN_DOWN(MB(2))); > > The 2Mb aspect wants a (brief) comment, imo. > > > + if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn, > > + PFN_DOWN(frametable_size), > > + PAGE_HYPERVISOR_RW) ) > > + panic("Unable to setup the frametable mappings\n"); > > + > > + memset(&frame_table[0], 0, nr_mfns * sizeof(struct > > page_info)); > > + memset(&frame_table[nr_mfns], -1, > > + frametable_size - (nr_mfns * sizeof(struct > > page_info))); > > Here (see comments on v1) you're still assuming ps == 0. Do you refer to ? ``` > +/* Map a frame table to cover physical addresses ps through pe */ > +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe) > +{ > + unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) - This looks to be accounting for a partial page at the end. > + mfn_x(maddr_to_mfn(ps)) + 1; Whereas this doesn't do the same at the start. The sole present caller passes 0, so that's going to be fine for the time being. Yet it's a latent pitfall. I'd recommend to either drop the function parameter, or to deal with it correctly right away. ``` And I've added aligned_ps to cover the case that ps could be not page aligned. Or are you refering to 0 in memset(&frame_table[0],...)? > > > +/* Map the region in the directmap area. */ > > +static void __init setup_directmap_mappings(unsigned long > > base_mfn, > > + unsigned long nr_mfns) > > +{ > > + int rc; > > + > > + /* First call sets the directmap physical and virtual offset. > > */ > > + if ( mfn_eq(directmap_mfn_start, INVALID_MFN) ) > > + { > > + directmap_mfn_start = _mfn(base_mfn); > > + > > + /* > > + * The base address may not be aligned to the second level > > + * size (e.g. 1GB when using 4KB pages). This would > > prevent > > + * superpage mappings for all the regions because the > > virtual > > + * address and machine address should both be suitably > > aligned. > > + * > > + * Prevent that by offsetting the start of the directmap > > virtual > > + * address. > > + */ > > + directmap_virt_start = DIRECTMAP_VIRT_START + > > pfn_to_paddr(base_mfn); > > Don't you need to mask off top bits of the incoming MFN here, or else > you > may waste a huge part of direct map space? Yes, it will result in a loss of direct map space, but we still have a considerable amount available in Sv39 mode and higher modes. The largest RAM_START I see currently is 0x1000000000, which means we would lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still plenty of free space available, and we can always increase DIRECTMAP_SIZE since we have a lot of free virtual address space in Sv39. That said, I’m not insisting on this approach. My suggestion was to handle the addition and subtraction of directmap_mfn_start in maddr_to_virt() and virt_to_maddr(): ``` +extern mfn_t directmap_mfn_start; extern vaddr_t directmap_virt_start; #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT) @@ -31,7 +32,7 @@ static inline void *maddr_to_virt(paddr_t ma) ASSERT(va_offset < DIRECTMAP_SIZE); - return (void *)(XENHEAP_VIRT_START + va_offset); + return (void *)(XENHEAP_VIRT_START - (mfn_to_maddr(directmap_mfn_start)) + va_offset); } /* @@ -44,7 +45,7 @@ static inline unsigned long virt_to_maddr(unsigned long va) { if ((va >= XENHEAP_VIRT_START) && (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE))) - return directmapoff_to_maddr(va - XENHEAP_VIRT_START); + return directmapoff_to_maddr(va - XENHEAP_VIRT_START + mfn_to_maddr(directmap_mfn_start)); BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2)); ASSERT((va >> (PAGETABLE_ORDER + PAGE_SHIFT)) == diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c index 262cec811e..7ef9db2363 100644 --- a/xen/arch/riscv/mm.c +++ b/xen/arch/riscv/mm.c @@ -450,7 +450,7 @@ static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe) } -static mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER; +mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER; vaddr_t __ro_after_init directmap_virt_start; /* Map the region in the directmap area. */ @@ -462,6 +462,8 @@ static void __init setup_directmap_mappings(unsigned long base_mfn, /* First call sets the directmap physical and virtual offset. */ if ( mfn_eq(directmap_mfn_start, INVALID_MFN) ) { + unsigned long mfn_gb = base_mfn & ~XEN_PT_LEVEL_SIZE(2); + directmap_mfn_start = _mfn(base_mfn); /* @@ -473,7 +475,8 @@ static void __init setup_directmap_mappings(unsigned long base_mfn, * Prevent that by offsetting the start of the directmap virtual * address. */ - directmap_virt_start = DIRECTMAP_VIRT_START + pfn_to_paddr(base_mfn); + directmap_virt_start = DIRECTMAP_VIRT_START + + (base_mfn - mfn_gb) * PAGE_SIZE; /*+ pfn_to_paddr(base_mfn)*/; ``` Finally, regarding masking off the top bits of mfn, I'm not entirely clear on how this should work. If I understand correctly, if I mask off certain top bits in mfn, then I would need to unmask those same top bits in maddr_to_virt() and virt_to_maddr(). Is that correct? Another point I’m unclear on is which specific part of the top bits should be masked. If you could explain this to me, I would really appreciate it, and I'll be happy to use the masking approach. > > > +} > > + > > +/* > > + * Setup memory management > > + * > > + * RISC-V 64 has a large virtual address space (the minimum > > supported > > + * MMU mode is Sv39, which provides TBs of VA space). > > Is it really TBs? According to my math you'd need more than 40 bits > to > map a single Tb (alongside other stuff). I accidentally calculated it as the first 40 bits (from bits 0 to 39) due to the "39" in Sv39. However, in reality, it’s actually 39 bits (from bits 0 to 38), so it represents less than TBs, only GBs of virtual address space. > > > + */ > > +void __init setup_mm(void) > > +{ > > + const struct membanks *banks = bootinfo_get_mem(); > > + paddr_t ram_start = INVALID_PADDR; > > + paddr_t ram_end = 0; > > + paddr_t ram_size = 0; > > + unsigned int i; > > + > > + /* > > + * We need some memory to allocate the page-tables used for > > the directmap > > + * mappings. But some regions may contain memory already > > allocated > > + * for other uses (e.g. modules, reserved-memory...). > > + * > > + * For simplicity, add all the free regions in the boot > > allocator. > > + */ > > + populate_boot_allocator(); > > + > > + total_pages = 0; > > + > > + for ( i = 0; i < banks->nr_banks; i++ ) > > + { > > + const struct membank *bank = &banks->bank[i]; > > + paddr_t bank_end = bank->start + bank->size; > > + > > + ram_size += ROUNDDOWN(bank->size, PAGE_SIZE); > > As before - if a bank doesn't cover full pages, this may give the > impression > of there being more "total pages" than there are. Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total size of a bank is 11K, ram_size will end up being 8K, so the "total pages" will cover less RAM than the actual size of the RAM bank. > > > + ram_start = min(ram_start, bank->start); > > + ram_end = max(ram_end, bank_end); > > + > > + setup_directmap_mappings(PFN_DOWN(bank->start), > > + PFN_DOWN(bank->size)); > > Similarly I don't think this is right when both start and size aren't > multiple of PAGE_SIZE. You may map an unsuable partial page at the > start, > and then fail to map a fully usable page at the end. ram_size should be a multiple of PAGE_SIZE because we have: ram_size += ROUNDDOWN(bank->size, PAGE_SIZE); Do you know of any examples where bank->start isn't aligned to PAGE_SIZE? Should be somewhere mentioned what is legal physical address for RAM start? If it’s not PAGE_SIZE-aligned, then it seems we have no choice but to use ALIGNUP(..., PAGE_SIZE), which would mean losing part of the bank. ~ Oleksii
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |