Re: [PATCH v2 1/3] xen/riscv: introduce setup_mm()
On 30.10.2024 17:50, oleksii.kurochko@xxxxxxxxx wrote:
> On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
>> On 23.10.2024 17:50, Oleksii Kurochko wrote:
>>> @@ -25,8 +27,11 @@
>>>
>>>  static inline void *maddr_to_virt(paddr_t ma)
>>>  {
>>> -    BUG_ON("unimplemented");
>>> -    return NULL;
>>> +    unsigned long va_offset = maddr_to_directmapoff(ma);
>>> +
>>> +    ASSERT(va_offset < DIRECTMAP_SIZE);
>>> +
>>> +    return (void *)(XENHEAP_VIRT_START + va_offset);
>>>  }
>>
>> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
>> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
>> in the first place: You don't have a separate "heap" virtual address
>> range, do you?
> The name may not be ideal for RISC-V. I borrowed it from Arm, intending
> to account for cases where the directmap virtual start might not align
> with DIRECTMAP_VIRT_START due to potential adjustments for superpage
> mapping.
> And my understanding is that XENHEAP == DIRECTMAP in case of Arm64.

Just to mention it: if I looked at Arm64 in isolation (without also
considering Arm32, and hence the desire to keep code common where
possible), I'd consider the mere existence of XENHEAP_VIRT_START
(without an accompanying XENHEAP_VIRT_SIZE) a mistake. Therefore for
RISC-V its introduction may be justified by (remote) plans to also
cover RV32 at some point. Yet that then needs saying explicitly in the
description.

>>> @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
>>>
>>>      return fdt_virt;
>>>  }
>>> +
>>> +#ifndef CONFIG_RISCV_32
>>
>> I'd like to ask that you be more selective with this #ifdef (or omit
>> it altogether here). setup_mm() itself, for example, looks good for
>> any mode.
> Regarding setup_mm(): the 32-bit and 64-bit versions have pretty
> different implementations.

Not setup_mm() itself, it seems. Its helpers - sure.
>>> +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
>>> +                          PFN_DOWN(frametable_size),
>>> +                          PAGE_HYPERVISOR_RW) )
>>> +        panic("Unable to setup the frametable mappings\n");
>>> +
>>> +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct page_info));
>>> +    memset(&frame_table[nr_mfns], -1,
>>> +           frametable_size - (nr_mfns * sizeof(struct page_info)));
>>
>> Here (see comments on v1) you're still assuming ps == 0.
> Do you refer to ?
> ```
>> +/* Map a frame table to cover physical addresses ps through pe */
>> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>> +{
>> +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
>
> This looks to be accounting for a partial page at the end.
>
>> +                            mfn_x(maddr_to_mfn(ps)) + 1;
>
> Whereas this doesn't do the same at the start. The sole present caller
> passes 0, so that's going to be fine for the time being. Yet it's a
> latent pitfall. I'd recommend to either drop the function parameter, or
> to deal with it correctly right away.
> ```
> And I've added aligned_ps to cover the case that ps could be not page
> aligned.

Not this, no, but ...

> Or are you referring to 0 in memset(&frame_table[0], ...)?

... this. If the start address wasn't 0, you'd need to invalidate a
region at the start of the table, just as you invalidate a region at
the end.

>>> +/* Map the region in the directmap area. */
>>> +static void __init setup_directmap_mappings(unsigned long base_mfn,
>>> +                                            unsigned long nr_mfns)
>>> +{
>>> +    int rc;
>>> +
>>> +    /* First call sets the directmap physical and virtual offset. */
>>> +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
>>> +    {
>>> +        directmap_mfn_start = _mfn(base_mfn);
>>> +
>>> +        /*
>>> +         * The base address may not be aligned to the second level
>>> +         * size (e.g. 1GB when using 4KB pages).
>>> +         * This would prevent superpage mappings for all the
>>> +         * regions because the virtual address and machine address
>>> +         * should both be suitably aligned.
>>> +         *
>>> +         * Prevent that by offsetting the start of the directmap
>>> +         * virtual address.
>>> +         */
>>> +        directmap_virt_start = DIRECTMAP_VIRT_START +
>>> +                               pfn_to_paddr(base_mfn);
>>
>> Don't you need to mask off top bits of the incoming MFN here, or else
>> you may waste a huge part of direct map space?
> Yes, it will result in a loss of direct map space, but we still have a
> considerable amount available in Sv39 mode and higher modes. The
> largest RAM_START I see currently is 0x1000000000, which means we would
> lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
> plenty of free space available, and we can always increase
> DIRECTMAP_SIZE since we have a lot of free virtual address space in
> Sv39.

Wow, 68 out of 308 - that's more than 20%. I'm definitely concerned
about this then.

> Finally, regarding masking off the top bits of mfn, I'm not entirely
> clear on how this should work. If I understand correctly, if I mask off
> certain top bits in mfn, then I would need to unmask those same top
> bits in maddr_to_virt() and virt_to_maddr(). Is that correct?
>
> Another point I’m unclear on is which specific part of the top bits
> should be masked.

You want to "move" the directmap such that the first legitimate RAM
page is within the first (large/huge) page mapping of the directmap.
IOW the "virtual" start of the directmap would move down in VA space.
That still leaves things at a simple offset calculation when
translating VA <-> PA.

To give an example: let's assume RAM starts at 61.5Gb, and you want to
use 1Gb mappings for the bulk of the directmap. Then the "virtual"
start of the directmap would shift down to DIRECTMAP_VIRT_START - 61Gb,
such that the first RAM page would be mapped at
DIRECTMAP_VIRT_START + 0.5Gb.
IOW it would be the low 30 address bits of the start address that you
use (30 - PAGE_SHIFT for the MFN), with the higher bits contributing to
the offset involved in the VA <-> PA translation. Values used depend on
the (largest) page size you mean to use for the direct map: on systems
with terabytes of memory (demanding Sv48 or even Sv57 mode) you may
want to use 512Gb mappings, and hence you'd then need to mask the low
39 bits (or 48 for 256Tb mappings).

>>> +void __init setup_mm(void)
>>> +{
>>> +    const struct membanks *banks = bootinfo_get_mem();
>>> +    paddr_t ram_start = INVALID_PADDR;
>>> +    paddr_t ram_end = 0;
>>> +    paddr_t ram_size = 0;
>>> +    unsigned int i;
>>> +
>>> +    /*
>>> +     * We need some memory to allocate the page-tables used for the
>>> +     * directmap mappings. But some regions may contain memory
>>> +     * already allocated for other uses (e.g. modules,
>>> +     * reserved-memory...).
>>> +     *
>>> +     * For simplicity, add all the free regions in the boot
>>> +     * allocator.
>>> +     */
>>> +    populate_boot_allocator();
>>> +
>>> +    total_pages = 0;
>>> +
>>> +    for ( i = 0; i < banks->nr_banks; i++ )
>>> +    {
>>> +        const struct membank *bank = &banks->bank[i];
>>> +        paddr_t bank_end = bank->start + bank->size;
>>> +
>>> +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>>
>> As before - if a bank doesn't cover full pages, this may give the
>> impression of there being more "total pages" than there are.
> Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
> size of a bank is 11K, ram_size will end up being 8K, so the "total
> pages" will cover less RAM than the actual size of the RAM bank.

ram_start at 2k but bank size being 13k would yield 2 usable pages
(first partial page of 2k unusable and last partial page of 3k
unusable), yet ram_size of 12k (3 pages). You need to consider the
common case; checking things work for a randomly chosen example isn't
enough.
>>> +        ram_start = min(ram_start, bank->start);
>>> +        ram_end = max(ram_end, bank_end);
>>> +
>>> +        setup_directmap_mappings(PFN_DOWN(bank->start),
>>> +                                 PFN_DOWN(bank->size));
>>
>> Similarly I don't think this is right when both start and size aren't
>> multiples of PAGE_SIZE. You may map an unusable partial page at the
>> start, and then fail to map a fully usable page at the end.
> ram_size should be a multiple of PAGE_SIZE because we have:
> ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>
> Do you know of any examples where bank->start isn't aligned to
> PAGE_SIZE?

The question is the other way around: is it specified anywhere that
start (and size) _need_ to be aligned? And if it is - do all firmware
makers play by that (on x86 at least, specifications often mean pretty
little to firmware people, apparently)?

> Should it be mentioned somewhere what a legal physical address for RAM
> start is? If it's not PAGE_SIZE-aligned, then it seems we have no
> choice but to use ALIGNUP(..., PAGE_SIZE), which would mean losing
> part of the bank.

Correct - partial pages simply cannot be used (except in ad hoc ways,
which likely isn't very desirable).

Jan