[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/3] xen/riscv: introduce setup_mm()



On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
> On 23.10.2024 17:50, Oleksii Kurochko wrote:
> > Introduce the implementation of setup_mm(), which includes:
> > 1. Adding all free regions to the boot allocator, as memory is
> > needed
> >    to allocate page tables used for frame table mapping.
> > 2. Calculating RAM size and the RAM end address.
> > 3. Setting up direct map mappings from each RAM bank and initialize
> >    directmap_virt_start (also introduce XENHEAP_VIRT_START which is
> >    defined as directmap_virt_start) to be properly aligned with RAM
> >    start to use more superpages to reduce pressure on the TLB.
> > 4. Setting up frame table mappings from physical address 0 to
> > ram_end
> >    to simplify mfn_to_page() and page_to_mfn() conversions.
> > 5. Setting up total_pages and max_page.
> > 
> > Update virt_to_maddr() to use introduced XENHEAP_VIRT_START.
> > 
> > Implement maddr_to_virt() function to convert a machine address
> > to a virtual address. This function is specifically designed to be
> > used
> > only for the DIRECTMAP region, so a check has been added to ensure
> > that
> > the address does not exceed DIRECTMAP_SIZE.
> 
> I'm unconvinced by this. Conceivably the function could be used on
> "imaginary" addresses, just to calculate abstract positions or e.g.
> deltas. At the same time I'm also not going to insist on the removal
> of
> that assertion, so long as it doesn't trigger.
> 
> > After the introduction of maddr_to_virt() the following linkage
> > error starts
> > to occur and to avoid it share_xen_page_with_guest() stub is added:
> >   riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill':
> >   /build/xen/common/tasklet.c:176: undefined reference to
> >      `share_xen_page_with_guest'
> >   riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
> > `share_xen_page_with_guest'
> >     isn't defined riscv64-linux-gnu-ld: final link failed: bad
> > value
> > 
> > Despite the linkger fingering tasklet.c, it's trace.o which has the
> > undefined
> > refenrece:
> >   $ find . -name \*.o | while read F; do nm $F | grep
> > share_xen_page_with_guest &&
> >     echo $F; done
> >                      U share_xen_page_with_guest
> >     ./xen/common/built_in.o
> >                      U share_xen_page_with_guest
> >     ./xen/common/trace.o
> >                      U share_xen_page_with_guest
> >     ./xen/prelink.o
> > 
> > Looking at trace.i, there is call of share_xen_page_with_guest()
> > but in case of
> > when maddr_to_virt() is defined as "return NULL" compiler optimizes
> > the part of
> > common/trace.c code where share_xen_page_with_priviliged_guest() is
> > called
> > ( there is no any code in dissambled common/trace.o ) so there is
> > no real call
> > of share_xen_page_with_priviliged_guest().
> 
> I don't think it's the "return NULL", but rather BUG_ON()'s (really
> BUG()'s)
> unreachable(). Not the least because the function can't validly
> return NULL,
> and hence callers have no need to check for NULL.
> 
> > @@ -25,8 +27,11 @@
> >  
> >  static inline void *maddr_to_virt(paddr_t ma)
> >  {
> > -    BUG_ON("unimplemented");
> > -    return NULL;
> > +    unsigned long va_offset = maddr_to_directmapoff(ma);
> > +
> > +    ASSERT(va_offset < DIRECTMAP_SIZE);
> > +
> > +    return (void *)(XENHEAP_VIRT_START + va_offset);
> >  }
> 
> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
> in the first place: You don't have a separate "heap" virtual address
> range, do you?
The name may not be ideal for RISC-V. I borrowed it from Arm, intending
to account for cases where the directmap virtual start might not align
with DIRECTMAP_VIRT_START due to potential adjustments for superpage
mapping.
And my understanding is that XENHEAP == DIRECTMAP in case of Arm64.

Let's discuss below whether XENHEAP_VIRT_START is necessary, as there
are related questions connected to it.

> 
> > @@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma)
> >   */
> >  static inline unsigned long virt_to_maddr(unsigned long va)
> >  {
> > -    if ((va >= DIRECTMAP_VIRT_START) &&
> > +    if ((va >= XENHEAP_VIRT_START) &&
> >          (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
> > -        return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
> > +        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
> 
> Same concern here then.
> 
> > @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t
> > fdt_paddr)
> >  
> >      return fdt_virt;
> >  }
> > +
> > +#ifndef CONFIG_RISCV_32
> 
> I'd like to ask that you be more selective with this #ifdef (or omit
> it
> altogether here). setup_mm() itself, for example, looks good for any
> mode.
Regarding setup_mm() as they have pretty different implementations for
32 and 64 bit versions.

> Like does ...
> 
> > +#define ROUNDDOWN(addr, size)  ((addr) & ~((size) - 1))
> 
> ... this #define. Then again this macro may better be placed in
> xen/macros.h anyway, next to ROUNDUP().
I will put it there. It was put in arch specific code as for such long
existence of Xen project no one introduce that so I decided that it is
only one specific case thereby no real need to go to common.

> 
> > +    frametable_size = ROUNDUP(frametable_size, MB(2));
> > +    base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT,
> > PFN_DOWN(MB(2)));
> 
> The 2Mb aspect wants a (brief) comment, imo.
> 
> > +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> > +                          PFN_DOWN(frametable_size),
> > +                          PAGE_HYPERVISOR_RW) )
> > +        panic("Unable to setup the frametable mappings\n");
> > +
> > +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct
> > page_info));
> > +    memset(&frame_table[nr_mfns], -1,
> > +           frametable_size - (nr_mfns * sizeof(struct
> > page_info)));
> 
> Here (see comments on v1) you're still assuming ps == 0.
Do you refer to ?
```
> +/* Map a frame table to cover physical addresses ps through pe */
> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> +{
> +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -

This looks to be accounting for a partial page at the end.

> +                            mfn_x(maddr_to_mfn(ps)) + 1;

Whereas this doesn't do the same at the start. The sole present caller
passes 0, so that's going to be fine for the time being. Yet it's a
latent pitfall. I'd recommend to either drop the function parameter, or
to deal with it correctly right away.
```
And I've added aligned_ps to cover the case that ps could be not page
aligned.

Or are you refering to 0 in memset(&frame_table[0],...)?

> 
> > +/* Map the region in the directmap area. */
> > +static void __init setup_directmap_mappings(unsigned long
> > base_mfn,
> > +                                            unsigned long nr_mfns)
> > +{
> > +    int rc;
> > +
> > +    /* First call sets the directmap physical and virtual offset.
> > */
> > +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> > +    {
> > +        directmap_mfn_start = _mfn(base_mfn);
> > +
> > +        /*
> > +         * The base address may not be aligned to the second level
> > +         * size (e.g. 1GB when using 4KB pages). This would
> > prevent
> > +         * superpage mappings for all the regions because the
> > virtual
> > +         * address and machine address should both be suitably
> > aligned.
> > +         *
> > +         * Prevent that by offsetting the start of the directmap
> > virtual
> > +         * address.
> > +         */
> > +        directmap_virt_start = DIRECTMAP_VIRT_START +
> > pfn_to_paddr(base_mfn);
> 
> Don't you need to mask off top bits of the incoming MFN here, or else
> you
> may waste a huge part of direct map space?
Yes, it will result in a loss of direct map space, but we still have a
considerable amount available in Sv39 mode and higher modes. The
largest RAM_START I see currently is 0x1000000000, which means we would
lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
plenty of free space available, and we can always increase
DIRECTMAP_SIZE since we have a lot of free virtual address space in
Sv39.

That said, I’m not insisting on this approach.
My suggestion was to handle the addition and subtraction of
directmap_mfn_start in maddr_to_virt() and virt_to_maddr():
```
+extern mfn_t directmap_mfn_start;
 extern vaddr_t directmap_virt_start;
 
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
@@ -31,7 +32,7 @@ static inline void *maddr_to_virt(paddr_t ma)
 
     ASSERT(va_offset < DIRECTMAP_SIZE);
 
-    return (void *)(XENHEAP_VIRT_START + va_offset);
+    return (void *)(XENHEAP_VIRT_START -
(mfn_to_maddr(directmap_mfn_start)) + va_offset);
 }
 
 /*
@@ -44,7 +45,7 @@ static inline unsigned long virt_to_maddr(unsigned
long va)
 {
     if ((va >= XENHEAP_VIRT_START) &&
         (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
-        return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
+        return directmapoff_to_maddr(va - XENHEAP_VIRT_START +
mfn_to_maddr(directmap_mfn_start));
 
     BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2));
     ASSERT((va >> (PAGETABLE_ORDER + PAGE_SHIFT)) ==
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 262cec811e..7ef9db2363 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -450,7 +450,7 @@ static void __init
setup_frametable_mappings(paddr_t ps, paddr_t pe)
 }
 
-static mfn_t __ro_after_init directmap_mfn_start =
INVALID_MFN_INITIALIZER;
+mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;
 vaddr_t __ro_after_init directmap_virt_start;
 
 /* Map the region in the directmap area. */
@@ -462,6 +462,8 @@ static void __init
setup_directmap_mappings(unsigned long base_mfn,
     /* First call sets the directmap physical and virtual offset. */
     if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
     {
+        unsigned long mfn_gb = base_mfn & ~XEN_PT_LEVEL_SIZE(2);
+
         directmap_mfn_start = _mfn(base_mfn);
 
         /*
@@ -473,7 +475,8 @@ static void __init
setup_directmap_mappings(unsigned long base_mfn,
          * Prevent that by offsetting the start of the directmap
virtual
          * address.
          */
-        directmap_virt_start = DIRECTMAP_VIRT_START +
pfn_to_paddr(base_mfn);
+        directmap_virt_start = DIRECTMAP_VIRT_START  +
+            (base_mfn - mfn_gb) * PAGE_SIZE; /*+
pfn_to_paddr(base_mfn)*/;
```

Finally, regarding masking off the top bits of mfn, I'm not entirely
clear on how this should work. If I understand correctly, if I mask off
certain top bits in mfn, then I would need to unmask those same top
bits in maddr_to_virt() and virt_to_maddr(). Is that correct?

Another point I’m unclear on is which specific part of the top bits
should be masked.

If you could explain this to me, I would really appreciate it, and I'll
be happy to use the masking approach.

> 
> > +}
> > +
> > +/*
> > + * Setup memory management
> > + *
> > + * RISC-V 64 has a large virtual address space (the minimum
> > supported
> > + * MMU mode is Sv39, which provides TBs of VA space).
> 
> Is it really TBs? According to my math you'd need more than 40 bits
> to
> map a single Tb (alongside other stuff).
I accidentally calculated it as the first 40 bits (from bits 0 to 39)
due to the "39" in Sv39. However, in reality, it’s actually 39 bits
(from bits 0 to 38), so it represents less than TBs, only GBs of
virtual address space.

> 
> > + */
> > +void __init setup_mm(void)
> > +{
> > +    const struct membanks *banks = bootinfo_get_mem();
> > +    paddr_t ram_start = INVALID_PADDR;
> > +    paddr_t ram_end = 0;
> > +    paddr_t ram_size = 0;
> > +    unsigned int i;
> > +
> > +    /*
> > +     * We need some memory to allocate the page-tables used for
> > the directmap
> > +     * mappings. But some regions may contain memory already
> > allocated
> > +     * for other uses (e.g. modules, reserved-memory...).
> > +     *
> > +     * For simplicity, add all the free regions in the boot
> > allocator.
> > +     */
> > +    populate_boot_allocator();
> > +
> > +    total_pages = 0;
> > +
> > +    for ( i = 0; i < banks->nr_banks; i++ )
> > +    {
> > +        const struct membank *bank = &banks->bank[i];
> > +        paddr_t bank_end = bank->start + bank->size;
> > +
> > +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
> 
> As before - if a bank doesn't cover full pages, this may give the
> impression
> of there being more "total pages" than there are.
Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
size of a bank is 11K, ram_size will end up being 8K, so the "total
pages" will cover less RAM than the actual size of the RAM bank.

> 
> > +        ram_start = min(ram_start, bank->start);
> > +        ram_end = max(ram_end, bank_end);
> > +
> > +        setup_directmap_mappings(PFN_DOWN(bank->start),
> > +                                 PFN_DOWN(bank->size));
> 
> Similarly I don't think this is right when both start and size aren't
> multiple of PAGE_SIZE. You may map an unsuable partial page at the
> start,
> and then fail to map a fully usable page at the end.
ram_size should be a multiple of PAGE_SIZE because we have:
    ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);

Do you know of any examples where bank->start isn't aligned to
PAGE_SIZE? Should be somewhere mentioned what is legal physical address
for RAM start? If it’s not PAGE_SIZE-aligned, then it seems we have no
choice but to use ALIGNUP(..., PAGE_SIZE), which would mean losing part
of the bank.

~ Oleksii



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.