
Re: [PATCH v2 1/3] xen/riscv: introduce setup_mm()


  • To: oleksii.kurochko@xxxxxxxxx
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 31 Oct 2024 10:08:19 +0100
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 31 Oct 2024 09:08:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30.10.2024 17:50, oleksii.kurochko@xxxxxxxxx wrote:
> On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
>> On 23.10.2024 17:50, Oleksii Kurochko wrote:
>>> @@ -25,8 +27,11 @@
>>>  
>>>  static inline void *maddr_to_virt(paddr_t ma)
>>>  {
>>> -    BUG_ON("unimplemented");
>>> -    return NULL;
>>> +    unsigned long va_offset = maddr_to_directmapoff(ma);
>>> +
>>> +    ASSERT(va_offset < DIRECTMAP_SIZE);
>>> +
>>> +    return (void *)(XENHEAP_VIRT_START + va_offset);
>>>  }
>>
>> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
>> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
>> in the first place: You don't have a separate "heap" virtual address
>> range, do you?
> The name may not be ideal for RISC-V. I borrowed it from Arm, intending
> to account for cases where the directmap virtual start might not align
> with DIRECTMAP_VIRT_START due to potential adjustments for superpage
> mapping.
> And my understanding is that XENHEAP == DIRECTMAP in case of Arm64.

Just to mention it: If I looked at Arm64 in isolation (without also
considering Arm32, and hence the desire to keep code common where possible),
I'd consider the mere existence of XENHEAP_VIRT_START (without an
accompanying XENHEAP_VIRT_SIZE) a mistake. Therefore for RISC-V its
introduction may be justified by (remote) plans to also cover RV32 at some
point. Yet such then needs saying explicitly in the description.

>>> @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t
>>> fdt_paddr)
>>>  
>>>      return fdt_virt;
>>>  }
>>> +
>>> +#ifndef CONFIG_RISCV_32
>>
>> I'd like to ask that you be more selective with this #ifdef (or omit
>> it
>> altogether here). setup_mm() itself, for example, looks good for any
>> mode.
> Regarding setup_mm(): the 32-bit and 64-bit versions have pretty
> different implementations.

Not setup_mm() itself, it seems. Its helpers - sure.
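
Just to illustrate the split I have in mind (rough sketch, untested):

#ifndef CONFIG_RISCV_32
static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
{
    /* RV64-specific implementation. */
}

static void __init setup_directmap_mappings(unsigned long base_mfn,
                                            unsigned long nr_mfns)
{
    /* RV64-specific implementation. */
}
#endif /* !CONFIG_RISCV_32 */

void __init setup_mm(void)
{
    /* Common code, calling the helpers above. */
}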

>>> +    if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
>>> +                          PFN_DOWN(frametable_size),
>>> +                          PAGE_HYPERVISOR_RW) )
>>> +        panic("Unable to setup the frametable mappings\n");
>>> +
>>> +    memset(&frame_table[0], 0, nr_mfns * sizeof(struct
>>> page_info));
>>> +    memset(&frame_table[nr_mfns], -1,
>>> +           frametable_size - (nr_mfns * sizeof(struct
>>> page_info)));
>>
>> Here (see comments on v1) you're still assuming ps == 0.
> Do you refer to ?
> ```
>> +/* Map a frame table to cover physical addresses ps through pe */
>> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>> +{
>> +    unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
> 
> This looks to be accounting for a partial page at the end.
> 
>> +                            mfn_x(maddr_to_mfn(ps)) + 1;
> 
> Whereas this doesn't do the same at the start. The sole present caller
> passes 0, so that's going to be fine for the time being. Yet it's a
> latent pitfall. I'd recommend to either drop the function parameter, or
> to deal with it correctly right away.
> ```
> And I've added aligned_ps to cover the case that ps could be not page
> aligned.

Not this, no, but ...

> Or are you refering to 0 in memset(&frame_table[0],...)?

... this. If the start address wasn't 0, you'd need to invalidate a
region at the start of the table, just as you invalidate a region at the
end.
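
Roughly (untested, with first_idx being a made-up local; frametable_size
would then also need to cover those leading entries):

    unsigned long first_idx = mfn_x(maddr_to_mfn(ps));

    /* Invalidate entries in front of the first usable page ... */
    memset(&frame_table[0], -1, first_idx * sizeof(struct page_info));
    /* ... clear the usable range ... */
    memset(&frame_table[first_idx], 0, nr_mfns * sizeof(struct page_info));
    /* ... and invalidate the tail up to the mapped table size, as you
       already do. */
    memset(&frame_table[first_idx + nr_mfns], -1,
           frametable_size - (first_idx + nr_mfns) * sizeof(struct page_info));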

>>> +/* Map the region in the directmap area. */
>>> +static void __init setup_directmap_mappings(unsigned long
>>> base_mfn,
>>> +                                            unsigned long nr_mfns)
>>> +{
>>> +    int rc;
>>> +
>>> +    /* First call sets the directmap physical and virtual offset.
>>> */
>>> +    if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
>>> +    {
>>> +        directmap_mfn_start = _mfn(base_mfn);
>>> +
>>> +        /*
>>> +         * The base address may not be aligned to the second level
>>> +         * size (e.g. 1GB when using 4KB pages). This would
>>> prevent
>>> +         * superpage mappings for all the regions because the
>>> virtual
>>> +         * address and machine address should both be suitably
>>> aligned.
>>> +         *
>>> +         * Prevent that by offsetting the start of the directmap
>>> virtual
>>> +         * address.
>>> +         */
>>> +        directmap_virt_start = DIRECTMAP_VIRT_START +
>>> pfn_to_paddr(base_mfn);
>>
>> Don't you need to mask off top bits of the incoming MFN here, or else
>> you
>> may waste a huge part of direct map space?
> Yes, it will result in a loss of direct map space, but we still have a
> considerable amount available in Sv39 mode and higher modes. The
> largest RAM_START I see currently is 0x1000000000, which means we would
> lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
> plenty of free space available, and we can always increase
> DIRECTMAP_SIZE since we have a lot of free virtual address space in
> Sv39.

Wow, 68 out of 308 - that's more than 20%. I'm definitely concerned about
this then.

> Finally, regarding masking off the top bits of mfn, I'm not entirely
> clear on how this should work. If I understand correctly, if I mask off
> certain top bits in mfn, then I would need to unmask those same top
> bits in maddr_to_virt() and virt_to_maddr(). Is that correct?
> 
> Another point I’m unclear on is which specific part of the top bits
> should be masked.

You want to "move" the directmap such that the first legitimate RAM
page is within the first (large/huge) page mapping of the directmap.
IOW the "virtual" start of the directmap would move down in VA space.
That still leaves things at a simple offset calculation when
translating VA <-> PA.

To give an example: Let's assume RAM starts at 61.5 Gb, and you want to
use 1Gb mappings for the bulk of the directmap. Then the "virtual" start
of the directmap would shift down to DIRECTMAP_VIRT_START - 60Gb,
such that the first RAM page would be mapped at
DIRECTMAP_VIRT_START + 1.5Gb. IOW it would be the low 30 address bits of
the start address that you use (30 - PAGE_SHIFT for the MFN), with the
higher bits contributing to the offset involved in the VA <-> PA
translation. Values used depend on the (largest) page size you mean to
use for the direct map: On systems with terabytes of memory (demanding
Sv48 or even Sv57 mode) you may want to use 512Gb mappings, and hence
you'd then need to mask the low 39 bits (or 48 for 256Tb mappings).
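
In code terms that's something like (untested; DIRECTMAP_LEVEL_SHIFT and
directmap_virt_pa_offset are made-up names, with the shift being 30 for 1Gb
mappings, 39 for 512Gb ones, 48 for 256Tb ones):

    paddr_t base = pfn_to_paddr(base_mfn);
    paddr_t mask = (1UL << DIRECTMAP_LEVEL_SHIFT) - 1;

    /* Only the offset within one (large) mapping determines where the
       first RAM page lands in the directmap ... */
    directmap_virt_start = DIRECTMAP_VIRT_START + (base & mask);

    /* ... while the bits masked off contribute to the fixed VA <-> PA
       offset. */
    directmap_virt_pa_offset = DIRECTMAP_VIRT_START - (base & ~mask);

With base = 61.5Gb and 1Gb mappings that yields virt_start =
DIRECTMAP_VIRT_START + 1.5Gb and pa_offset = DIRECTMAP_VIRT_START - 60Gb,
so maddr_to_virt() remains a plain "pa_offset + ma".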

>>> +void __init setup_mm(void)
>>> +{
>>> +    const struct membanks *banks = bootinfo_get_mem();
>>> +    paddr_t ram_start = INVALID_PADDR;
>>> +    paddr_t ram_end = 0;
>>> +    paddr_t ram_size = 0;
>>> +    unsigned int i;
>>> +
>>> +    /*
>>> +     * We need some memory to allocate the page-tables used for
>>> the directmap
>>> +     * mappings. But some regions may contain memory already
>>> allocated
>>> +     * for other uses (e.g. modules, reserved-memory...).
>>> +     *
>>> +     * For simplicity, add all the free regions in the boot
>>> allocator.
>>> +     */
>>> +    populate_boot_allocator();
>>> +
>>> +    total_pages = 0;
>>> +
>>> +    for ( i = 0; i < banks->nr_banks; i++ )
>>> +    {
>>> +        const struct membank *bank = &banks->bank[i];
>>> +        paddr_t bank_end = bank->start + bank->size;
>>> +
>>> +        ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>>
>> As before - if a bank doesn't cover full pages, this may give the
>> impression
>> of there being more "total pages" than there are.
> Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
> size of a bank is 11K, ram_size will end up being 8K, so the "total
> pages" will cover less RAM than the actual size of the RAM bank.

ram_start at 2k but bank size being 13k would yield 2 usable pages
(first partial page of 2k unusable and last partial page of 3k
unusable), yet ram_size of 12k (3 pages). You need to consider the
common case; checking things work for a randomly chosen example isn't
enough.
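
I.e. the accounting wants doing on rounded bounds, along the lines of
(untested; aligned_start / aligned_end are made-up locals):

    paddr_t aligned_start = ROUNDUP(bank->start, PAGE_SIZE);
    paddr_t aligned_end = ROUNDDOWN(bank->start + bank->size, PAGE_SIZE);

    /* Only full pages between the rounded bounds are actually usable. */
    if ( aligned_end > aligned_start )
        ram_size += aligned_end - aligned_start;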

>>> +        ram_start = min(ram_start, bank->start);
>>> +        ram_end = max(ram_end, bank_end);
>>> +
>>> +        setup_directmap_mappings(PFN_DOWN(bank->start),
>>> +                                 PFN_DOWN(bank->size));
>>
>> Similarly I don't think this is right when both start and size aren't
>> multiple of PAGE_SIZE. You may map an unusable partial page at the
>> start,
>> and then fail to map a fully usable page at the end.
> ram_size should be a multiple of PAGE_SIZE because we have:
>     ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
> 
> Do you know of any examples where bank->start isn't aligned to
> PAGE_SIZE?

Question is the other way around: Is it specified anywhere that start (and
size) _need_ to be aligned? And if it is - do all firmware makers play by
that (on x86 at least specifications often mean pretty little to firmware
people, apparently)?

> Should it be mentioned somewhere what a legal physical address for RAM
> start is? If it’s not PAGE_SIZE-aligned, then it seems we have no choice
> but to use ALIGNUP(..., PAGE_SIZE), which would mean losing part of the
> bank.

Correct - partial pages simply cannot be used (except in ad hoc ways, which
likely isn't very desirable).
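
For the mapping call that then amounts to something like (untested,
re-using aligned_start / aligned_end from the sketch further up):

    if ( aligned_end > aligned_start )
        setup_directmap_mappings(PFN_DOWN(aligned_start),
                                 PFN_DOWN(aligned_end - aligned_start));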

Jan



 

