[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] mm/pdx: Add comments throughout the codebase for pdx


  • To: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 22 Jun 2023 11:15:17 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CMl68EtgETa3xL4v+g74W+ZbCX557QpXLqKZGx9XphY=; b=MhxRzEsc6Tb19NR6JqKTKpYcvwXWCLdNwoyO/mV2dYDQjVy/n37hsES9w1JASpTwcafa8Zhp4KSJcdkDag4IGx9wwg08ANlGaEn7G6pDO4AeA5ug2ZYEv6+WV0J149S8PKJEkaWhZy1XOEn5UJ7KpluFXY+fulOs4gaeCek/P7h8cFo8ZWWLqDl94leZ4eZdZDnoYXPteHJ35ZV84F4rVLMyEg5gTBgHbk9+fzlis+irhroUAu29xhCzoCYWiUxmlcrG1GBpKUtHDEdgBuNgGVWV2PoZ77p4mYUt1VqtjA8N2avcAsm4Zh8MKO7Lxj7FLR0c4uv7pvwnaddCs2rGiQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HmPQ7/JcanUPzVm2+BZNtSZYcCq2zrj+eYpy6hiGlPJcSHkDwztva4Z9IJW4ya66CNGNFQxE9zJIVC+ijuuJD2RmIV2CsTYPXirXvJvlA37KpdYHaGqMZAx+CKUzMzS/t3Rg2aWZet0y6YRVh5LCCMJ1t9W+q719QLPR7y42g+6AoxOrk8td+MOvMVUVGZJeE76h2f781XMBoJtXkxpTmrNsOpxrRRMSbB2KsGhIPDzrXwxXZhD+FhUiLkj8yMef/b94ddIGyaB17ytlpucGglmflB08t5zj23tAmQc9kE3ylYxAnIFE8ijftqYX4xnaxhlQD7CyamoB7Dmy8bJplQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 22 Jun 2023 09:15:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.06.2023 16:25, Alejandro Vallejo wrote:
> On Mon, Jun 19, 2023 at 05:30:20PM +0200, Jan Beulich wrote:
>>> @@ -57,9 +99,25 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
>>>                           (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
>>>  }
>>>  
>>> -u64 __init pdx_region_mask(u64 base, u64 len)
>>> +uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
>>>  {
>>> -    return fill_mask(base ^ (base + len - 1));
>>> +    uint64_t last = base + len - 1;
>>> +    /*
>>> +     * The only bit that matters in base^last is the MSB. There are 2 
>>> cases.
>>> +     *
>>> +     * case msb(base) < msb(last):
>>> +     *     then fill_mask(base^last) == fill_mask(last). This is non
>>> +     *     compressible.
>>> +     * case msb(base) == msb(last):
>>> +     *     This means that there _may_ be a sequence of compressible zeroes
>>> +     *     for all addresses between `base` and `last` iff `base` has 
>>> enough
>>> +     *     trailing zeroes. That is, it's compressible when
>>> +     *     fill_mask(base^last) < fill_mask(last)
>>> +     *
>>> +     * The resulting mask is effectively the moving bits between `base` and
>>> +     * `last`
>>> +     */
>>> +    return fill_mask(base ^ last);
>>>  }
>>
>> I don't see a need for you to actually change the code here. You can
>> as well introduce "last" as shorthand just for the comment.
> I thought as you did initially and wrote it as such. In the end it felt
> wrong to have an explanation in terms of a token not present in the code.
> Furthermore, explaining what the shorthand is in the comment takes more
> space than introducing `last` in the code itself.
> 
> ```
>    uint64_t last = base + len - 1;
>   /*
>    * The only bit that matters in base^last is the MSB. There are 2 cases.
> ```
>                               vs
> ```
>   /*
>    * Let `last = base + len -1` out of convenience.
>    * The only bit that matters in base^last is the MSB. There are 2 cases.
> ```
> 
> TL;DR: I didn't factor out `last` due to aesthetics (I'd rather not touch
> the code in this patch, in fact), but it seems warranted in order to reduce
> the impedance mismatch between this big-ish comment and the call it
> describes. I'll post v2 without that adjustment in case I managed to
> convince you. Otherwise feel free to adjust it on commit.
> 
>> What I dislike in your way of putting it is the use of fill_mask(last) when
>> such a call never occurs in code. Starting from the first sentence,
>> can't you explain things just in terms of said MSB
> I see. I can refer to the MSBs instead. Works equally well.
> 
> e.g:
>   fill_mask(base^last) == fill_mask(last)
>                  |
>                  V
>   msb(fill_mask(base^last)) == msb(last)
> 
>> (where the two cases are "set" and "clear")?
> I'm not sure I understand what you mean here.

Hmm, yes, I think I was getting confused here.

>>> --- a/xen/include/xen/mm.h
>>> +++ b/xen/include/xen/mm.h
>>> @@ -31,6 +31,22 @@
>>>   *   (i.e. all devices assigned to) a guest share a single DMA address 
>>> space
>>>   *   and, by default, Xen will ensure dfn == pfn.
>>>   *
>>> + * pdx: Page InDeX
>>> + *   Indices into the frame table holding the per-page's book-keeping
>>> + *   metadata. A compression scheme is used and there's a non-identity
>>> + *   mapping between valid(mfn) <-> valid(pdx) See the comments in pdx.c
>>> + *   for an in-depth explanation of that mapping.
>>
>> The mapping may very well be (and on x86 typically is) an identity
>> one. IOW you want to describe not only the compression case, but also
>> the "no compression possible" one.
> Point taken. I'll rephrase it slightly as "possibly non-identity" and
> explicitly state the "no compression is possible" case.
> 
>>
>> PDXes also aren't just indexes to the frame table, but also to the
>> direct mapping.
> I had something to that effect earlier on, but I removed it because it
> doesn't seem to be the case on ARM. There's a directmap_base_pdx global
> that states the first pdx to be mapped on the directmap.

Which would merely make it a biased index. I very much hope they
eliminate holes (and not just unused leading space) from the directmap
as well.

>>> + * ## PDX compression
>>> + *
>>> + * This is a technique to avoid wasting memory on machines known to have
>>> + * split their machine address space in two big discontinuous and highly
>>> + * disjoint chunks.
>>
>> Why two? There can be any number, and in fact on the system I originally
>> had data from for reference (when first writing this code) iirc there
>> were 8 nodes, each with a chunk of memory far away from the other chunks.
>> The compression scheme used merely requires that some "inner" bits are
>> unused (always zero) in all of those ranges.
> Well, our implementation only supports two and I didn't see any obvious
> hints about intending to increasing that number.

Where are you taking that "supports two" from? When I first wrote this code,
it was tested against a system with 8 (maybe it was 4, but certainly more
than 2) discontiguous regions (not counting the hole below 4G).

> I see where you're coming
> from, though. I can make it more general so it's not outdated if the
> pfn_to_pdx()/pdx_to_pfn() pair ever increases in scope to do several holes.
> 
> Out of curiosity (and for posterity's sake), what was/is that system?

I'm not sure I'm permitted to mention that. I'm pretty sure I carefully
avoided mentioning the partner of ours back at the time.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.