|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] mm/pdx: Add comments throughout the codebase for pdx
On 21.06.2023 16:25, Alejandro Vallejo wrote:
> On Mon, Jun 19, 2023 at 05:30:20PM +0200, Jan Beulich wrote:
>>> @@ -57,9 +99,25 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
>>> (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
>>> }
>>>
>>> -u64 __init pdx_region_mask(u64 base, u64 len)
>>> +uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
>>> {
>>> - return fill_mask(base ^ (base + len - 1));
>>> + uint64_t last = base + len - 1;
>>> + /*
>>> + * The only bit that matters in base^last is the MSB. There are 2
>>> cases.
>>> + *
>>> + * case msb(base) < msb(last):
>>> + * then fill_mask(base^last) == fill_mask(last). This is non
>>> + * compressible.
>>> + * case msb(base) == msb(last):
>>> + * This means that there _may_ be a sequence of compressible zeroes
>>> + * for all addresses between `base` and `last` iff `base` has
>>> enough
>>> + * trailing zeroes. That is, it's compressible when
>>> + * fill_mask(base^last) < fill_mask(last)
>>> + *
>>> + * The resulting mask is effectively the moving bits between `base` and
>>> + * `last`
>>> + */
>>> + return fill_mask(base ^ last);
>>> }
>>
>> I don't see a need for you to actually change the code here. You can
>> as well introduce "last" as shorthand just for the comment.
> I thought as you did initially and wrote it as such. In the end it felt
> wrong to have an explanation in terms of a token not present in the code.
> Furthermore, explaining what the shorthand is in the comment takes more
> space than introducing `last` in the code itself.
>
> ```
> uint64_t last = base + len - 1;
> /*
> * The only bit that matters in base^last is the MSB. There are 2 cases.
> ```
> vs
> ```
> /*
> * Let `last = base + len -1` out of convenience.
> * The only bit that matters in base^last is the MSB. There are 2 cases.
> ```
>
> TL;DR: I didn't factor out `last` due to aesthetics (I'd rather not touch
> the code in this patch, in fact), but it seems warranted in order to reduce
> the impedance mismatch between this big-ish comment and the call it
> describes. I'll post v2 without that adjustment in case I managed to
> convince you. Otherwise feel free to adjust it on commit.
>
>> What I dislike in your way of putting it is the use of fill_mask(last) when
>> such a call never occurs in code. Starting from the first sentence,
>> can't you explain things just in terms of said MSB
> I see. I can refer to the MSBs instead. Works equally well.
>
> e.g:
> fill_mask(base^last) == fill_mask(last)
> |
> V
> msb(fill_mask(base^last)) == msb(last)
>
>> (where the two cases are "set" and "clear")?
> I'm not sure I understand what you mean here.
Hmm, yes, I think I was getting confused here.
>>> --- a/xen/include/xen/mm.h
>>> +++ b/xen/include/xen/mm.h
>>> @@ -31,6 +31,22 @@
>>> * (i.e. all devices assigned to) a guest share a single DMA address
>>> space
>>> * and, by default, Xen will ensure dfn == pfn.
>>> *
>>> + * pdx: Page InDeX
>>> + * Indices into the frame table holding the per-page's book-keeping
>>> + * metadata. A compression scheme is used and there's a non-identity
>>> + * mapping between valid(mfn) <-> valid(pdx) See the comments in pdx.c
>>> + * for an in-depth explanation of that mapping.
>>
>> The mapping may very well be (and on x86 typically is) an identity
>> one. IOW you want to describe not only the compression case, but also
>> the "no compression possible" one.
> Point taken. I'll rephrase it slightly as "possibly non-identity" and
> explicitly state the "no compression is possible" case.
>
>>
>> PDXes also aren't just indexes to the frame table, but also to the
>> direct mapping.
> I had something to that effect earlier on, but I removed it because it
> doesn't seem to be the case on ARM. There's a directmap_base_pdx global
> that states the first pdx to be mapped on the directmap.
Which would merely make it a biased index. I very much hope they
eliminate holes (and not just unused leading space) from the directmap
as well.
>>> + * ## PDX compression
>>> + *
>>> + * This is a technique to avoid wasting memory on machines known to have
>>> + * split their machine address space in two big discontinuous and highly
>>> + * disjoint chunks.
>>
>> Why two? There can be any number, and in fact on the system I originally
>> had data from for reference (when first writing this code) iirc there
>> were 8 nodes, each with a chunk of memory far away from the other chunks.
>> The compression scheme used merely requires that some "inner" bits are
>> unused (always zero) in all of those ranges.
> Well, our implementation only supports two and I didn't see any obvious
> hints about intending to increasing that number.
Where are you taking that "supports two" from? When I first wrote this code,
it was tested against a system with 8 (maybe it was 4, but certainly more
than 2) discontiguous regions (not counting the hole below 4G).
> I see where you're coming
> from, though. I can make it more general so it's not outdated if the
> pfn_to_pdx()/pdx_to_pfn() pair ever increases in scope to do several holes.
>
> Out of curiosity (and for posterity's sake), what was/is that system?
I'm not sure I'm permitted to mention that. I'm pretty sure I carefully
avoided mentioning the partner of ours back at the time.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |