[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] mm/pdx: Add comments throughout the codebase for pdx



On Thu, Jun 22, 2023 at 11:15:17AM +0200, Jan Beulich wrote:
> >>> --- a/xen/include/xen/mm.h
> >>> +++ b/xen/include/xen/mm.h
> >>> @@ -31,6 +31,22 @@
> >>>   *   (i.e. all devices assigned to) a guest share a single DMA address 
> >>> space
> >>>   *   and, by default, Xen will ensure dfn == pfn.
> >>>   *
> >>> + * pdx: Page InDeX
> >>> + *   Indices into the frame table holding the per-page's book-keeping
> >>> + *   metadata. A compression scheme is used and there's a non-identity
> >>> + *   mapping between valid(mfn) <-> valid(pdx) See the comments in pdx.c
> >>> + *   for an in-depth explanation of that mapping.
> >>
> >> The mapping may very well be (and on x86 typically is) an identity
> >> one. IOW you want to describe not only the compression case, but also
> >> the "no compression possible" one.
> > Point taken. I'll rephrase it slightly as "possibly non-identity" and
> > explicitly state the "no compression is possible" case.
> > 
> >>
> >> PDXes also aren't just indexes to the frame table, but also to the
> >> direct mapping.
> > I had something to that effect earlier on, but I removed it because it
> > doesn't seem to be the case on ARM. There's a directmap_base_pdx global
> > that states the first pdx to be mapped on the directmap.
> 
> Which would merely make it a biased index. I very much hope they
> eliminate holes (and not just unused leading space) from the directmap
> as well.
Yes, the directmap offset is still just a shifted pdx (albeit biased, as
you said), so the holes are naturally removed. Regardless, having to
explain a port-specific quirk in the main mm header seems like too much, so
I sticked with the common denominator: "A pdx is the frame table index of a
valid entry", and that holds for every port.

> >>> + * This is a technique to avoid wasting memory on machines known to have
> >>> + * split their machine address space in two big discontinuous and highly
> >>> + * disjoint chunks.
> >>
> >> Why two? There can be any number, and in fact on the system I originally
> >> had data from for reference (when first writing this code) iirc there
> >> were 8 nodes, each with a chunk of memory far away from the other chunks.
> >> The compression scheme used merely requires that some "inner" bits are
> >> unused (always zero) in all of those ranges.
> > Well, our implementation only supports two and I didn't see any obvious
> > hints about intending to increasing that number.
> 
> Where are you taking that "supports two" from? When I first wrote this code,
> it was tested against a system with 8 (maybe it was 4, but certainly more
> than 2) discontiguous regions (not counting the hole below 4G).
You can have any number, but there's a single contiguous bit slice being
removed, as far as I can see. The adaptor functions in
xen/include/xen/pdx.h perform a single shift.

    static inline unsigned long pfn_to_pdx(unsigned long pfn)
    {
        return (pfn & pfn_pdx_bottom_mask) |
               ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
    }

    static inline unsigned long pdx_to_pfn(unsigned long pdx)
    {
        return (pdx & pfn_pdx_bottom_mask) |
               ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
    }

Unless I'm missing some non-obvious piece of the puzzle, I'd say that for a
truly general compressor we'd need some kind of loop over the hole mask.

> > Out of curiosity (and for posterity's sake), what was/is that system?
> 
> I'm not sure I'm permitted to mention that. I'm pretty sure I carefully
> avoided mentioning the partner of ours back at the time.
Fair enough.

Alejandro



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.