Re: [PATCH v5] pdx: introduce a new compression algorithm based on region offsets
On 12.08.2025 17:06, Roger Pau Monne wrote:
> With the appearance of Intel Sierra Forest and Granite Rapids it's now
> possible to get a production x86 host with the following memory map:
>
> SRAT: Node 0 PXM 0 [0000000000000000, 000000007fffffff]
> SRAT: Node 0 PXM 0 [0000000100000000, 000000807fffffff]
> SRAT: Node 1 PXM 1 [0000063e80000000, 000006be7fffffff]
> SRAT: Node 2 PXM 2 [00000c7e80000000, 00000cfe7fffffff]
> SRAT: Node 3 PXM 3 [000012be80000000, 0000133e7fffffff]
>
> This is from a four socket Granite Rapids system, with each node having
> 512GB of memory. The total amount of RAM on the system is 2TB, but without
> enabling CONFIG_BIGMEM the last range is not accessible, as it's above the
> 16TB boundary covered by the frame table. Sierra Forest and Granite Rapids
> are socket compatible, however Sierra Forest only supports 2 socket
> configurations, while Granite Rapids can go up to 8 sockets.
>
> Note that while the memory map is very sparse, it couldn't be compressed
> using the current PDX_MASK compression algorithm, which relies on all
> ranges having a shared zeroed region of bits that can be removed.
>
> The memory map presented above has the property of all regions being
> similarly spaced between each other, and all having also a similar size.
> Use a lookup table to store the offsets to translate from/to PFN and PDX
> spaces. Such table is indexed based on the input PFN or PDX to be translated.
> The example PFN layout above would get compressed using the following:
>
> PFN compression using PFN lookup table shift 29 and PDX region size 0x10000000
>  range 0 [0000000000000, 0x0000807ffff] PFN IDX 0 : 0000000000000
>  range 1 [0x00063e80000, 0x0006be7ffff] PFN IDX 3 : 0x00053e80000
>  range 2 [0x000c7e80000, 0x000cfe7ffff] PFN IDX 6 : 0x000a7e80000
>  range 3 [0x0012be80000, 0x00133e7ffff] PFN IDX 9 : 0x000fbe80000
>
> Note how the two ranges belonging to node 0 get merged into a single PDX
> region by the compression algorithm.
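
The translation described in the quoted text can be sketched roughly as follows. This is only an illustration built from the example output above (shift 29, region size 0x10000000, and the four offsets); the function and array names are hypothetical and not taken from the patch, and indexing the PDX-side table by pdx >> 28 is an assumption derived from the printed region size.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the example output; all names here are hypothetical. */
#define PFN_INDEX_SHIFT 29 /* "PFN lookup table shift 29" */
#define PDX_INDEX_SHIFT 28 /* assumed: log2(PDX region size 0x10000000) */
#define TBL_ENTRIES     64 /* Kconfig default mentioned in the thread */

/* Offset subtracted from a PFN to obtain its PDX, indexed by PFN >> 29. */
static const uint64_t pfn_offset[TBL_ENTRIES] = {
    [0] = 0x0ULL,
    [3] = 0x00053e80000ULL,
    [6] = 0x000a7e80000ULL,
    [9] = 0x000fbe80000ULL,
};

/* Offset added to a PDX to recover its PFN, indexed by PDX >> 28 (assumed). */
static const uint64_t pdx_offset[TBL_ENTRIES] = {
    [0] = 0x0ULL,
    [1] = 0x00053e80000ULL,
    [2] = 0x000a7e80000ULL,
    [3] = 0x000fbe80000ULL,
};

static uint64_t sketch_pfn_to_pdx(uint64_t pfn)
{
    uint64_t idx = pfn >> PFN_INDEX_SHIFT;

    assert(idx < TBL_ENTRIES); /* the sketch does not handle larger PFNs */
    return pfn - pfn_offset[idx];
}

static uint64_t sketch_pdx_to_pfn(uint64_t pdx)
{
    uint64_t idx = pdx >> PDX_INDEX_SHIFT;

    assert(idx < TBL_ENTRIES);
    return pdx + pdx_offset[idx];
}
```

With these tables the first PFN of range 1 (0x63e80000) lands at PDX 0x10000000, directly after the region holding range 0, which is where the space saving comes from. The sketch does not validate that the input falls inside a populated range.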
>
> The default size of lookup tables currently set in Kconfig is 64 entries,
> and the example memory map consumes 10 entries. Such memory map is from a
> 4 socket Granite Rapids host, which in theory supports up to 8 sockets
> according to Intel documentation. Assuming the layout of an 8 socket system
> is similar to the 4 socket one, it would require 21 lookup table entries to
> support it, way below the current default of 64 entries.
>
> The valid range of lookup table size is currently restricted from 1 to 512
> elements in Kconfig.
>
> An extra array is used to keep track of the base PFN for each translated
> range. Unused slots are set to ~0UL, so that in mfn_valid() the mfn <
> base check always fails, thus reporting the mfn as invalid.
>
> Introduce __init_or_pdx_mask and use it on some shared functions between
> PDX mask and offset compression, as otherwise some code becomes unreachable
> after boot if PDX offset compression is used. Mark the code as __init in
> that case, so it's pruned after boot.
>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
with one cosmetic remark (sorry for not spotting this earlier):

> --- a/tools/tests/pdx/harness.h
> +++ b/tools/tests/pdx/harness.h
> @@ -44,8 +44,10 @@
>
>  #define MAX_RANGES 16
>  #define MAX_PFN_RANGES MAX_RANGES
> +#define CONFIG_PDX_OFFSET_TBL_ORDER 6
>
>  #define ASSERT assert
> +#define ASSERT_UNREACHABLE() assert(0)
>
>  #define CONFIG_DEBUG
>
> @@ -66,10 +68,22 @@ static inline unsigned int find_next(
>  #define find_next_zero_bit(a, s, o) find_next(a, s, o, false)
>  #define find_next_bit(a, s, o) find_next(a, s, o, true)
>
> +#define flsl(x) ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
> +#define ffsl(x) __builtin_ffsl(x)
> +
>  #define boolean_param(name, func)
>
>  typedef uint64_t paddr_t;
>
> +#define SWAP(a, b) \
> +    do { typeof(a) t_ = (a); (a) = (b); (b) = t_; } while ( 0 )
> +
> +#define sort(elem, nr, size, cmp, swp) ({ \
> +    /* Consume swp() so compiler doesn't complain it's unused. */ \
> +    (void)swp; \

It generally shouldn't matter here, yet maybe still better to parenthesize
swp.

Jan
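
As a generic illustration of the parenthesization remark (not the harness's sort() macro, whose body is only partly quoted above), the sketch below uses two hypothetical macros to show why a macro argument is normally wrapped before a cast. With an argument such as a conditional expression, the unparenthesized form applies the cast to the first operand only and fails to compile, while the parenthesized form discards the whole expression.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only. */
#define CONSUME_BARE(arg)  ((void)arg)   /* cast binds to the first operand only */
#define CONSUME_PAREN(arg) ((void)(arg)) /* the whole argument is discarded */

typedef void swap_fn(void *a, void *b);

static void swap_ints(void *a, void *b)
{
    int t = *(int *)a;

    *(int *)a = *(int *)b;
    *(int *)b = t;
}

static void swap_noop(void *a, void *b)
{
    (void)a;
    (void)b;
}

static void demo_swap(int use_real, int *x, int *y)
{
    swap_fn *swp = use_real ? swap_ints : swap_noop;

    /*
     * Fine with parentheses.  CONSUME_BARE() with the same argument would
     * expand to ((void)use_real ? swap_ints : swap_noop), which uses a void
     * value as the condition and is rejected by the compiler.
     */
    CONSUME_PAREN(use_real ? swap_ints : swap_noop);

    swp(x, y);
}
```

When the argument is a plain identifier, as is presumably the case for the callers in the test harness, both forms behave identically, which matches the "shouldn't matter here" observation.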