
Re: [PATCH v2 0/8] pdx: introduce a new compression algorithm



On Wed, 2 Jul 2025, Roger Pau Monné wrote:
> On Wed, Jul 02, 2025 at 10:54:24AM +0200, Orzel, Michal wrote:
> > 
> > 
> > On 02/07/2025 10:26, Roger Pau Monné wrote:
> > > On Wed, Jul 02, 2025 at 09:52:45AM +0200, Orzel, Michal wrote:
> > >>
> > >>
> > >> On 02/07/2025 09:00, Roger Pau Monné wrote:
> > >>> On Tue, Jul 01, 2025 at 01:46:19PM -0700, Stefano Stabellini wrote:
> > >>>> On Tue, 1 Jul 2025, Jan Beulich wrote:
> > >>>>> Sadly from this you omitted the output from the setup of the offsets
> > >>>>> arrays. Considering also your later reply, I'd be curious to know what
> > >>>>> mfn_to_pdx(0x50000000) is.
> > >>>>  
> > >>>> Full logs here, and debug patch in attachment.
> > >>>>
> > >>>> (XEN) Checking for initrd in /chosen
> > >>>> (XEN) RAM: 0000000000000000 - 000000007fffffff
> > >>>> (XEN) RAM: 0000000800000000 - 000000087fffffff
> > >>>> (XEN) RAM: 0000050000000000 - 000005007fffffff
> > >>>> (XEN) RAM: 0000060000000000 - 000006007fffffff
> > >>>> (XEN) RAM: 0000070000000000 - 000007007fffffff
> > >>>> (XEN) 
> > >>>> (XEN) MODULE[0]: 0000000022000000 - 0000000022172fff Xen         
> > >>>> (XEN) MODULE[1]: 0000000022200000 - 000000002220efff Device Tree 
> > >>>> (XEN) MODULE[2]: 0000000020400000 - 0000000021e2ffff Kernel      
> > >>>> (XEN)  RESVD[0]: 0000000000000000 - 0000000000ffffff
> > >>>> (XEN)  RESVD[1]: 0000000001000000 - 00000000015fffff
> > >>>> (XEN)  RESVD[2]: 0000000001600000 - 00000000017fffff
> > >>>> (XEN)  RESVD[3]: 0000000001800000 - 00000000097fffff
> > >>>> (XEN)  RESVD[4]: 0000000009800000 - 000000000bffffff
> > >>>> (XEN)  RESVD[5]: 0000000011126000 - 000000001114dfff
> > >>>> (XEN)  RESVD[6]: 000000001114e000 - 000000001214efff
> > >>>> (XEN)  RESVD[7]: 0000000017275000 - 000000001729cfff
> > >>>> (XEN)  RESVD[8]: 000000001729d000 - 000000001829dfff
> > >>>> (XEN)  RESVD[9]: 000000001a7df000 - 000000001a806fff
> > >>>> (XEN)  RESVD[10]: 000000001a807000 - 000000001b807fff
> > >>>> (XEN)  RESVD[11]: 000000001d908000 - 000000001d92ffff
> > >>>> (XEN)  RESVD[12]: 000000001d930000 - 000000001e930fff
> > >>>> (XEN)  RESVD[13]: 000000001829e000 - 000000001869dfff
> > >>>> (XEN)  RESVD[14]: 000000001869e000 - 00000000186ddfff
> > >>>> (XEN)  RESVD[15]: 0000000800000000 - 000000083fffffff
> > >>>> (XEN) 
> > >>>> (XEN) 
> > >>>> (XEN) Command line: console=dtuart dom0_mem=2048M console_timestamps=boot debug bootscrub=0 vwfi=native sched=null
> > >>>> (XEN) [00000006bfc302ec] parameter "debug" unknown!
> > >>>> (XEN) [00000006bfcc0476] DEBUG init_pdx 294 start=0 end=80000000
> > >>>> (XEN) [00000006bfcd2400] DEBUG init_pdx 294 start=800000000 end=880000000
> > >>>> (XEN) [00000006bfce29ec] DEBUG init_pdx 294 start=50000000000 end=50080000000
> > >>>> (XEN) [00000006bfcf1768] DEBUG init_pdx 294 start=60000000000 end=60080000000
> > >>>> (XEN) [00000006bfd015a4] DEBUG init_pdx 294 start=70000000000 end=70080000000
> > >>>> (XEN) [00000006bfd1444f] DEBUG setup_mm 252
> > >>>> (XEN) [00000006bfd3dc6f] DEBUG setup_mm 273 start=0 size=80000000 ram_end=80000000 directmap_base_pdx=0
> > >>>> (XEN) [00000006bfd5616e] DEBUG setup_directmap_mappings 229 base_mfn=0 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=0
> > >>>> (XEN) [00000006bfd7d38a] DEBUG setup_directmap_mappings 237 base_mfn=0 nr_mfns=80000 directmap_base_pdx=0
> > >>>> (XEN) [00000006bfd92728] DEBUG setup_mm 273 start=800000000 size=80000000 ram_end=880000000 directmap_base_pdx=0
> > >>>> (XEN) [00000006bfdaba3b] DEBUG setup_directmap_mappings 229 base_mfn=800000 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=800000
> > >>>> (XEN) [00000006bfdcd79c] DEBUG setup_directmap_mappings 237 base_mfn=800000 nr_mfns=80000 directmap_base_pdx=0
> > >>>> (XEN) [00000006bfde4d82] DEBUG setup_mm 273 start=50000000000 size=80000000 ram_end=50080000000 directmap_base_pdx=0
> > >>>> (XEN) [00000006bfdfaef0] DEBUG setup_directmap_mappings 229 base_mfn=50000000 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=50000000
> > >>>> (XEN) [00000006bfe35249] Assertion '(mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) < (DIRECTMAP_SIZE >> PAGE_SHIFT)' failed at ./arch/arm/include/asm/mmu/mm.h:72
> > >>>
> > >>> As said on the other reply, the issue here is that with the v2 PDX
> > >>> offset compression logic your memory map is not compressible, and this
> > >>> leads to an overflow, as anything above 5TiB won't fit in the
> > >>> directmap AFAICT.  We already discussed with Jan that ARM seems to be
> > >>> missing any logic to account for the max addressable page:
> > >>>
> > >>> https://lore.kernel.org/xen-devel/9074f1a6-a605-43f4-97f3-d0a626252d3f@xxxxxxxx/
> > >>>
> > >>> x86 has setup_max_pdx() that truncates the maximum addressable MFN
> > >>> based on the active PDX compression and the virtual memory map
> > >>> restrictions.  ARM needs similar logic to account for these
> > >>> restrictions.
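
For illustration, a rough sketch of the kind of clamp being discussed follows.
The helper name is made up and this is not the actual x86 setup_max_pdx(); it
only assumes the existing pfn_to_pdx()/pdx_to_pfn() helpers and the arm64
DIRECTMAP_SIZE:

/*
 * Hypothetical sketch: clamp the top RAM frame so that its PDX still
 * fits in the directmap, along the lines of what x86's setup_max_pdx()
 * does for its own virtual layout limits.
 */
static unsigned long __init clamp_top_pfn(unsigned long top_pfn)
{
    const unsigned long dm_pages = DIRECTMAP_SIZE >> PAGE_SHIFT;

    /*
     * If the PDX of the last page overflows the directmap, walk the
     * limit back to the first PFN whose PDX no longer fits.
     */
    if ( top_pfn && pfn_to_pdx(top_pfn - 1) >= dm_pages )
        top_pfn = pdx_to_pfn(dm_pages - 1) + 1;

    return top_pfn;
}

Everything above the clamped limit would then simply not be handed to the
directmap/frametable setup, which matches the "ignore memory above the
threshold" option mentioned further down in this thread.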
> > >>
> > >> We have a few issues on Arm. First, we don't check whether the direct
> > >> map is big enough given max_pdx, which we don't set at all. Second, we
> > >> don't really use PDX grouping (it can also be used without compression).
> > >> My patch (that Stefano attached previously) fixes the second issue
> > >> (Alejandro will take it over to come up with a common solution).
> > > 
> > > You probably can handle those as different issues, as PDX grouping is
> > > completely disjoint from PDX compression.  It might be helpful if
> > > we could split the PDX grouping into a separate file from the PDX
> > > compression.
> > > 
> > > One weirdness I've noticed with ARM is the addition of start offsets
> > > to the existing PDX compression, by using directmap_base_pdx,
> > > directmap_mfn_start &c.  I'm not sure whether this will interfere
> > > with the PDX compression, but it looks like a bodge.  This should be
> > > part of the generic PDX compression implementation, not an extra
> > > added on a per-arch basis.
> > > 
> > > FWIW, PDX offset translation should already compress any gaps from 0
> > > to the first RAM range, and hence this won't be needed (in fact it
> > > would just make ARM translations slower by doing an extra unneeded
> > > operation).  My recommendation would be to move this initial offset
> > > compression inside the PDX mask translation.
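
To make the difference concrete, a hypothetical sketch (both helpers are made
up; the first mirrors what arm64 effectively does today, the second what the
translation could look like if the generic PDX layer already absorbed the
start-of-RAM offset):

/* Hypothetical sketch, not in-tree code. */
static inline void *directmap_virt_today(mfn_t mfn)
{
    /* Current arm64 flavour: an extra arch-private subtraction. */
    return (void *)(DIRECTMAP_VIRT_START +
                    ((mfn_to_pdx(mfn) - directmap_base_pdx) << PAGE_SHIFT));
}

static inline void *directmap_virt_generic(mfn_t mfn)
{
    /*
     * If mfn_to_pdx() itself compressed the gap below the first RAM
     * bank, the subtraction (and the extra memory load for
     * directmap_base_pdx) would go away.
     */
    return (void *)(DIRECTMAP_VIRT_START + (mfn_to_pdx(mfn) << PAGE_SHIFT));
}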
> > > 
> > >> For the first issue, we need to know max_page (at the moment we
> > >> calculate it in setup_mm() at the very end, but we could do it in
> > >> init_pdx() to know it ahead of setting up the direct map) and the PDX
> > >> offset (on x86 there is no offset). I also think that on Arm we should
> > >> just panic if the direct map is too small.
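
A hypothetical sketch of that reordering (the helper names are made up; the
point is only that init_pdx() already walks the RAM banks, as the debug output
above shows, so it could record the top of RAM and set max_page there):

static paddr_t __initdata ram_top;

/* Called for each RAM bank from the existing init_pdx() loop. */
static void __init init_pdx_note_bank(paddr_t end)
{
    ram_top = max(ram_top, end);
}

/* Called once the bank loop is done, before the directmap is set up. */
static void __init init_pdx_finish(void)
{
    max_page = PFN_DOWN(ram_top);
}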
> > > 
> > > Hm, that's up to the ARM folks, but my opinion is that you should
> > > simply ignore memory above the threshold.  Panicking should IMO be a
> > > last resort option when there's no way to work around the issue.
> > On Arm we usually handle user errors and suspicious behavior as panics,
> > as opposed to x86, which is more liberal in that regard. We want to fail
> > as soon as possible.
> > 
> > > 
> > >> The issue can be reproduced by disabling PDX compression, so not only
> > >> with Roger's patch.
> > >>
> > >> @Julien, I'm thinking of something like this:
> > >>
> > >> diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
> > >> index 4d22f35618aa..e6d9b49acd3c 100644
> > >> --- a/xen/arch/arm/arm32/mmu/mm.c
> > >> +++ b/xen/arch/arm/arm32/mmu/mm.c
> > >> @@ -190,7 +190,6 @@ void __init setup_mm(void)
> > >>
> > >>      /* Frame table covers all of RAM region, including holes */
> > >>      setup_frametable_mappings(ram_start, ram_end);
> > >> -    max_page = PFN_DOWN(ram_end);
> > >>
> > >>      /*
> > >>       * The allocators may need to use map_domain_page() (such as for
> > >> diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
> > >> index a0a2dd8cc762..3e64be6ae664 100644
> > >> --- a/xen/arch/arm/arm64/mmu/mm.c
> > >> +++ b/xen/arch/arm/arm64/mmu/mm.c
> > >> @@ -224,6 +224,9 @@ static void __init setup_directmap_mappings(unsigned long base_mfn,
> > >>           */
> > >>          directmap_virt_start = DIRECTMAP_VIRT_START +
> > >>              (base_mfn - mfn_gb) * PAGE_SIZE;
> > >> +
> > >> +        if ( (max_pdx - directmap_base_pdx) > (DIRECTMAP_SIZE >> PAGE_SHIFT) )
> > >> +            panic("Direct map is too small\n");
> > > 
> > > As said above - I would avoid propagating the usage of those offsets
> > > into generic memory management code; its usage should be confined
> > > inside the translation functions.
> > directmap_base_pdx is set a few lines above, so I would not call it
> > propagation.
> > 
> > > 
> > > Here you probably want to use maddr_to_virt() or similar.
> > I can't because maddr_to_virt() has the ASSERT with similar check.
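
For reference, the check being referred to is the one that fired in the boot
log above; the helper below is only a schematic paraphrase of the arm64
maddr_to_virt() (the ASSERT condition is taken from the assertion message, the
rest is illustrative):

/* Schematic paraphrase of arch/arm/include/asm/mmu/mm.h, not a copy. */
static inline void *maddr_to_virt_sketch(paddr_t ma)
{
    /* Same condition as the assertion quoted in the log above. */
    ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
           (DIRECTMAP_SIZE >> PAGE_SHIFT));

    return (void *)(directmap_virt_start +
                    ((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx)
                     << PAGE_SHIFT) +
                    (ma & ~PAGE_MASK));
}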
> > > 
> > > You can maybe pickup:
> > > 
> > > https://lore.kernel.org/xen-devel/20250611171636.5674-3-roger.pau@xxxxxxxxxx/
> > > 
> > > And attempt to hook it into ARM?
> > As said above, we have different ways to approach setting max_pdx. On Arm
> > we want to panic, on x86 you want to limit max_pdx.
> > 
> > > 
> > > I don't think it would be that difficult to reduce the consumption of
> > > memory map ranges to what Xen can handle.
> > > 
> > > Thanks, Roger.
> > 
> > The diff I sent fixes the direct map issue now. We can take it now if we
> > want to solve the issue. If we instead want to wait for the frametable
> > fixes (w.r.t. grouping) and possible PDX changes (making offsets common)
> > to be done first, I can simply park this patch.
> 
> No please, don't park it just because of my opinions.  I think Julien
> is OK with it, so don't hold back because of my x86 based opinion on
> how to handle errors.

I would also rather have the small improvement now than later.

 

