[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/8] pdx: introduce a new compression algorithm


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: "Orzel, Michal" <michal.orzel@xxxxxxx>
  • Date: Wed, 2 Jul 2025 10:54:24 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=pLnUpBTUFYOxpjYosgJsl3b4piSCSi1Kss8BL7dHlvo=; b=VJ0cwtMV8PmY5aqVOY5zfPXCtNBaJes2+daVA97RMoihPS5Hi4p9Gfoyh1zv2hmJU1C7Rs4EopKJXb0UQolv5fkLxefOd75d7Tp5iVj7rmIf7FvzzAY+6O4eVLqeDGQKZCITQJmmPRvIJmM2tVe4z6jj+oHUS37zQKbH+pjiV2SpEBu9vATFR40/wd/2uI+Dbwr0IV4WluaXF4TjHGfV7vy7+tUei3DLHMXurvs8eXFxL6dSXLRX5vxMclyqy8gf6EgOUmmTqXxMniTE0aOCLv0mKuCA4X31Zmle0mpeLArGF/rQ4gSNalQeULiT7p9j+lEz1dU0WYwBk47vGmQbcw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=odl7SrVVTbkzmEvahYg6IFuOOuJgOUdbxADdRa83WvPAHK5oRG6BmXnX3GL2J1CPYLbpqU2JXvtwJMEgKJNqjwZOcLCneSvhm+MxBbrBelhZg8tbA0yXy74mtfEc6V33JlVKY97El6qiymvMQ97wam6IJX809KLO6Di8WYu705bt14kbH4vYtMZrpxZ6qg4VFqABaVr6XidunK3pja5hQSxsjy9wOns7EQ0lK5MHerGQOXy2U4uLNsmrrwRovCO852SF6ouJb6AVNYaRdPpHQyFLANNgSzUy71OByNok5b96Ss9WW0M/XXyMvBYFU6/U78Fb3s4x0QK8oJOZUXvDeA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com;
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Shawn Anastasio <sanastasio@xxxxxxxxxxxxxxxxxxxxx>, Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Community Manager <community.manager@xxxxxxxxxxxxxx>
  • Delivery-date: Wed, 02 Jul 2025 08:54:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 02/07/2025 10:26, Roger Pau Monné wrote:
> On Wed, Jul 02, 2025 at 09:52:45AM +0200, Orzel, Michal wrote:
>>
>>
>> On 02/07/2025 09:00, Roger Pau Monné wrote:
>>> On Tue, Jul 01, 2025 at 01:46:19PM -0700, Stefano Stabellini wrote:
>>>> On Tue, 1 Jul 2025, Jan Beulich wrote:
>>>>> Sadly from this you omitted the output from the setup of the offsets
>>>>> arrays. Considering also your later reply, I'd be curious to know what
>>>>> mfn_to_pdx(0x50000000) is.
>>>>  
>>>> Full logs here, and debug patch in attachment.
>>>>
>>>> (XEN) Checking for initrd in /chosen
>>>> (XEN) RAM: 0000000000000000 - 000000007fffffff
>>>> (XEN) RAM: 0000000800000000 - 000000087fffffff
>>>> (XEN) RAM: 0000050000000000 - 000005007fffffff
>>>> (XEN) RAM: 0000060000000000 - 000006007fffffff
>>>> (XEN) RAM: 0000070000000000 - 000007007fffffff
>>>> (XEN) 
>>>> (XEN) MODULE[0]: 0000000022000000 - 0000000022172fff Xen         
>>>> (XEN) MODULE[1]: 0000000022200000 - 000000002220efff Device Tree 
>>>> (XEN) MODULE[2]: 0000000020400000 - 0000000021e2ffff Kernel      
>>>> (XEN)  RESVD[0]: 0000000000000000 - 0000000000ffffff
>>>> (XEN)  RESVD[1]: 0000000001000000 - 00000000015fffff
>>>> (XEN)  RESVD[2]: 0000000001600000 - 00000000017fffff
>>>> (XEN)  RESVD[3]: 0000000001800000 - 00000000097fffff
>>>> (XEN)  RESVD[4]: 0000000009800000 - 000000000bffffff
>>>> (XEN)  RESVD[5]: 0000000011126000 - 000000001114dfff
>>>> (XEN)  RESVD[6]: 000000001114e000 - 000000001214efff
>>>> (XEN)  RESVD[7]: 0000000017275000 - 000000001729cfff
>>>> (XEN)  RESVD[8]: 000000001729d000 - 000000001829dfff
>>>> (XEN)  RESVD[9]: 000000001a7df000 - 000000001a806fff
>>>> (XEN)  RESVD[10]: 000000001a807000 - 000000001b807fff
>>>> (XEN)  RESVD[11]: 000000001d908000 - 000000001d92ffff
>>>> (XEN)  RESVD[12]: 000000001d930000 - 000000001e930fff
>>>> (XEN)  RESVD[13]: 000000001829e000 - 000000001869dfff
>>>> (XEN)  RESVD[14]: 000000001869e000 - 00000000186ddfff
>>>> (XEN)  RESVD[15]: 0000000800000000 - 000000083fffffff
>>>> (XEN) 
>>>> (XEN) 
>>>> (XEN) Command line: console=dtuart dom0_mem=2048M console_timestamps=boot 
>>>> debug bootscrub=0 vwfi=native sched=null
>>>> (XEN) [00000006bfc302ec] parameter "debug" unknown!
>>>> (XEN) [00000006bfcc0476] DEBUG init_pdx 294 start=0 end=80000000
>>>> (XEN) [00000006bfcd2400] DEBUG init_pdx 294 start=800000000 end=880000000
>>>> (XEN) [00000006bfce29ec] DEBUG init_pdx 294 start=50000000000 
>>>> end=50080000000
>>>> (XEN) [00000006bfcf1768] DEBUG init_pdx 294 start=60000000000 
>>>> end=60080000000
>>>> (XEN) [00000006bfd015a4] DEBUG init_pdx 294 start=70000000000 
>>>> end=70080000000
>>>> (XEN) [00000006bfd1444f] DEBUG setup_mm 252
>>>> (XEN) [00000006bfd3dc6f] DEBUG setup_mm 273 start=0 size=80000000 
>>>> ram_end=80000000 directmap_base_pdx=0
>>>> (XEN) [00000006bfd5616e] DEBUG setup_directmap_mappings 229 base_mfn=0 
>>>> nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=0
>>>> (XEN) [00000006bfd7d38a] DEBUG setup_directmap_mappings 237 base_mfn=0 
>>>> nr_mfns=80000 directmap_base_pdx=0
>>>> (XEN) [00000006bfd92728] DEBUG setup_mm 273 start=800000000 size=80000000 
>>>> ram_end=880000000 directmap_base_pdx=0
>>>> (XEN) [00000006bfdaba3b] DEBUG setup_directmap_mappings 229 
>>>> base_mfn=800000 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=800000
>>>> (XEN) [00000006bfdcd79c] DEBUG setup_directmap_mappings 237 
>>>> base_mfn=800000 nr_mfns=80000 directmap_base_pdx=0
>>>> (XEN) [00000006bfde4d82] DEBUG setup_mm 273 start=50000000000 
>>>> size=80000000 ram_end=50080000000 directmap_base_pdx=0
>>>> (XEN) [00000006bfdfaef0] DEBUG setup_directmap_mappings 229 
>>>> base_mfn=50000000 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=50000000
>>>> (XEN) [00000006bfe35249] Assertion '(mfn_to_pdx(maddr_to_mfn(ma)) - 
>>>> directmap_base_pdx) < (DIRECTMAP_SIZE >> PAGE_SHIFT)' failed at 
>>>> ./arch/arm/include/asm/mmu/mm.h:72
>>>
>>> As said on the other reply, the issue here is that with the v2 PDX
>>> offset compression logic your memory map is not compressible, and this
>>> leads to an overflow, as anything above 5TiB won't fit in the
>>> directmap AFAICT.  We already discussed with Jan that ARM seems to be
>>> missing any logic to account for the max addressable page:
>>>
>>> https://lore.kernel.org/xen-devel/9074f1a6-a605-43f4-97f3-d0a626252d3f@xxxxxxxx/
>>>
>>> x86 has setup_max_pdx() that truncates the maximum addressable MFN
>>> based on the active PDX compression and the virtual memory map
>>> restrictions.  ARM needs similar logic to account for this
>>> restrictions.
>>
>> We have a few issues on Arm. First, we don't check whether direct map is big
>> enough provided max_pdx that we don't set at all. Second, we don't really use
>> PDX grouping (can be also used without compression). My patch (that Stefano
>> attached previously) fixes the second issue (Allejandro will take it over to
>> come up with common solution).
> 
> You probably can handle those as different issues, as PDX grouping is
> completely disjoint from PDX compression.  It might be helpful if
> we could split the PDX grouping into a separate file from the PDX
> compression.
> 
> One weirdness I've noticed with ARM is the addition of start offsets
> to the existing PDX compression, by using directmap_base_pdx,
> directmap_mfn_start, directmap_base_pdx &c.  I'm not sure whether this will
> interfere with the PDX compression, but it looks like a bodge.  This
> should be part of the generic PDX compression implementation, not an
> extra added on a per-arch basis.
> 
> FWIW, PDX offset translation should already compress any gaps from 0
> to the first RAM range, and hence this won't be needed (in fact it
> would just make ARM translations slower by doing an extra unneeded
> operation).  My recommendation would be to move this initial offset
> compression inside the PDX mask translation.
> 
>> For the first issue, we need to know max_page (at
>> the moment we calculate it in setup_mm() at the very end but we could do it 
>> in
>> init_pdx() to know it ahead of setting direct map) and PDX offset (on x86 
>> there
>> is no offset). I also think that on Arm we should just panic if direct map is
>> too small.
> 
> Hm, that's up to the ARM folks, but my opinion is that you should
> simply ignore memory above the threshold.  Panicking should IMO be a
> last resort option when there's no way to workaround the issue.
On Arm we handle user errors and suspicious behavior usually as panics as oppose
to x86 which is more liberal in that regard. We want to fail as soon as 
possible.

> 
>> The issue can be reproduced by disabling PDX compression, so not only with
>> Roger's patch.
>>
>> @Julien, I'm thinking of something like this:
>>
>> diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
>> index 4d22f35618aa..e6d9b49acd3c 100644
>> --- a/xen/arch/arm/arm32/mmu/mm.c
>> +++ b/xen/arch/arm/arm32/mmu/mm.c
>> @@ -190,7 +190,6 @@ void __init setup_mm(void)
>>
>>      /* Frame table covers all of RAM region, including holes */
>>      setup_frametable_mappings(ram_start, ram_end);
>> -    max_page = PFN_DOWN(ram_end);
>>
>>      /*
>>       * The allocators may need to use map_domain_page() (such as for
>> diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
>> index a0a2dd8cc762..3e64be6ae664 100644
>> --- a/xen/arch/arm/arm64/mmu/mm.c
>> +++ b/xen/arch/arm/arm64/mmu/mm.c
>> @@ -224,6 +224,9 @@ static void __init setup_directmap_mappings(unsigned long
>> base_mfn,
>>           */
>>          directmap_virt_start = DIRECTMAP_VIRT_START +
>>              (base_mfn - mfn_gb) * PAGE_SIZE;
>> +
>> +        if ( (max_pdx - directmap_base_pdx) > (DIRECTMAP_SIZE >> 
>> PAGE_SHIFT) )
>> +            panic("Direct map is too small\n");
> 
> As said above - I would avoid propagating the usage of those offsets
> into generic memory management code, it's usage should be confined
> inside the translation functions.
directmap_base_pdx is set a few lines above, so I would not call it propagation.

> 
> Here you probably want to use maddr_to_virt() or similar.
I can't because maddr_to_virt() has the ASSERT with similar check.
> 
> You can maybe pickup:
> 
> https://lore.kernel.org/xen-devel/20250611171636.5674-3-roger.pau@xxxxxxxxxx/
> 
> And attempt to hook it into ARM?
As said above, we have different ways to approach setting max_pdx. On Arm we
want to panic, on x86 you want to limit the max_pdx.

> 
> I don't think it would that difficult to reduce the consumption of
> memory map ranges to what Xen can handle.
> 
> Thanks, Roger.

The diff I sent fixes the issue for direct map now. We can take it now if we
want to solve the issue. If we instead want to wait for frametable fixes (\wrt
grouping) and possible PDX changes (making offsets common) to be done first, I
can simply park this patch.

~Michal




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.