Re: [PATCH v2 0/8] pdx: introduce a new compression algorithm



On Mon, 30 Jun 2025, Roger Pau Monné wrote:
> On Fri, Jun 27, 2025 at 07:08:29PM -0700, Stefano Stabellini wrote:
> > Hi Roger,
> > 
> > We have an ARM board with the following memory layout:
> > 
> > 0x0-0x80000000, 0, 2GB
> > 0x800000000-0x880000000, 32GB, 2GB
> > 0x50000000000-0x50080000000, 5TB, 2GB
> > 0x60000000000-0x60080000000, 6TB, 2GB
> > 0x70000000000-0x70080000000, 7TB, 2GB
> 
> With the current PDX mask compression you could compress 4bits AFAICT.
> 
> > It looks like your PDX series is exactly what we need.  However, I tried
> > to use it and it doesn't seem to be hooked properly on ARM yet. I spent
> > some time trying to fix it but I was unsuccessful.
> 
> Hm, weird.  It shouldn't need any special hooking, unless assumptions
> about the existing PDX mask compression have leaked into ARM code.
> 
> > As far as I can tell the following functions need to be adjusted but I
> > am not sure the list is comprehensive:
> > 
> > xen/arch/arm/include/asm/mmu/mm.h:maddr_to_virt
> 
> At least for CONFIG_ARM_64 this seems to be implemented correctly, as
> it's using maddr_to_directmapoff() which should have the correct
> translation between paddr -> directmap virt.
> 
> Also given the memory map above the adjustments done in ARM to remove
> any initial memory map offset should be no-ops, since I expect
> base_mfn == 0 in setup_directmap_mappings() in that particular case,
> and then directmap_mfn_start = directmap_base_pdx = 0 and
> directmap_virt_start = DIRECTMAP_VIRT_START.  FWIW, if ARM uses offset
> compression the special casing about removing the initial gap can be
> removed, as the compression should already take care of that.
> 
> > xen/arch/arm/mmu/mm.c:setup_frametable_mappings
> > xen/arch/arm/setup.c:init_pdx
> 
> I've attempted to adjust init_pdx() myself so it works with the new
> generic PDX compression setup, it seemed to work fine on the CI, but I
> don't have any real ARM machines to test myself.
 
> Is there a way I could reproduce the issue(s) you are seeing with
> QEMU?

Maybe. You can see how we run QEMU in gitlab-ci, but I don't know off
the top of my head how to force QEMU to emulate multiple RAM banks at
specific addresses.


> I'm already working on v3, as this version's implementation of
> mfn_valid() is buggy.  Maybe that's what you are hitting?
> 

This is the error:

(XEN) [0000000179e5f96b] Assertion '(mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) < (DIRECTMAP_SIZE >> PAGE_SHIFT)' failed at ./arch/arm/include/asm/mmu/mm.h:72
(XEN) [0000000179e90619] ----[ Xen-4.21-unstable  arm64  debug=y  Not tainted ]----
(XEN) [0000000179e9ee58] CPU:    0
(XEN) [0000000179eac907] PC:     00000a00002da5fc setup_mm+0x174/0x200
(XEN) [0000000179ed3ed0] LR:     00000a00002da580
(XEN) [0000000179edc486] SP:     00000a0000327e10
(XEN) [0000000179ee6b3a] CPSR:   00000000200003c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN) [0000000179ef5b4f]      X0: 0000050000000000  X1: 0000000050000000  X2: 0000000000080000
(XEN) [0000000179f05de3]      X3: 0000000000000017  X4: 0000000000000000  X5: 0000000050000000
(XEN) [0000000179f19396]      X6: 000000004fffffff  X7: 0000000000000000  X8: 0000000000020400
(XEN) [0000000179f2d797]      X9: 000000000001b808 X10: 0000000000000080 X11: 00000000000186de
(XEN) [0000000179f3d492]     X12: 000000000001a7df X13: 000000000001214f X14: 0000000000017275
(XEN) [0000000179f50f4c]     X15: 00000a00002b48bc X16: 00000a0000291478 X17: 0000000000000000
(XEN) [0000000179f60902]     X18: 000000007be9bbe0 X19: 0000000000000002 X20: 0000000000000000
(XEN) [0000000179f6fde5]     X21: 0000050080000000 X22: 00000a00002f8008 X23: 00000a00002b5c90
(XEN) [0000000179f7eeea]     X24: 0000000180000000 X25: 00000a00002b5e90 X26: 0000000000000000
(XEN) [0000000179f8ee55]     X27: 0000000000000000 X28: 000000007bff2f70  FP: 00000a0000327e10
(XEN) [0000000179fa6deb] 
(XEN) [0000000179fadf84]   VTCR_EL2: 0000000000000000
(XEN) [0000000179fb9994]  VTTBR_EL2: 0000000000000000
(XEN) [0000000179fc689d] 
(XEN) [0000000179fcc1a0]  SCTLR_EL2: 0000000030cd183d
(XEN) [0000000179fd95e3]    HCR_EL2: 0000000000000038
(XEN) [0000000179fe7082]  TTBR0_EL2: 0000000022148000
(XEN) [0000000179ff0d00] 
(XEN) [0000000179ff6d07]    ESR_EL2: 00000000f2000001
(XEN) [000000017a0003fe]  HPFAR_EL2: 0000000000000000
(XEN) [000000017a00c8f4]    FAR_EL2: 0000000000000000
(XEN) [000000017a018511] 
(XEN) [000000017a01fbe5] Xen stack trace from sp=00000a0000327e10:
(XEN) [000000017a02aa88]    00000a0000327e60 00000a00002e40c4 0000000022200000 000000000000f000
(XEN) [000000017a03e578]    00000a0000c0a5c0 00000a0000332000 00000a0000a00000 0000000000000000
(XEN) [000000017a04e676]    0000000000000000 0000000000000000 000000007be89ea0 00000a00002001a4
(XEN) [000000017a0636e1]    0000000022000000 fffff60021e00000 0000000022200000 0000000000001710
(XEN) [000000017a072ae0]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a084bf8]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a097ced]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a0a6829]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a0b8e71]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a0cdb4b]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a0e44b9]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a0f6a2b]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a1074a2]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a1178b3]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a128463]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [000000017a13a015]    0000000000000000 0000000000000000
(XEN) [000000017a144d66] Xen call trace:
(XEN) [000000017a14bcee]    [<00000a00002da5fc>] setup_mm+0x174/0x200 (PC)
(XEN) [000000017a15db0a]    [<00000a00002da580>] setup_mm+0xf8/0x200 (LR)
(XEN) [000000017a167dbb]    [<00000a00002e40c4>] start_xen+0x118/0x9d0
(XEN) [000000017a171724]    [<00000a00002001a4>] arch/arm/arm64/head.o#primary_switched+0x4/0x24
(XEN) [000000017a18abb4] 
(XEN) [000000017a19a465] 
(XEN) [000000017a19ffed] ****************************************
(XEN) [000000017a1aad66] Panic on CPU 0:
(XEN) [000000017a1b2757] Assertion '(mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) < (DIRECTMAP_SIZE >> PAGE_SHIFT)' failed at ./arch/arm/include/asm/mmu/mm.h:72
(XEN) [000000017a1daedf] ****************************************
(XEN) [000000017a1eb0a9] 
(XEN) [000000017a1f2b27] Reboot in five seconds...
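
A standalone sketch of the check in that ASSERT may help to see which
banks can trip it when no compression is in effect. Everything below is
illustrative: identity_mfn_to_pdx() and fits_in_directmap() are
hypothetical names, the 5TB DIRECTMAP_SIZE is an assumption (not taken
from the log), and the identity translation only models "PDX compression
did nothing".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* ASSUMPTION: a 5TB direct map; this value is not taken from the log. */
#define DIRECTMAP_SIZE (5ULL << 40)

/* Hypothetical stand-in for mfn_to_pdx(): identity, i.e. no compression. */
static uint64_t identity_mfn_to_pdx(uint64_t mfn)
{
    return mfn;
}

/* Mirrors the shape of the failing ASSERT for a single machine address. */
static int fits_in_directmap(uint64_t maddr, uint64_t directmap_base_pdx)
{
    uint64_t pdx = identity_mfn_to_pdx(maddr >> PAGE_SHIFT);

    /* Addresses below the direct map base are out of scope here. */
    assert(pdx >= directmap_base_pdx);

    return (pdx - directmap_base_pdx) < (DIRECTMAP_SIZE >> PAGE_SHIFT);
}
```

Under those assumptions, fits_in_directmap(0x800000000ULL, 0) holds,
while the banks starting at 5T, 6T and 7T all fail the check unless the
PDX translation actually removes the holes between the banks.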


If I remove the ASSERT:

(XEN) [00000003bc65c616] parameter "debug" unknown!
(XEN) [00000003bc70915a] 
(XEN) [00000003bc70fd14] ****************************************
(XEN) [00000003bc71afec] Panic on CPU 0:
(XEN) [00000003bc724d03] The frametable cannot cover the physical region 0000000000000000 - 0x00070080000000
(XEN) [00000003bc73786c] ****************************************
(XEN) [00000003bc741a19] 
(XEN) [00000003bc747833] Reboot in five seconds...
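
For scale, a back-of-the-envelope sketch of why a flat frametable over
that whole region cannot fit, while one covering only the populated
banks is small. The constants are assumptions made for the arithmetic
(56 bytes per struct page_info, a 32GB frametable area), and the helper
names are hypothetical; only the bank addresses come from the memory
layout above.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* ASSUMPTIONS, for illustration only (not taken from the log): */
#define PAGE_INFO_SIZE  56ULL          /* bytes per struct page_info */
#define FRAMETABLE_SIZE (32ULL << 30)  /* virtual area for the frametable */

struct bank {
    uint64_t start, end; /* machine addresses, end exclusive */
};

/* The five banks from the memory layout quoted above. */
static const struct bank banks[] = {
    { 0x0ULL,           0x80000000ULL },
    { 0x800000000ULL,   0x880000000ULL },
    { 0x50000000000ULL, 0x50080000000ULL },
    { 0x60000000000ULL, 0x60080000000ULL },
    { 0x70000000000ULL, 0x70080000000ULL },
};

/* Frametable bytes needed if every page in [0, top) gets an entry. */
static uint64_t flat_frametable_bytes(uint64_t top)
{
    return (top >> PAGE_SHIFT) * PAGE_INFO_SIZE;
}

/* Frametable bytes needed if only populated pages get an entry. */
static uint64_t populated_frametable_bytes(const struct bank *b, size_t n)
{
    uint64_t pages = 0;

    for (size_t i = 0; i < n; i++)
        pages += (b[i].end - b[i].start) >> PAGE_SHIFT;

    return pages * PAGE_INFO_SIZE;
}
```

With these assumed constants the flat case needs roughly 98GB of
frametable, far beyond the assumed 32GB area, whereas the populated
pages alone need about 140MB.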


I think the issue (or one of the issues) is the implementation of
setup_frametable_mappings on ARM, which ignores the pdx_group_valid
bitmap. I am attaching a work-in-progress patch from Michal that adds
support for it, for your reference. Remove commit fe6a12a08 to apply the
patch without conflict.
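
To illustrate the general idea only (this is not Michal's patch, and
none of the names below are Xen APIs): honouring a group-validity bitmap
roughly means walking it and handling each contiguous run of valid
groups separately, instead of covering one flat range. A minimal
standalone sketch, with hypothetical helpers:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Returns non-zero if bit 'i' is set in the bitmap. */
static int group_is_valid(const unsigned long *bm, size_t i)
{
    return (bm[i / BITS_PER_ULONG] >> (i % BITS_PER_ULONG)) & 1UL;
}

/*
 * Finds the next run of set bits at or after 'from'.
 * On success returns 1 and stores the run as [*start, *end).
 * Returns 0 when no set bit remains.
 */
static int next_valid_run(const unsigned long *bm, size_t nbits,
                          size_t from, size_t *start, size_t *end)
{
    size_t i = from;

    while (i < nbits && !group_is_valid(bm, i))
        i++;
    if (i >= nbits)
        return 0;

    *start = i;
    while (i < nbits && group_is_valid(bm, i))
        i++;
    *end = i;

    return 1;
}

/* Counts the runs; a real caller would map one chunk per run instead. */
static size_t count_valid_runs(const unsigned long *bm, size_t nbits)
{
    size_t start, end = 0, runs = 0;

    while (next_valid_run(bm, nbits, end, &start, &end))
        runs++;

    return runs;
}
```

A caller would loop on next_valid_run() and set up the frametable
mapping for each returned [start, end) group range, leaving the invalid
groups in between unmapped.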

With Michal's patch, I can boot *without* your patches on the
problematic board.

I still cannot boot with your patches, even with Michal's patch. I still
hit the same ASSERT. If I remove the ASSERT I go further and hit:

(XEN) [00000001bccbd3ab] Panic on CPU 0:
(XEN) [00000001bccc4c3e] Frametable too small

I added some debug messages (see attached stefano-debug.patch).
Something seems to be wrong with the pdx_group_valid bitmap after
0x880000, as we start getting MFN ranges such as 0x254c0000-0x25500000
which don't make any sense to me.

(XEN) [00000001563012a8] DEBUG init_pdx 294 start=0 end=80000000
(XEN) [000000015630d6d9] DEBUG init_pdx 294 start=800000000 end=880000000
(XEN) [000000015631c73c] DEBUG init_pdx 294 start=50000000000 end=50080000000
(XEN) [000000015632947b] DEBUG init_pdx 294 start=60000000000 end=60080000000
(XEN) [00000001563365a8] DEBUG init_pdx 294 start=70000000000 end=70080000000
(XEN) [000000015637c6aa] DEBUG init_frametable 65 start=0 end=80000
(XEN) [00000001563898e1] DEBUG init_frametable_chunk 28 virt=a0800000000 base_mfn=7007e000 pfn_start=0 pfn_end=80000
(XEN) [000000015692ed1f] DEBUG init_frametable 65 start=800000 end=880000
(XEN) [00000001569399fe] DEBUG init_frametable_chunk 28 virt=a081c000000 base_mfn=7007c000 pfn_start=800000 pfn_end=880000
(XEN) [00000001573bad45] DEBUG init_frametable 65 start=254c0000 end=25500000
(XEN) [00000001573dee6a] DEBUG init_frametable_chunk 28 virt=a1028a00000 base_mfn=7007a000 pfn_start=254c0000 pfn_end=25500000
(XEN) [00000001578ad5c2] DEBUG init_frametable 65 start=25700000 end=257c0000
(XEN) [00000001578b841d] DEBUG init_frametable_chunk 28 virt=a1030800000 base_mfn=70076000 pfn_start=25700000 pfn_end=257c0000
(XEN) [000000015853b121] DEBUG init_frametable 65 start=27400000 end=27440000
(XEN) [00000001585470fe] DEBUG init_frametable_chunk 28 virt=a1096000000 base_mfn=70074000 pfn_start=27400000 pfn_end=27440000
(XEN) [0000000158880a59] DEBUG init_frametable 65 start=27480000 end=27500000
(XEN) [000000015888d583] DEBUG init_frametable_chunk 28 virt=a1097c00000 base_mfn=70072000 pfn_start=27480000 pfn_end=27500000
(XEN) [0000000158eacf55] DEBUG init_frametable 65 start=27580000 end=27a40000
(XEN) [0000000158eb7f8e] DEBUG init_frametable_chunk 28 virt=a109b400000 base_mfn=70060000 pfn_start=27580000 pfn_end=27a40000
(XEN) [000000015cac7416] DEBUG init_frametable 65 start=27a80000 end=27ac0000
(XEN) [000000015cad6818] DEBUG init_frametable_chunk 28 virt=a10acc00000 base_mfn=7005e000 pfn_start=27a80000 pfn_end=27ac0000
(XEN) [000000015cb26b99] arch/arm/mmu/pt.c:360: Changing MFN for a valid entry is not allowed (0x70071800 -> 0x7005e000).
(XEN) [000000015cb80f94] Xen WARN at arch/arm/mmu/pt.c:360
(XEN) [000000015cbabedc] ----[ Xen-4.21-unstable  arm64  debug=y  Not tainted ]----

Attachment: stefano-debug.patch
Description: Text Data

Attachment: pdx-groups.patch
Description: Text Data
