|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.22] device-tree: validate first hwdom bank for boot modules
Hi Michal,
Thank you for the detailed review.
On Tue, Jun 2, 2026 at 12:06 PM Orzel, Michal <michal.orzel@xxxxxxx> wrote:
>
>
>
> On 29-May-26 17:10, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > With LLC coloring enabled, the hardware domain memory is allocated by
> > allocate_hwdom_memory() rather than by using the fixed direct-map layout.
> >
> > Commit de99f3263555 ("device-tree: Improve hwdom memory allocation for
> > DMA") made that allocator prefer lower host regions. The first-bank
> > filter, however, still only checked the old 128MB heuristic. A low
> > region can satisfy that heuristic but still be too small, or otherwise
> > unsuitable, for the hardware-domain kernel and the DTB/initrd module
> > area to fit in bank 0 according to the Arm placement rules.
> >
> > Keep the existing first-bank size policy and add an architecture-specific
> > candidate check. On Arm, compute the kernel load address for the candidate
> > bank using the same logic as kernel_zimage_place(), verify that the kernel
> > range is covered by that bank, and then reuse the same module-placement
> > helper as place_modules(). The FDT is generated later, so use the
> > hardware-domain FDT allocation size as a conservative upper bound for the
> > final DTB size.
> >
> > Check the candidate after capping the host region by the remaining
> > unassigned hardware-domain memory, so the validation is performed against
> > the size that would actually become bank 0.
> >
> > This keeps the DMA-oriented allocation policy from de99f3263555 while
> > preventing a too-small bank 0 from reaching place_modules().
> >
> > Fixes: de99f3263555 ("device-tree: Improve hwdom memory allocation for DMA")
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > Changes since RFC:
> > - Do not keep the RFC scalar minimum-size check. It can both reject
> > valid layouts and accept layouts which still fail later. Instead,
> > validate the candidate bank using the same kernel and module placement
> > rules as the load path.
> > Replace the scalar minimum-size check with arch_hwdom_first_bank_ok().
> > - Reuse the existing Arm kernel and DTB/initrd placement rules for the
> > first-bank candidate check.
> > - Treat the hardware-domain FDT allocation size as a conservative upper
> > bound because the final FDT is generated later.
> >
> > Link to RFC:
> >
> > https://patchew.org/Xen/9ae4f7dd49f5b1f761193adae573c2675c92e883.1779051035.git.mykola._5Fkvach@xxxxxxxx/
> >
> > Why the RFC scalar approach was not kept:
> >
> > A simple minimum-size check is not sufficient here because the validity of
> > the first bank depends on the actual Arm placement rules, not only on the
> > aggregate size of the kernel, DTB and initrd. The DTB/initrd area may fit
> > before a 64-bit Image loaded with a text offset, while an AArch32
> > position-independent kernel may leave no valid module location even when
> > the aggregate size appears to fit. Fixed-address kernels also need the
> > candidate bank start to be considered.
> >
> > Link to synthetic tests output:
> >
> > https://gitlab.com/xen-project/people/mykola_kvach/xen/-/blob/fix/hwdom-first-bank-dom0-modules-v2-new/tools/tests/arm-boot-modules/test-arm-boot-modules.log?ref_type=heads
> >
> > ---
> > xen/arch/arm/acpi/domain_build.c | 2 -
> > xen/arch/arm/domain_build.c | 8 ++
> > xen/arch/arm/include/asm/domain_build.h | 4 +
> > xen/arch/arm/include/asm/kernel.h | 9 ++
> > xen/arch/arm/kernel.c | 179 ++++++++++++++++++------
> > xen/common/device-tree/domain-build.c | 24 +++-
> > xen/include/xen/fdt-kernel.h | 9 ++
> > 7 files changed, 182 insertions(+), 53 deletions(-)
> >
> > diff --git a/xen/arch/arm/acpi/domain_build.c
> > b/xen/arch/arm/acpi/domain_build.c
> > index 249d899c33..db16f7fa94 100644
> > --- a/xen/arch/arm/acpi/domain_build.c
> > +++ b/xen/arch/arm/acpi/domain_build.c
> > @@ -26,8 +26,6 @@
> > #undef virt_to_mfn
> > #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> >
> > -#define ACPI_DOM0_FDT_MIN_SIZE 4096
> > -
> > static int __init acpi_iomem_deny_access(struct domain *d)
> > {
> > acpi_status status;
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 1efddc60ef..550617f152 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -115,6 +115,14 @@ int __init parse_arch_dom0_param(const char *s, const
> > char *e)
> > (IS_ENABLED(CONFIG_STATIC_SHM) ? \
> > (NR_SHMEM_BANKS * (160 + 16)) : 0))
> >
> > +paddr_t __init hwdom_get_fdt_alloc_size(void)
> > +{
> > + if ( acpi_disabled )
> > + return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
> > +
> > + return ACPI_DOM0_FDT_MIN_SIZE;
> > +}
> > +
> > unsigned int __init dom0_max_vcpus(void)
> > {
> > if ( opt_dom0_max_vcpus == 0 )
> > diff --git a/xen/arch/arm/include/asm/domain_build.h
> > b/xen/arch/arm/include/asm/domain_build.h
> > index df8b361b3d..85cf46a958 100644
> > --- a/xen/arch/arm/include/asm/domain_build.h
> > +++ b/xen/arch/arm/include/asm/domain_build.h
> > @@ -19,6 +19,10 @@ int prepare_acpi(struct domain *d, struct kernel_info
> > *kinfo);
> >
> > int add_ext_regions(unsigned long s_gfn, unsigned long e_gfn, void *data);
> >
> > +#define ACPI_DOM0_FDT_MIN_SIZE 4096
> > +
> > +paddr_t hwdom_get_fdt_alloc_size(void);
> > +
> > #if defined(CONFIG_MPU) && defined(CONFIG_ARM_64)
> > /* Utility function to determine if an Armv8-R processor supports VMSA. */
> > bool has_v8r_vmsa_support(void);
> > diff --git a/xen/arch/arm/include/asm/kernel.h
> > b/xen/arch/arm/include/asm/kernel.h
> > index 21f4273fa1..bf14fb208a 100644
> > --- a/xen/arch/arm/include/asm/kernel.h
> > +++ b/xen/arch/arm/include/asm/kernel.h
> > @@ -8,12 +8,21 @@
> >
> > #include <asm/domain.h>
> >
> > +#include <xen/types.h>
> > +
> > +struct kernel_info;
> > +
> > struct arch_kernel_info
> > {
> > /* Enable pl011 emulation */
> > bool vpl011;
> > };
> >
> > +#define arch_hwdom_first_bank_ok arch_hwdom_first_bank_ok
> > +bool arch_hwdom_first_bank_ok(const struct kernel_info *info,
> > + paddr_t bank_start,
> > + paddr_t bank_size);
> > +
> > #endif /* #ifdef __ARCH_ARM_KERNEL_H__ */
> >
> > /*
> > diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> > index b72585b7fe..907239a246 100644
> > --- a/xen/arch/arm/kernel.c
> > +++ b/xen/arch/arm/kernel.c
> > @@ -40,27 +40,67 @@ struct minimal_dtb_header {
> > /* There are other fields but we don't use them yet. */
> > };
> >
> > -static void __init place_modules(struct kernel_info *info,
> > - paddr_t kernbase, paddr_t kernend)
> > +static paddr_t __init
> > +kernel_zimage_place_in_bank(const struct kernel_info *info,
> > + paddr_t bank_start, paddr_t bank_size)
> > {
> > - /* Align DTB and initrd size to 2Mb. Linux only requires 4 byte
> > alignment */
> > - const struct boot_module *mod = info->bd.initrd;
> > - const struct membanks *mem = kernel_info_get_mem(info);
> > - const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> > - const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt), MB(2));
> > - const paddr_t modsize = initrd_len + dtb_len;
> > + paddr_t load_addr;
> >
> > - /* Convenient */
> > - const paddr_t rambase = mem->bank[0].start;
> > - const paddr_t ramsize = mem->bank[0].size;
> > - const paddr_t ramend = rambase + ramsize;
> > +#ifdef CONFIG_HAS_DOMAIN_TYPE
> > + if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> > + return bank_start + info->image.text_offset;
> > +#endif
> > +
> > + /*
> > + * If start is zero, the zImage is position independent, in this
> > + * case Documentation/arm/Booting recommends loading below 128MiB
> > + * and above 32MiB. Load it as high as possible within these
> > + * constraints, while also avoiding the DTB.
> > + */
> > + if ( info->image.start == 0 )
> > + {
> > + paddr_t load_end;
> > + paddr_t ram128mb;
> > +
> > + ram128mb = bank_start + MB(128);
> > + load_end = bank_start + bank_size;
> > + load_end = min(ram128mb, load_end);
> > +
> > + if ( load_end - bank_start < info->image.len )
> > + return INVALID_PADDR;
> > +
> > + load_addr = load_end - info->image.len;
> > + /* Align to 2MB */
> > + load_addr &= ~(MB(2) - 1);
> > + if ( load_addr < bank_start )
> > + return INVALID_PADDR;
> > + }
> > + else
> > + load_addr = info->image.start;
> > +
> > + return load_addr;
> > +}
> > +
> > +static bool __init
> > +first_bank_has_enough_room(paddr_t ramsize, paddr_t kernbase,
> > + paddr_t kernend, paddr_t modsize)
> How about first_bank_can_fit_modules()? The name would be more descriptive.
Ack.
>
> > +{
> > const paddr_t kernsize = ROUNDUP(kernend, MB(2)) - kernbase;
> > - const paddr_t ram128mb = rambase + MB(128);
> >
> > - paddr_t modbase;
> > + /*
> > + * Check only the aggregate kernel/module footprint. The actual
> > DTB/initrd
> > + * location is selected by find_module_placement().
> I don't particularly like that we call dtb+initrd modules while kernel is
> also a
> module. How about renaming to find_dtb_initrd_placement()? This will improve
> the
> readability by a lot. We could also rename place_modules() to
> place_dtb_initrd_modules() or just place_dtb_initrd().
Ack.
>
> > + */
> > + return modsize + kernsize <= ramsize;
> > +}
> >
> > - if ( modsize + kernsize > ramsize )
> > - panic("Not enough memory in the first bank for the
> > kernel+dtb+initrd\n");
> > +static bool __init
> > +find_module_placement(paddr_t rambase, paddr_t ramsize,
> > + paddr_t kernbase, paddr_t kernend,
> > + paddr_t modsize, paddr_t *modbase)
> > +{
> > + const paddr_t ramend = rambase + ramsize;
> Instead of passing ramsize, pass ramend right away to avoid this line (similar
> to kernbase, kernend).
Ack.
>
> > + const paddr_t ram128mb = rambase + MB(128);
> >
> > /*
> > * DTB must be loaded such that it does not conflict with the
> > @@ -80,17 +120,49 @@ static void __init place_modules(struct kernel_info
> > *info,
> > * tools/libxc/xc_dom_arm.c:arch_setup_meminit as well.
> This is a stale path. Please update to tools/libs/guest/xg_dom_arm.c:meminit
Ack.
>
> > */
> > if ( ramend >= ram128mb + modsize && kernend < ram128mb )
> > - modbase = ram128mb;
> > - else if ( ramend - modsize > ROUNDUP(kernend, MB(2)) )
> > - modbase = ramend - modsize;
> > - else if ( kernbase - rambase > modsize )
> > - modbase = kernbase - modsize;
> > - else
> > {
> > - panic("Unable to find suitable location for dtb+initrd\n");
> > - return;
> > + *modbase = ram128mb;
> Do we need this extra variable? Can't we just *modbase = rambase + MB(128)?
Yes, we can. The variable is not needed for correctness. I kept it to
name the 128MB placement boundary and to make the condition and the
assignment use exactly the same value.
I can inline it in the next version if you prefer, although I think the
named boundary is a little clearer than:
if ( ramend >= rambase + MB(128) + modsize && kernend < rambase + MB(128) )
{
*modbase = rambase + MB(128);
>
> > + return true;
> > + }
> > +
> > + if ( ramend - modsize > ROUNDUP(kernend, MB(2)) )
> > + {
> > + *modbase = ramend - modsize;
> > + return true;
> > + }
> > +
> > + if ( kernbase - rambase > modsize )
> > + {
> > + *modbase = kernbase - modsize;
> > + return true;
> > }
> >
> > + return false;
> > +}
> > +
> > +static void __init place_modules(struct kernel_info *info,
> > + paddr_t kernbase, paddr_t kernend)
> > +{
> > + /* Align DTB and initrd size to 2Mb. Linux only requires 4 byte
> > alignment */
> > + const struct boot_module *mod = info->bd.initrd;
> > + const struct membanks *mem = kernel_info_get_mem(info);
> > + const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> > + const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt), MB(2));
> > + const paddr_t modsize = initrd_len + dtb_len;
> > +
> > + /* Convenient */
> > + const paddr_t rambase = mem->bank[0].start;
> > + const paddr_t ramsize = mem->bank[0].size;
> > +
> > + paddr_t modbase;
> > +
> > + if ( !first_bank_has_enough_room(ramsize, kernbase, kernend, modsize) )
> > + panic("Not enough memory in the first bank for the
> > kernel+dtb+initrd\n");
> > +
> > + if ( !find_module_placement(rambase, ramsize, kernbase, kernend,
> > modsize,
> > + &modbase) )
> > + panic("Unable to find suitable location for dtb+initrd\n");
> > +
> > info->dtb_paddr = modbase;
> > info->initrd_paddr = info->dtb_paddr + dtb_len;
> > }
> > @@ -100,32 +172,51 @@ static paddr_t __init kernel_zimage_place(struct
> > kernel_info *info)
> > const struct membanks *mem = kernel_info_get_mem(info);
> > paddr_t load_addr;
> >
> > -#ifdef CONFIG_HAS_DOMAIN_TYPE
> > - if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> > - return mem->bank[0].start + info->image.text_offset;
> > -#endif
> > + load_addr = kernel_zimage_place_in_bank(info, mem->bank[0].start,
> > + mem->bank[0].size);
> > + if ( load_addr == INVALID_PADDR )
> > + panic("Unable to find suitable location for the kernel\n");
> >
> > + return load_addr;
> > +}
> > +
> > +bool __init arch_hwdom_first_bank_ok(const struct kernel_info *info,
> > + paddr_t bank_start,
> > + paddr_t bank_size)
> > +{
> > + const struct boot_module *initrd = info->bd.initrd;
> > /*
> > - * If start is zero, the zImage is position independent, in this
> > - * case Documentation/arm/Booting recommends loading below 128MiB
> > - * and above 32MiB. Load it as high as possible within these
> > - * constraints, while also avoiding the DTB.
> > + * place_modules() rounds the DTB and initrd placement to 2MB
> > boundaries;
> > + * use the same granularity when checking whether the first bank can
> > hold
> > + * the boot modules.
> > */
> > - if ( info->image.start == 0 )
> > - {
> > - paddr_t load_end;
> > + const paddr_t initrd_len = ROUNDUP(initrd ? initrd->size : 0, MB(2));
> > + /*
> > + * The hardware domain FDT has not been generated yet. Use the
> > allocation
> > + * size as a conservative upper bound for the final DTB size.
> > + */
> > + const paddr_t dtb_len = ROUNDUP(hwdom_get_fdt_alloc_size(), MB(2));
> > + const paddr_t rambase = bank_start;
> > + const paddr_t ramsize = bank_size;
> > + const paddr_t modsize = initrd_len + dtb_len;
> > + const paddr_t ramend = rambase + ramsize;
> > + paddr_t kernbase;
> > + paddr_t kernend;
> > + paddr_t modbase;
> >
> > - load_end = mem->bank[0].start + mem->bank[0].size;
> > - load_end = MIN(mem->bank[0].start + MB(128), load_end);
> > + kernbase = kernel_zimage_place_in_bank(info, bank_start, bank_size);
> > + if ( kernbase == INVALID_PADDR ||
> > + info->image.len > INVALID_PADDR - kernbase )
> > + return false;
> >
> > - load_addr = load_end - info->image.len;
> > - /* Align to 2MB */
> > - load_addr &= ~((2 << 20) - 1);
> > - }
> > - else
> > - load_addr = info->image.start;
> > + kernend = kernbase + info->image.len;
> >
> > - return load_addr;
> > + if ( kernbase < rambase || kernend > ramend )
> > + return false;
> > +
> > + return first_bank_has_enough_room(ramsize, kernbase, kernend, modsize)
> > &&
> > + find_module_placement(rambase, ramsize, kernbase, kernend,
> > modsize,
> > + &modbase);
> > }
> >
> > static void __init kernel_zimage_load(struct kernel_info *info)
> > diff --git a/xen/common/device-tree/domain-build.c
> > b/xen/common/device-tree/domain-build.c
> > index 2a760b007b..25bc392fea 100644
> > --- a/xen/common/device-tree/domain-build.c
> > +++ b/xen/common/device-tree/domain-build.c
> > @@ -299,20 +299,30 @@ static bool __init allocate_hwdom_memory(struct
> > kernel_info *kinfo)
> >
> > for ( i = 0; (kinfo->unassigned_mem > 0) && (i < nr_banks); i++ )
> > {
> > - paddr_t bank_size;
> > + const paddr_t bank_start = hwdom_free_mem->bank[i].start;
> > + paddr_t bank_size = hwdom_free_mem->bank[i].size;
> > +
> > + /*
> > + * Check the size that would actually be assigned, not just the
> > size
> > + * of the host region.
> > + */
> > + bank_size = min(bank_size, kinfo->unassigned_mem);
> >
> > /*
> > * The first bank must be large enough for place_modules() to
> > * fit the kernel, DTB and initrd. Skip small regions to avoid
> > * ending up with a tiny first bank.
> > */
> > - if ( !mem->nr_banks && (hwdom_free_mem->bank[i].size <
> > min_bank_size) )
> > - continue;
> > + if ( !mem->nr_banks )
> > + {
> > + if ( bank_size < min_bank_size )
> > + continue;
> > +
> > + if ( !arch_hwdom_first_bank_ok(kinfo, bank_start, bank_size) )
> > + continue;
> > + }
> >
> > - bank_size = MIN(hwdom_free_mem->bank[i].size,
> > kinfo->unassigned_mem);
> > - if ( !allocate_bank_memory(kinfo,
> > -
> > gaddr_to_gfn(hwdom_free_mem->bank[i].start),
> > - bank_size) )
> > + if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(bank_start),
> > bank_size) )
> > {
> > xfree(hwdom_free_mem);
> > return false;
> > diff --git a/xen/include/xen/fdt-kernel.h b/xen/include/xen/fdt-kernel.h
> > index 00c37be101..86f2a69ede 100644
> > --- a/xen/include/xen/fdt-kernel.h
> > +++ b/xen/include/xen/fdt-kernel.h
> > @@ -93,6 +93,15 @@ kernel_info_get_mem_const(const struct kernel_info
> > *kinfo)
> > return container_of(&kinfo->mem.common, const struct membanks, common);
> > }
> >
> > +#ifndef arch_hwdom_first_bank_ok
> > +static inline bool
> > +arch_hwdom_first_bank_ok(const struct kernel_info *info, paddr_t
> > bank_start,
> > + paddr_t bank_size)
> > +{
> > + return true;
> > +}
> > +#endif
> > +
> > #ifndef KERNEL_INFO_SHM_MEM_INIT
> >
> > #ifdef CONFIG_STATIC_SHM
>
> I would prefer if this patch was split into refactoring (e.g. split
> place_modules() into functions later on used by patch 2) + hwdom fix (2
> patches). The first patch would also fit the rename I suggested.
Yes, that makes sense. I will split this in v2: first a
behavior-preserving refactoring patch, including the rename you
suggested, and then a second patch with the actual hwdom first-bank fix.
Best regards,
Mykola
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |