[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.22] device-tree: validate first hwdom bank for boot modules


  • To: Mykola Kvach <xakep.amatop@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Orzel, Michal" <michal.orzel@xxxxxxx>
  • Date: Tue, 2 Jun 2026 11:06:45 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=gmail.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=PajO6ydWs3LKJXZ80LVKPJjXwpAaZvgCd1Lf3j6U6aU=; b=lrTUCplB78rYnqQgJOz4W29KunRQkJtLzrHpWLPCSyptnBZPI/OlFKCd59wmFQUBwtywiHiqER2lO3ctcn6xBkr9Lcy041Tv6KjXBCZ/ff8UxokpMYyJJ3lBCggv/GQRcZ38X0uwmNq7l8rsVMMiaccs911J0O/4DzFDFHzESp9/NvBRsNMFnovdKLd/O5X7S6jaUT9+uo5y0LYIG+GYZ4w420xk72CqRv9J5kBtN1e6eq+9i+bpn+/3GgNPoh2oe5evG04rSGyV4RViPwQzcET/Ej5E7FhKv+f8/2H/FKBI90NCngUsoqwYQbsYNTb53fY+Ql2O25HoPrujVgKyvA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=Nc4lwAfGqKHcCQ1jph2bfqsnySFKz/G+ZyuyUwzcCALrG5wbQUTvnOUXvbWoBjAf/f4Hg3WadQGuHsy/kjKKVMgiDcdIiEUlVuivM9ryPusjABT0LpKRHCK6rpxo6efTNNEMYB3PKika6Xcs2E/hDm+VfEUhkrzgRluW5/gr5ddGRdTs5/PRbSL8U6bAeJz8xVo9HcK72ZlCvb5kqM1z1fyzdNEleiz6n8lGyZNpzEyxg+IDH5sN3fUajUPYT9zAqYkewzyY5D38gZ+6DJUlolBiGWuLOqOnEz87BmvmmPDvr/H9bdG18A90LZtCNfC2fccJlYxOzTfQWLtJ8f+MJg==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=amd.com header.i="@amd.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Cc: Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 02 Jun 2026 09:07:00 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 29-May-26 17:10, Mykola Kvach wrote:
> From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> 
> With LLC coloring enabled, the hardware domain memory is allocated by
> allocate_hwdom_memory() rather than by using the fixed direct-map layout.
> 
> Commit de99f3263555 ("device-tree: Improve hwdom memory allocation for
> DMA") made that allocator prefer lower host regions. The first-bank
> filter, however, still only checked the old 128MB heuristic. A low
> region can satisfy that heuristic but still be too small, or otherwise
> unsuitable, for the hardware-domain kernel and the DTB/initrd module
> area to fit in bank 0 according to the Arm placement rules.
> 
> Keep the existing first-bank size policy and add an architecture-specific
> candidate check. On Arm, compute the kernel load address for the candidate
> bank using the same logic as kernel_zimage_place(), verify that the kernel
> range is covered by that bank, and then reuse the same module-placement
> helper as place_modules(). The FDT is generated later, so use the
> hardware-domain FDT allocation size as a conservative upper bound for the
> final DTB size.
> 
> Check the candidate after capping the host region by the remaining
> unassigned hardware-domain memory, so the validation is performed against
> the size that would actually become bank 0.
> 
> This keeps the DMA-oriented allocation policy from de99f3263555 while
> preventing a too-small bank 0 from reaching place_modules().
> 
> Fixes: de99f3263555 ("device-tree: Improve hwdom memory allocation for DMA")
> Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> ---
> Changes since RFC:
> - Do not keep the RFC scalar minimum-size check. It can both reject
>   valid layouts and accept layouts which still fail later. Instead,
>   validate the candidate bank using the same kernel and module placement
>   rules as the load path.
>   Replace the scalar minimum-size check with arch_hwdom_first_bank_ok().
> - Reuse the existing Arm kernel and DTB/initrd placement rules for the
>   first-bank candidate check.
> - Treat the hardware-domain FDT allocation size as a conservative upper
>   bound because the final FDT is generated later.
> 
> Link to RFC:
>    
> https://patchew.org/Xen/9ae4f7dd49f5b1f761193adae573c2675c92e883.1779051035.git.mykola._5Fkvach@xxxxxxxx/
> 
> Why the RFC scalar approach was not kept:
> 
> A simple minimum-size check is not sufficient here because the validity of
> the first bank depends on the actual Arm placement rules, not only on the
> aggregate size of the kernel, DTB and initrd. The DTB/initrd area may fit
> before a 64-bit Image loaded with a text offset, while an AArch32
> position-independent kernel may leave no valid module location even when
> the aggregate size appears to fit. Fixed-address kernels also need the
> candidate bank start to be considered.
> 
> Link to synthetic tests output:
>    
> https://gitlab.com/xen-project/people/mykola_kvach/xen/-/blob/fix/hwdom-first-bank-dom0-modules-v2-new/tools/tests/arm-boot-modules/test-arm-boot-modules.log?ref_type=heads
> 
> ---
>  xen/arch/arm/acpi/domain_build.c        |   2 -
>  xen/arch/arm/domain_build.c             |   8 ++
>  xen/arch/arm/include/asm/domain_build.h |   4 +
>  xen/arch/arm/include/asm/kernel.h       |   9 ++
>  xen/arch/arm/kernel.c                   | 179 ++++++++++++++++++------
>  xen/common/device-tree/domain-build.c   |  24 +++-
>  xen/include/xen/fdt-kernel.h            |   9 ++
>  7 files changed, 182 insertions(+), 53 deletions(-)
> 
> diff --git a/xen/arch/arm/acpi/domain_build.c 
> b/xen/arch/arm/acpi/domain_build.c
> index 249d899c33..db16f7fa94 100644
> --- a/xen/arch/arm/acpi/domain_build.c
> +++ b/xen/arch/arm/acpi/domain_build.c
> @@ -26,8 +26,6 @@
>  #undef virt_to_mfn
>  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
>  
> -#define ACPI_DOM0_FDT_MIN_SIZE 4096
> -
>  static int __init acpi_iomem_deny_access(struct domain *d)
>  {
>      acpi_status status;
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 1efddc60ef..550617f152 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -115,6 +115,14 @@ int __init parse_arch_dom0_param(const char *s, const 
> char *e)
>                               (IS_ENABLED(CONFIG_STATIC_SHM) ?         \
>                                (NR_SHMEM_BANKS * (160 + 16)) : 0))
>  
> +paddr_t __init hwdom_get_fdt_alloc_size(void)
> +{
> +    if ( acpi_disabled )
> +        return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
> +
> +    return ACPI_DOM0_FDT_MIN_SIZE;
> +}
> +
>  unsigned int __init dom0_max_vcpus(void)
>  {
>      if ( opt_dom0_max_vcpus == 0 )
> diff --git a/xen/arch/arm/include/asm/domain_build.h 
> b/xen/arch/arm/include/asm/domain_build.h
> index df8b361b3d..85cf46a958 100644
> --- a/xen/arch/arm/include/asm/domain_build.h
> +++ b/xen/arch/arm/include/asm/domain_build.h
> @@ -19,6 +19,10 @@ int prepare_acpi(struct domain *d, struct kernel_info 
> *kinfo);
>  
>  int add_ext_regions(unsigned long s_gfn, unsigned long e_gfn, void *data);
>  
> +#define ACPI_DOM0_FDT_MIN_SIZE 4096
> +
> +paddr_t hwdom_get_fdt_alloc_size(void);
> +
>  #if defined(CONFIG_MPU) && defined(CONFIG_ARM_64)
>  /* Utility function to determine if an Armv8-R processor supports VMSA. */
>  bool has_v8r_vmsa_support(void);
> diff --git a/xen/arch/arm/include/asm/kernel.h 
> b/xen/arch/arm/include/asm/kernel.h
> index 21f4273fa1..bf14fb208a 100644
> --- a/xen/arch/arm/include/asm/kernel.h
> +++ b/xen/arch/arm/include/asm/kernel.h
> @@ -8,12 +8,21 @@
>  
>  #include <asm/domain.h>
>  
> +#include <xen/types.h>
> +
> +struct kernel_info;
> +
>  struct arch_kernel_info
>  {
>      /* Enable pl011 emulation */
>      bool vpl011;
>  };
>  
> +#define arch_hwdom_first_bank_ok arch_hwdom_first_bank_ok
> +bool arch_hwdom_first_bank_ok(const struct kernel_info *info,
> +                              paddr_t bank_start,
> +                              paddr_t bank_size);
> +
>  #endif /* #ifdef __ARCH_ARM_KERNEL_H__ */
>  
>  /*
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index b72585b7fe..907239a246 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -40,27 +40,67 @@ struct minimal_dtb_header {
>      /* There are other fields but we don't use them yet. */
>  };
>  
> -static void __init place_modules(struct kernel_info *info,
> -                                 paddr_t kernbase, paddr_t kernend)
> +static paddr_t __init
> +kernel_zimage_place_in_bank(const struct kernel_info *info,
> +                            paddr_t bank_start, paddr_t bank_size)
>  {
> -    /* Align DTB and initrd size to 2Mb. Linux only requires 4 byte 
> alignment */
> -    const struct boot_module *mod = info->bd.initrd;
> -    const struct membanks *mem = kernel_info_get_mem(info);
> -    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> -    const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt), MB(2));
> -    const paddr_t modsize = initrd_len + dtb_len;
> +    paddr_t load_addr;
>  
> -    /* Convenient */
> -    const paddr_t rambase = mem->bank[0].start;
> -    const paddr_t ramsize = mem->bank[0].size;
> -    const paddr_t ramend = rambase + ramsize;
> +#ifdef CONFIG_HAS_DOMAIN_TYPE
> +    if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> +        return bank_start + info->image.text_offset;
> +#endif
> +
> +    /*
> +     * If start is zero, the zImage is position independent, in this
> +     * case Documentation/arm/Booting recommends loading below 128MiB
> +     * and above 32MiB. Load it as high as possible within these
> +     * constraints, while also avoiding the DTB.
> +     */
> +    if ( info->image.start == 0 )
> +    {
> +        paddr_t load_end;
> +        paddr_t ram128mb;
> +
> +        ram128mb = bank_start + MB(128);
> +        load_end = bank_start + bank_size;
> +        load_end = min(ram128mb, load_end);
> +
> +        if ( load_end - bank_start < info->image.len )
> +            return INVALID_PADDR;
> +
> +        load_addr = load_end - info->image.len;
> +        /* Align to 2MB */
> +        load_addr &= ~(MB(2) - 1);
> +        if ( load_addr < bank_start )
> +            return INVALID_PADDR;
> +    }
> +    else
> +        load_addr = info->image.start;
> +
> +    return load_addr;
> +}
> +
> +static bool __init
> +first_bank_has_enough_room(paddr_t ramsize, paddr_t kernbase,
> +                           paddr_t kernend, paddr_t modsize)
How about first_bank_can_fit_modules()? The name would be more descriptive.

> +{
>      const paddr_t kernsize = ROUNDUP(kernend, MB(2)) - kernbase;
> -    const paddr_t ram128mb = rambase + MB(128);
>  
> -    paddr_t modbase;
> +    /*
> +     * Check only the aggregate kernel/module footprint. The actual 
> DTB/initrd
> +     * location is selected by find_module_placement().
I don't particularly like that we call dtb+initrd modules while kernel is also a
module. How about renaming to find_dtb_initrd_placement()? This will improve the
readability by a lot. We could also rename place_modules() to
place_dtb_initrd_modules() or just place_dtb_initrd().

> +     */
> +    return modsize + kernsize <= ramsize;
> +}
>  
> -    if ( modsize + kernsize > ramsize )
> -        panic("Not enough memory in the first bank for the 
> kernel+dtb+initrd\n");
> +static bool __init
> +find_module_placement(paddr_t rambase, paddr_t ramsize,
> +                      paddr_t kernbase, paddr_t kernend,
> +                      paddr_t modsize, paddr_t *modbase)
> +{
> +    const paddr_t ramend = rambase + ramsize;
Instead of passing ramsize, pass ramend right away to avoid this line (similar
to kernbase, kernend).

> +    const paddr_t ram128mb = rambase + MB(128);
>  
>      /*
>       * DTB must be loaded such that it does not conflict with the
> @@ -80,17 +120,49 @@ static void __init place_modules(struct kernel_info 
> *info,
>       * tools/libxc/xc_dom_arm.c:arch_setup_meminit as well.
This is a stale path. Please update to tools/libs/guest/xg_dom_arm.c:meminit

>       */
>      if ( ramend >= ram128mb + modsize && kernend < ram128mb )
> -        modbase = ram128mb;
> -    else if ( ramend - modsize > ROUNDUP(kernend, MB(2)) )
> -        modbase = ramend - modsize;
> -    else if ( kernbase - rambase > modsize )
> -        modbase = kernbase - modsize;
> -    else
>      {
> -        panic("Unable to find suitable location for dtb+initrd\n");
> -        return;
> +        *modbase = ram128mb;
Do we need this extra variable? Can't we just *modbase = rambase + MB(128)?

> +        return true;
> +    }
> +
> +    if ( ramend - modsize > ROUNDUP(kernend, MB(2)) )
> +    {
> +        *modbase = ramend - modsize;
> +        return true;
> +    }
> +
> +    if ( kernbase - rambase > modsize )
> +    {
> +        *modbase = kernbase - modsize;
> +        return true;
>      }
>  
> +    return false;
> +}
> +
> +static void __init place_modules(struct kernel_info *info,
> +                                 paddr_t kernbase, paddr_t kernend)
> +{
> +    /* Align DTB and initrd size to 2Mb. Linux only requires 4 byte 
> alignment */
> +    const struct boot_module *mod = info->bd.initrd;
> +    const struct membanks *mem = kernel_info_get_mem(info);
> +    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> +    const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt), MB(2));
> +    const paddr_t modsize = initrd_len + dtb_len;
> +
> +    /* Convenient */
> +    const paddr_t rambase = mem->bank[0].start;
> +    const paddr_t ramsize = mem->bank[0].size;
> +
> +    paddr_t modbase;
> +
> +    if ( !first_bank_has_enough_room(ramsize, kernbase, kernend, modsize) )
> +        panic("Not enough memory in the first bank for the 
> kernel+dtb+initrd\n");
> +
> +    if ( !find_module_placement(rambase, ramsize, kernbase, kernend, modsize,
> +                                &modbase) )
> +        panic("Unable to find suitable location for dtb+initrd\n");
> +
>      info->dtb_paddr = modbase;
>      info->initrd_paddr = info->dtb_paddr + dtb_len;
>  }
> @@ -100,32 +172,51 @@ static paddr_t __init kernel_zimage_place(struct 
> kernel_info *info)
>      const struct membanks *mem = kernel_info_get_mem(info);
>      paddr_t load_addr;
>  
> -#ifdef CONFIG_HAS_DOMAIN_TYPE
> -    if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> -        return mem->bank[0].start + info->image.text_offset;
> -#endif
> +    load_addr = kernel_zimage_place_in_bank(info, mem->bank[0].start,
> +                                            mem->bank[0].size);
> +    if ( load_addr == INVALID_PADDR )
> +        panic("Unable to find suitable location for the kernel\n");
>  
> +    return load_addr;
> +}
> +
> +bool __init arch_hwdom_first_bank_ok(const struct kernel_info *info,
> +                                     paddr_t bank_start,
> +                                     paddr_t bank_size)
> +{
> +    const struct boot_module *initrd = info->bd.initrd;
>      /*
> -     * If start is zero, the zImage is position independent, in this
> -     * case Documentation/arm/Booting recommends loading below 128MiB
> -     * and above 32MiB. Load it as high as possible within these
> -     * constraints, while also avoiding the DTB.
> +     * place_modules() rounds the DTB and initrd placement to 2MB boundaries;
> +     * use the same granularity when checking whether the first bank can hold
> +     * the boot modules.
>       */
> -    if ( info->image.start == 0 )
> -    {
> -        paddr_t load_end;
> +    const paddr_t initrd_len = ROUNDUP(initrd ? initrd->size : 0, MB(2));
> +    /*
> +     * The hardware domain FDT has not been generated yet. Use the allocation
> +     * size as a conservative upper bound for the final DTB size.
> +     */
> +    const paddr_t dtb_len = ROUNDUP(hwdom_get_fdt_alloc_size(), MB(2));
> +    const paddr_t rambase = bank_start;
> +    const paddr_t ramsize = bank_size;
> +    const paddr_t modsize = initrd_len + dtb_len;
> +    const paddr_t ramend = rambase + ramsize;
> +    paddr_t kernbase;
> +    paddr_t kernend;
> +    paddr_t modbase;
>  
> -        load_end = mem->bank[0].start + mem->bank[0].size;
> -        load_end = MIN(mem->bank[0].start + MB(128), load_end);
> +    kernbase = kernel_zimage_place_in_bank(info, bank_start, bank_size);
> +    if ( kernbase == INVALID_PADDR ||
> +         info->image.len > INVALID_PADDR - kernbase )
> +        return false;
>  
> -        load_addr = load_end - info->image.len;
> -        /* Align to 2MB */
> -        load_addr &= ~((2 << 20) - 1);
> -    }
> -    else
> -        load_addr = info->image.start;
> +    kernend = kernbase + info->image.len;
>  
> -    return load_addr;
> +    if ( kernbase < rambase || kernend > ramend )
> +        return false;
> +
> +    return first_bank_has_enough_room(ramsize, kernbase, kernend, modsize) &&
> +           find_module_placement(rambase, ramsize, kernbase, kernend, 
> modsize,
> +                                 &modbase);
>  }
>  
>  static void __init kernel_zimage_load(struct kernel_info *info)
> diff --git a/xen/common/device-tree/domain-build.c 
> b/xen/common/device-tree/domain-build.c
> index 2a760b007b..25bc392fea 100644
> --- a/xen/common/device-tree/domain-build.c
> +++ b/xen/common/device-tree/domain-build.c
> @@ -299,20 +299,30 @@ static bool __init allocate_hwdom_memory(struct 
> kernel_info *kinfo)
>  
>      for ( i = 0; (kinfo->unassigned_mem > 0) && (i < nr_banks); i++ )
>      {
> -        paddr_t bank_size;
> +        const paddr_t bank_start = hwdom_free_mem->bank[i].start;
> +        paddr_t bank_size = hwdom_free_mem->bank[i].size;
> +
> +        /*
> +         * Check the size that would actually be assigned, not just the size
> +         * of the host region.
> +         */
> +        bank_size = min(bank_size, kinfo->unassigned_mem);
>  
>          /*
>           * The first bank must be large enough for place_modules() to
>           * fit the kernel, DTB and initrd.  Skip small regions to avoid
>           * ending up with a tiny first bank.
>           */
> -        if ( !mem->nr_banks && (hwdom_free_mem->bank[i].size < 
> min_bank_size) )
> -            continue;
> +        if ( !mem->nr_banks )
> +        {
> +            if ( bank_size < min_bank_size )
> +                continue;
> +
> +            if ( !arch_hwdom_first_bank_ok(kinfo, bank_start, bank_size) )
> +                continue;
> +        }
>  
> -        bank_size = MIN(hwdom_free_mem->bank[i].size, kinfo->unassigned_mem);
> -        if ( !allocate_bank_memory(kinfo,
> -                                   
> gaddr_to_gfn(hwdom_free_mem->bank[i].start),
> -                                   bank_size) )
> +        if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(bank_start), 
> bank_size) )
>          {
>              xfree(hwdom_free_mem);
>              return false;
> diff --git a/xen/include/xen/fdt-kernel.h b/xen/include/xen/fdt-kernel.h
> index 00c37be101..86f2a69ede 100644
> --- a/xen/include/xen/fdt-kernel.h
> +++ b/xen/include/xen/fdt-kernel.h
> @@ -93,6 +93,15 @@ kernel_info_get_mem_const(const struct kernel_info *kinfo)
>      return container_of(&kinfo->mem.common, const struct membanks, common);
>  }
>  
> +#ifndef arch_hwdom_first_bank_ok
> +static inline bool
> +arch_hwdom_first_bank_ok(const struct kernel_info *info, paddr_t bank_start,
> +                         paddr_t bank_size)
> +{
> +    return true;
> +}
> +#endif
> +
>  #ifndef KERNEL_INFO_SHM_MEM_INIT
>  
>  #ifdef CONFIG_STATIC_SHM

I would prefer if this patch was split into refactoring (e.g. split
place_modules() into functions later on used by patch 2) + hwdom fix (2
patches). The first patch would also fit the rename I suggested.

~Michal




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.