[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] device-tree: size first hwdom bank for boot modules


  • To: dmukhin@xxxxxxxx
  • From: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • Date: Mon, 25 May 2026 19:31:33 +0300
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=bdjHvJti1PFGKhzoVezSMAeyJZDGbVl6BB/kPZLIlDY=; fh=9JZxiMpSfuLqiQWy+/Qm7fbwQ+MYsT4GTch2ySqG6FU=; b=RDg8EY3RqMTzqcCNj529l6Nq0K8Q0/+NNRkq8S0X8GvVzgHjhENRd4dj2Le1Ye2gz8 x9haK7DQds0t8M0Ks48cZCIQgII788RgeCR7jj3Z2mSHE6epgsN4b++M7TxurikRuViW XTsiYP9nXW9yarYjKvw3t9NAxa81TSqcUJmfOp0pYiR23wDzWsKu7wQSi/GhyIDT2Vtp N/H3FCT3h5eELk8/UZgPx/O/9M4R7LMBLmC6RXZNjlG2qjs5xk51A0/Q0VtIZNhE2W2n tTUIodRTRHS2QieB4OOQExLHkUfH5kOOjj2bCZcFvXG4mYRw1BXNKhRYW37+a11jKztN hr+A==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1779726706; cv=none; d=google.com; s=arc-20240605; b=TW74qCTXuNcoEDs+JwjELR+Rmu+P+pjMdnxYa05fTiWEIDCBK5Pjn7kJ/rDltv3jRG JBxjZ/twu3Jyvo3gNtDpVE9PfVLPDw3wAt0yLNMdBaSV8VWV5wFHFMWqS+6yEaGn8xuK +lQAAn92Idk68FfggzFiaDE75NlQ6bfhGSBDwARZ7qj1ZsOthJqrqKO5IH0fz4zOEDAm VD3d/YsXXxlV1HcHHy/Y+UocmbIiKzuzS1O/jclu+LyFB8Jbn/Iq6zte/2xBxC8jlnMU fTQkA94r9cyg/A9+1vSydtEe2VAN22IOsyakloHx91IiTETf/qAgkeHCSh5xtNhRgRzJ cF0Q==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Content-Transfer-Encoding:Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 25 May 2026 16:32:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, May 22, 2026 at 1:10 AM <dmukhin@xxxxxxxx> wrote:
>
> Hi Mykola,
>
> The patch looks good!
>
> I would try to add a CI coverage for QEMU aarch64 tests, since QEMU
> supports multiple RAM banks topology.
>
> What do you think?
>
> Also, few remarks below.
>
> On Sun, May 17, 2026 at 11:57:56PM +0300, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > With LLC coloring enabled, the hardware domain memory comes from
> > allocate_hwdom_memory(), not from the fixed direct-map banks used when
> > coloring is off.
> >
> > Commit de99f3263555 ("device-tree: Improve hwdom memory allocation for
> > DMA") made that allocator sort free host regions by ascending address so
> > Dom0 gets DMA-capable low memory first. The first bank filter still only
> > required 128MB. That can select a low region which is large enough for
> > the heuristic, but not large enough for place_modules() to put the Dom0
> > kernel, generated DTB and initrd contiguously in bank 0.
> >
> > Ask arch code for any additional first-bank size requirement. On Arm,
> > compute it from the actual Dom0 kernel placement, rounded initrd size and
> > generated DTB size hint. For 64-bit Image kernels, include the text offset
> > from the candidate bank start, because the returned requirement is compared
> > with a bank size measured from that start. The hint covers both the normal
> > Device Tree path and the minimal DTB created for ACPI boot.
> >
> > Check the first-bank threshold against the size which will actually be
> > assigned to Dom0, after capping the host region by the remaining unassigned
> > Dom0 memory. Otherwise a large host region could pass the test but still
> > produce a first guest bank too small for place_modules().
> >
> > Use the typed min()/max() helpers for this normal allocation arithmetic;
> > MIN()/MAX() are intended for preprocessor-style contexts and skip the type
> > checking provided by the lowercase helpers.
> >
> > This keeps the DMA-oriented allocation policy from de99f3263555 while
> > preventing a too-small bank 0 from reaching place_modules().
> >
> > Fixes: de99f3263555 ("device-tree: Improve hwdom memory allocation for DMA")
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > Test/setup notes:
> >
> > The failure was reproduced on a Renesas H3ULCB/R-Car H3 (r8a7795)
> > arm64 board booted through U-Boot/TFTP and using huge initrd.
> >
> > Relevant Xen command line excerpt:
> >   dom0_mem=2048M llc-coloring=on
> >
> > Boot module layout from Xen:
> >   MODULE[2]: 0x0000000084000040-0x000000008e75d92f Ramdisk
> >   MODULE[3]: 0x00000000a0000000-0x00000000a3ffffff Kernel
> >   MODULE[4]: 0x00000000a4000000-0x00000000a400ffff XSM Policy
> >
> > The initrd is about 168MB. With LLC coloring enabled and the low-address
> > allocation policy from de99f3263555, Dom0 can receive a 192MB first bank:
> >   d0 BANK[0] 0x00000048000000-0x00000054000000 (192MB)
> >
> > That bank satisfies the old 128MB minimum but is too small for the
> > rounded Dom0 kernel, generated DTB and initrd placement. The observed
> > failure before this patch was:
> >   Panic on CPU 0:
> >   Not enough memory in the first bank for the kernel+dtb+initrd
> >
> > With this patch, the same boot skips the too-small low region for bank 0
> > and reaches Dom0:
> >   d0 BANK[0] 0x00000057000000-0x00000084000000 (720MB)
> >   d0 BANK[1] 0x0000008e800000-0x000000c0000000 (792MB)
> >   d0 BANK[2] 0x00000500000000-0x00000521800000 (536MB)
> >   d0: extended region 0: 0x48000000->0x54000000
> >   Loading zImage from 0x00000000a0000000 to 0x57000000-0x5b000000
> >   Loading d0 initrd from 0x0000000084000040 to 0x5f200000-0x6995d8f0
> >   Loading d0 DTB to 0x5f000000-0x5f011c80
> >   Linux version 5.10.194-yocto-standard
> > ---
> >  xen/arch/arm/acpi/domain_build.c        |  2 --
> >  xen/arch/arm/domain_build.c             |  8 ++++++
> >  xen/arch/arm/include/asm/domain_build.h |  4 +++
> >  xen/arch/arm/include/asm/kernel.h       |  8 ++++++
> >  xen/arch/arm/kernel.c                   | 35 +++++++++++++++++++++++++
> >  xen/common/device-tree/domain-build.c   | 27 ++++++++++++++-----
> >  xen/include/xen/fdt-kernel.h            |  8 ++++++
> >  7 files changed, 83 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/acpi/domain_build.c 
> > b/xen/arch/arm/acpi/domain_build.c
> > index 249d899c33..db16f7fa94 100644
> > --- a/xen/arch/arm/acpi/domain_build.c
> > +++ b/xen/arch/arm/acpi/domain_build.c
> > @@ -26,8 +26,6 @@
> >  #undef virt_to_mfn
> >  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> >
> > -#define ACPI_DOM0_FDT_MIN_SIZE 4096
> > -
> >  static int __init acpi_iomem_deny_access(struct domain *d)
> >  {
> >      acpi_status status;
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 1efddc60ef..226e053c68 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -115,6 +115,14 @@ int __init parse_arch_dom0_param(const char *s, const 
> > char *e)
> >                               (IS_ENABLED(CONFIG_STATIC_SHM) ?         \
> >                                (NR_SHMEM_BANKS * (160 + 16)) : 0))
> >
> > +paddr_t __init dom0_get_fdt_size_hint(void)
> > +{
> > +    if ( !acpi_disabled )
> > +        return ACPI_DOM0_FDT_MIN_SIZE;
> > +
> > +    return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
>
> I would invert the condition so it is read more straightforward:
>
>     if ( acpi_disabled )
>         return fdt_totalsize(device_tree_flattened) + DOM0_FDT_EXTRA_SIZE;
>
>     return ACPI_DOM0_FDT_MIN_SIZE;
>
> [..]
> > diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> > index b72585b7fe..3644663e2f 100644
> > --- a/xen/arch/arm/kernel.c
> > +++ b/xen/arch/arm/kernel.c
> > @@ -128,6 +128,41 @@ static paddr_t __init kernel_zimage_place(struct 
> > kernel_info *info)
> >      return load_addr;
> >  }
> >
> > +static paddr_t __init kernel_placement_size(paddr_t load_addr, paddr_t len)
> > +{
> > +    return ROUNDUP(load_addr + len, MB(2)) - load_addr;
> > +}
> > +
> > +paddr_t __init arch_get_min_first_bank_size(struct kernel_info *info,
> > +                                            paddr_t bank_start)
> > +{
> > +    const struct boot_module *mod = info->bd.initrd;
> > +    const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0, MB(2));
> > +    const paddr_t dtb_len = ROUNDUP(dom0_get_fdt_size_hint(), MB(2));
> > +    paddr_t kernsize;
> > +
> > +#ifdef CONFIG_HAS_DOMAIN_TYPE
>
> Perhaps use `IS_ENABLED(CONFIG_HAS_DOMAIN_TYPE)` to reduce ifdefery?
> My impression that IS_ENABLED() is preferred.
>
> > +    if ( (info->type == DOMAIN_64BIT) && (info->image.start == 0) )
> > +    {
> > +        paddr_t load_addr = bank_start + info->image.text_offset;
> > +
> > +        /*
> > +         * The caller compares this value with a size measured from
> > +         * bank_start, so include the text offset before the kernel.
> > +         */
> > +        kernsize = ROUNDUP(load_addr + info->image.len, MB(2)) - 
> > bank_start;
> > +        return kernsize + initrd_len + dtb_len;
> > +    }
> > +#endif
> > +
> > +    if ( info->image.start == 0 )
>
> Here too: invert the condition?
>
>        if ( info->image.start )
>            kernsize = kernel_placement_size(info->image.start, 
> info->image.len);
>        else
>            ...

I think I will keep the explicit info->image.start == 0 check here.

In this code, start == 0 is an established special case rather than just
a numeric input to the helper. kernel_zimage_place() handles it
explicitly as the position-independent image case where Xen chooses the
load address. Keeping the branch in the same form matches the surrounding
code and keeps that special case visible.

Best regards,
Mykola



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.