[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 2/3] xen/mm: Introduce NUMA-aware memory claim sets


  • To: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 25 Jun 2026 15:30:53 +0200
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=google header.d=suse.com header.i="@suse.com" header.h="Content-Transfer-Encoding:In-Reply-To:Autocrypt:From:Content-Language:References:Cc:To:Subject:User-Agent:MIME-Version:Date:Message-ID"
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 25 Jun 2026 13:31:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.05.2026 22:27, Bernhard Kaindl wrote:
> This commit extends Xen's memory claim design to support installing claim
> sets spanning multiple NUMA nodes atomically. As Roger Pau Monné suggested:
> 
>   Ideally, we would need to introduce a new hypercall that allows
>   making claims from multiple nodes in a single locked region, as to
>   ensure success or failure in an atomic way.
> 
> A claim set can contain multiple node-specific claims and a host-wide
> claim for memory that may come from any NUMA node. The new domctl
> installs the full claim set atomically, and the allocator is updated
> so that claim checks and claim consumption follow the new semantics.
> 
> This adds:
> 
> 1. installing multi-node claim sets atomically,
> 2. protecting claimed pages from other claim requests and allocations, and
> 3. redeeming held claims when satisfying allocations.
> 
> Legacy XENMEM_claim_pages behaviour is preserved; the interface is
> deprecated and superseded by XEN_DOMCTL_claim_memory.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxxx>
> 
> ---
> The v7 design document submitted ahead of this series may help with review.
> It explains the background, design rationale, and implementation details.
> 
> Rendered version: https://xen.kaindl.dev/claims-v7-design/designs/claims
> 
> Many thanks to everyone who contributed to the earlier work and review:
> especially Roger Pau Monné, Alejandro Vallejo, Jan Beulich, Andrew Cooper,
> Marcus Granado, and Edwin Török.
> 
> Thanks,
> Bernhard
> ---
>  tools/include/xenctrl.h             |  11 +
>  tools/libs/ctrl/xc_domain.c         |  28 ++
>  xen/common/domain.c                 |   5 +-
>  xen/common/domctl.c                 |  57 ++++
>  xen/common/memory.c                 |   5 +-
>  xen/common/page_alloc.c             | 410 +++++++++++++++++++++++-----
>  xen/include/public/domctl.h         |  38 +++
>  xen/include/public/memory.h         |   2 +
>  xen/include/xen/mm.h                |   6 +-
>  xen/include/xen/sched.h             |   4 +
>  xen/xsm/flask/hooks.c               |   1 +
>  xen/xsm/flask/policy/access_vectors |   1 +
>  12 files changed, 493 insertions(+), 75 deletions(-)

And there's no way to split this some? I'll only do some light review for now.

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1319,7 +1319,10 @@ int domain_kill(struct domain *d)
>          rspin_barrier(&d->domain_lock);
>          argo_destroy(d);
>          vnuma_destroy(d->vnuma);
> -        domain_set_outstanding_pages(d, 0);
> +        /* Release all outstanding claims of the domain. */
> +        domain_set_claim_entries(d, 1, &(xen_memory_claim_t){
> +            .target = XEN_DOMCTL_CLAIM_MEMORY_HOST, .pages = 0,
> +        });

This is pretty badly formatted. Maybe

        domain_set_claim_entries(
            d, 1,
            &(xen_memory_claim_t){
                .target = XEN_DOMCTL_CLAIM_MEMORY_HOST, .pages = 0,
            });

? (Applies elsewhere as well, obviously.)

Whether the .pages field needs explicitly mentioning is questionable.

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -51,6 +51,57 @@ static int xenctl_bitmap_to_nodemask(nodemask_t *nodemask,
>                                     MAX_NUMNODES);
>  }
>  
> +/* Set or get memory claims for a domain. */
> +static int claim_memory(struct domain *d,
> +                        struct xen_domctl_claim_memory *uinfo, bool 
> *copyback)
> +{
> +    xen_memory_claim_t *entries;
> +    int rc = -EFAULT;
> +
> +    /* Reject LLC coloring; alloc_color_heap_page() does not handle claims. 
> */
> +    if ( llc_coloring_enabled )
> +        return -EOPNOTSUPP;
> +
> +    switch ( uinfo->mode )
> +    {
> +    case XEN_DOMCTL_CLAIM_MEMORY_SET:
> +        if ( !uinfo->nr_entries )
> +            return -EINVAL;
> +        if ( uinfo->nr_entries > MAX_NUMNODES + 1 )

Why's MAX_NUMNODES and MAX_NUMNODES + 1 as an input permitted?

> +            return -E2BIG;
> +        break;
> +    case XEN_DOMCTL_CLAIM_MEMORY_GET:
> +        if ( uinfo->nr_entries > MAX_NUMNODES + 1 )
> +            uinfo->nr_entries = MAX_NUMNODES + 1;
> +        break;
> +    default:
> +        return -EOPNOTSUPP;
> +    }

Here and elsewhere - blank lines please between non-fall-through case blocks.

> +    if ( d->is_dying )
> +        return -ESRCH;
> +
> +    entries = xmalloc_array(xen_memory_claim_t, uinfo->nr_entries);

Better xzalloc_array() (really xvzalloc_array()) seeing ...

> +    if ( entries == NULL )
> +        return -ENOMEM;
> +
> +    switch ( uinfo->mode )
> +    {
> +    case XEN_DOMCTL_CLAIM_MEMORY_SET:
> +        if ( !copy_from_guest(entries, uinfo->claim_set, uinfo->nr_entries) )
> +            rc = domain_set_claim_entries(d, uinfo->nr_entries, entries);
> +        break;
> +    case XEN_DOMCTL_CLAIM_MEMORY_GET:
> +        rc = domain_get_claim_entries(d, &uinfo->nr_entries, entries);
> +        *copyback = true;
> +        if ( !rc && copy_to_guest(uinfo->claim_set, entries,
> +                                  uinfo->nr_entries) )

... this copy-out.

> +            rc = -EFAULT;
> +        break;
> +    }

Missing default:.

> @@ -865,6 +916,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> u_domctl)
>              copyback = true;
>          break;
>  
> +    case XEN_DOMCTL_claim_memory:
> +        ret = xsm_claim_pages(XSM_PRIV, d);

This and the xsm/flask/hooks.c change will need reworking, to fit the spirit
of XSA-492.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1276,6 +1276,42 @@ struct xen_domctl_get_domain_state {
>      uint64_t unique_id;      /* Unique domain identifier. */
>  };
>  
> +struct xen_memory_claim {
> +    uint64_aligned_t pages; /* Number of pages to claim. */
> +    uint32_t target;        /* NUMA node or special target constant. */
> +    uint32_t cmd;           /* Reserved, must be zero. */

Why is this named "cmd" when it's reserved?

> +};
> +typedef struct xen_memory_claim xen_memory_claim_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_memory_claim_t);
> +
> +/* Special claim targets for the target field of xen_memory_claim_t. */
> +#define XEN_DOMCTL_CLAIM_MEMORY_HOST     0x80000000U /* Host-wide claims. */
> +#define XEN_DOMCTL_CLAIM_MEMORY_LEGACY   0x40000000U /* Legacy semantics. */

I think the latter should be internal to Xen, and in particular not be
permitted with this new domctl.

> +/*
> + * XEN_DOMCTL_claim_memory
> + *
> + * Install or query a domain memory claim set. A SET operation replaces the
> + * existing claim set atomically. Claims are redeemed by later allocations to
> + * the domain. A SET request whose entries all have pages == 0 releases any
> + * existing claims.
> + *
> + * For GET, callers may pass nr_entries == 0 and claim_set == NULL to query 
> the
> + * number of records needed. Xen returns -ERANGE and updates nr_entries. If 
> the
> + * supplied array is too small, Xen returns -ERANGE and updates nr_entries
> + * without copying partial records.
> + */
> +struct xen_domctl_claim_memory {
> +    /* IN/OUT: Array of struct xen_memory_claim. */
> +    XEN_GUEST_HANDLE_64(xen_memory_claim_t) claim_set;
> +    /* IN/OUT: Number of records in the claim_set array. */
> +    uint32_t nr_entries;

What is IN and what is OUT looks to differ between GET and SET.

> +    /* IN: Operation to perform on the claim set (GET or SET). */
> +    uint32_t mode;
> +#define XEN_DOMCTL_CLAIM_MEMORY_SET 0U /* Set the claim set for the domain. 
> */
> +#define XEN_DOMCTL_CLAIM_MEMORY_GET 1U /* Get the claim set of the domain. */
> +};

And there's no way to fetch overall claims?

> @@ -131,7 +132,10 @@ int populate_pt_range(unsigned long virt, unsigned long 
> nr_mfns);
>  /* Claim handling */
>  unsigned long __must_check domain_adjust_tot_pages(struct domain *d,
>      long pages);
> -int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
> +int domain_set_claim_entries(struct domain *d, uint32_t nr_entries,
> +                             const struct xen_memory_claim *claim_set);
> +int domain_get_claim_entries(struct domain *d, uint32_t *nr_entries,
> +                             struct xen_memory_claim *claim_set);

Please see ./CODING_STYLE as to the use of fixed-width types. In the latter
case I can see justification, but in the former I can't.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -419,6 +419,10 @@ struct domain
>      unsigned int     xenheap_pages;     /* pages allocated from Xen heap */
>      /* Pages claimed but not possessed, protected by global heap_lock. */
>      unsigned int     outstanding_pages;
> +    unsigned int     node_claims;       /* Sum of per-node claims. */
> +    /* Domain objects use dedicated pages, leaving room for per-node claims. 
> */
> +    unsigned int     claims[MAX_NUMNODES]; /* Per-NUMA-node claims. */

Potentially too big an array to embed directly here.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.