Re: [PATCH 05/11] xen: Create per-node outstanding claims
On Fri, Mar 14, 2025 at 05:24:56PM +0000, Alejandro Vallejo wrote:
> Extends domain_set_outstanding_claims() to allow staking claims on an
> exact node. Also creates global per-node claim counts analogous to
> `outstanding_claims`. Note that the per-node counts can't replace the
> global one if we want exact-node claims to coexist with non-exact
> claims.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
> ---
> xen/common/page_alloc.c | 32 +++++++++++++++++++++++++++++++-
> xen/include/xen/sched.h | 3 +++
> 2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 9243c4f51370..7fe574b29407 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -490,6 +490,7 @@ static unsigned long pernode_avail_pages[MAX_NUMNODES];
>
> static DEFINE_SPINLOCK(heap_lock);
> static long outstanding_claims; /* total outstanding claims by all domains */
> +static unsigned long pernode_oc[MAX_NUMNODES]; /* per-node outstanding claims */
>
> unsigned long domain_adjust_tot_pages(struct domain *d, nodeid_t node,
> long pages)
> @@ -501,20 +502,31 @@ unsigned long domain_adjust_tot_pages(struct domain *d, nodeid_t node,
> * can test d->outstanding_pages race-free because it can only change
> * if d->page_alloc_lock and heap_lock are both held, see also
> * domain_set_outstanding_pages below
> + *
> + * If `d` has an exact-node claim, we must exit early if this is an
> + * adjustment attributed to another node.
> */
> - if ( !d->outstanding_pages || pages <= 0 )
> + if ( !d->outstanding_pages || pages <= 0 ||
> + (d->claim_node != NUMA_NO_NODE && d->claim_node != node) )
> goto out;
>
> +
> spin_lock(&heap_lock);
> BUG_ON(outstanding_claims < d->outstanding_pages);
> if ( d->outstanding_pages < pages )
> {
> /* `pages` exceeds the domain's outstanding count. Zero it out. */
> + if ( d->claim_node != NUMA_NO_NODE )
> + pernode_oc[d->claim_node] -= d->outstanding_pages;
> +
> outstanding_claims -= d->outstanding_pages;
> d->outstanding_pages = 0;
> }
> else
> {
> + if ( d->claim_node != NUMA_NO_NODE )
> + pernode_oc[d->claim_node] -= pages;
> +
> outstanding_claims -= pages;
> d->outstanding_pages -= pages;
> }
> @@ -542,6 +554,10 @@ int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
> if ( pages == 0 )
> {
> outstanding_claims -= d->outstanding_pages;
> +
> + if ( d->claim_node != NUMA_NO_NODE )
> + pernode_oc[d->claim_node] -= d->outstanding_pages;
> +
> d->outstanding_pages = 0;
> ret = 0;
> goto out;
> @@ -564,12 +580,26 @@ int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
> /* how much memory is available? */
> avail_pages = total_avail_pages - outstanding_claims;
>
> + /* This check can't be skipped for the NUMA case, or we may overclaim */
> if ( pages > avail_pages )
> goto out;
>
> + if ( node != NUMA_NO_NODE )
> + {
> + avail_pages = pernode_avail_pages[node] - pernode_oc[node];
> +
> + if ( pages > avail_pages )
> + goto out;
> + }
> +
> /* yay, claim fits in available memory, stake the claim, success! */
> d->outstanding_pages = pages;
> outstanding_claims += d->outstanding_pages;
> + d->claim_node = node;
> +
> + if ( node != NUMA_NO_NODE )
> + pernode_oc[node] += pages;
> +
> ret = 0;
>
> out:
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 559d201e0c7e..307a9d749f5d 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -406,6 +406,9 @@ struct domain
> unsigned int max_pages; /* maximum value for domain_tot_pages() */
> unsigned int extra_pages; /* pages not included in domain_tot_pages() */
>
> + /* NUMA node from which outstanding pages have been reserved */
> + unsigned int claim_node;

This should possibly be nodeid_t rather than unsigned int?

But why is this a single node? The interface should allow for a
domain to claim memory from multiple different nodes.

The interface here seems to be focused on domains only being allowed
to allocate from a single node, or else having to finish allocating
memory from one node before claiming on the next one (which defeats
the purpose of claims?).

I think we want to instead convert d->outstanding_pages into a
per-node array, so that a domain can have outstanding claims for
multiple NUMA nodes?
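
To make the per-node array idea concrete, here is a rough standalone sketch. It is a toy model, not Xen code: every toy_* name is hypothetical, the node count is made up, there is no locking (in Xen this would be done with heap_lock held), and the global non-exact claim count is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NUMNODES 4 /* made-up value, purely illustrative */

/* Hypothetical stand-ins for pernode_avail_pages[] and pernode_oc[]. */
static unsigned long toy_node_avail[TOY_MAX_NUMNODES];
static unsigned long toy_node_claimed[TOY_MAX_NUMNODES];

struct toy_domain {
    /* One outstanding claim per node instead of a single count + claim_node. */
    unsigned long claims[TOY_MAX_NUMNODES];
};

/* Set (replace) the domain's claim on one node. Fails without side effects. */
static bool toy_set_claim(struct toy_domain *d, unsigned int node,
                          unsigned long pages)
{
    if ( node >= TOY_MAX_NUMNODES )
        return false;

    /* Pages on this node already claimed by everyone except this domain. */
    unsigned long others = toy_node_claimed[node] - d->claims[node];

    if ( others > toy_node_avail[node] ||
         pages > toy_node_avail[node] - others )
        return false;

    toy_node_claimed[node] = others + pages;
    d->claims[node] = pages;
    return true;
}

/* Account for an allocation: it consumes the claim on that node only. */
static void toy_account_alloc(struct toy_domain *d, unsigned int node,
                              unsigned long pages)
{
    assert(node < TOY_MAX_NUMNODES);
    assert(pages <= toy_node_avail[node]);

    unsigned long used = pages < d->claims[node] ? pages : d->claims[node];

    toy_node_avail[node] -= pages;
    d->claims[node] -= used;
    toy_node_claimed[node] -= used;
}
```

With this layout a domain can hold claims on nodes 0 and 1 at the same time, and an allocation attributed to node 0 never touches the claim on node 1.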

The hypercall interface becomes a bit awkward then, as the toolstack
has to perform a separate hypercall for each memory claim on a
different node (and roll back in case of failure). Ideally we would
introduce a new hypercall that allows making claims on multiple
nodes within a single locked region, so as to ensure success or
failure in an atomic way.
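
Something along these lines is what I have in mind for the all-or-nothing behaviour. Again this is only a toy sketch under assumptions: the toy_* names and the request layout are invented for illustration and do not correspond to any existing hypercall or Xen structure, and the single locked region is only indicated by a comment.

```c
#include <stddef.h>

#define TOY_NODES 4 /* made-up value, purely illustrative */

/* Hypothetical request entry: "claim this many pages on this node". */
struct toy_node_claim {
    unsigned int node;
    unsigned long pages;
};

static unsigned long toy_avail[TOY_NODES];   /* free pages per node */
static unsigned long toy_claimed[TOY_NODES]; /* claimed pages per node */

/*
 * Stake all requested claims or none of them. Returns 0 on success and -1
 * on failure, in which case no state has been modified. In real code the
 * whole function body would run with the heap lock held.
 */
static int toy_claim_many(unsigned long dom_claims[TOY_NODES],
                          const struct toy_node_claim *req, size_t nr)
{
    unsigned long want[TOY_NODES] = { 0 };
    size_t i;
    unsigned int n;

    /* Pass 1: validate node ids and sum up repeated entries per node. */
    for ( i = 0; i < nr; i++ )
    {
        if ( req[i].node >= TOY_NODES )
            return -1;
        want[req[i].node] += req[i].pages;
    }

    /* Pass 2: check every node before touching any counter. */
    for ( n = 0; n < TOY_NODES; n++ )
        if ( toy_claimed[n] > toy_avail[n] ||
             want[n] > toy_avail[n] - toy_claimed[n] )
            return -1;

    /* Pass 3: commit. Nothing can fail from here on. */
    for ( n = 0; n < TOY_NODES; n++ )
    {
        toy_claimed[n] += want[n];
        dom_claims[n] += want[n];
    }

    return 0;
}
```

Because the checks complete before any counter is updated, the toolstack would not need to roll anything back when one of the nodes cannot satisfy its part of the request.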

Thanks, Roger.