[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/2] xen/mm: move adjustment of claimed pages counters on allocation


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 24 Dec 2025 22:31:25 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=3DkZW9ANeNkDRudJ8QD3cLuqVFYFjj1HgGXb3KTV1as=; b=XPSqMRH+tF9MoMHlMwie8OGjJUYe5nM9TvgXrRczXJcm2l1Dul3BVkCDOzD6TsdoyqPdS7CxN6weLvjvtNvaMyenY1B2Ohn5cPPLQoibsTlIV+yWjaPts/mZMB2TsDq2/p5oBuwVr8JxON6xIKNDCp2Rgz9KixpCTFbMGe9mC1KNTVwvq4FvlzH0FO2Ju7MZR4uvsG5uo8/d1DQxyZPNY/BgNq7805+TTgd4FIRbbEwmYOeyN/RtExIXbRiFP96gRo0/4Jqnscv2alHN7IVrhStZf1TkgCarZHML4eVSMp3AgiZ/enQAsv7L4uONNJFL+IeJBt4eZWAnRY4/YojMVQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=QW7dXOLjF9ANr/B+dJ93CWI9Vvu6NLinurl3W6A/n/yI25w77/KSU5DLOq4fUf7f4+bVgwmTDwrOCpbzC9T+1/Rdph+bUk/Au+wnUQzuGcdzxdJbATBvwXo+xGJmZCMHJAS6H+Bnhd1ji68eJFdrgVM1o6vDHascqzpVyu4hwj4V2fKsiK3XnYRYj3M71O2SrYtcAofyFT/W7efON5nOCx/T2VPHnh0RVejMzw9YiYJZYorr+G4d90Rn+ldXRzy00M6khjRwydKNZofqxLruGPJTHitLYSYqVddMvGdW4QaqrHjPYNGbRRNkg/oFjyzIIK3GhFTeDMfE8KgpPtWKCg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Wed, 24 Dec 2025 22:31:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24/12/2025 7:40 pm, Roger Pau Monne wrote:
> The current logic splits the update of the amount of available memory in
> the system (total_avail_pages) and pending claims into two separately
> locked regions.  This leads to a window between counters adjustments where
> the result of total_avail_pages - outstanding_claims doesn't reflect the
> real amount of free memory available, and can return a negative value due
> to total_avail_pages having been updated ahead of outstanding_claims.
>
> Fix by adjusting outstanding_claims and d->outstanding_pages in the same
> place where total_avail_pages is updated.  Note that accesses to
> d->outstanding_pages is protected by the global heap_lock, just like
> total_avail_pages or outstanding_claims.  Add a comment to the field
> declaration, and also adjust the comment at the top of
> domain_set_outstanding_pages() to be clearer in that regard.
>
> Finally, due to claims being adjusted ahead of pages having been assigned
> to the domain, add logic to re-gain the claim in case assign_page() fails.
> Otherwise the page is freed and the claimed amount would be lost.
>
> Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory 
> ops)")
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Changes since v1:
>  - Regain the claim if allocated page cannot be assigned to the domain.
>  - Adjust comments regarding d->outstanding_pages being protected by the
>    heap_lock (instead of the d->page_alloc_lock).

This is a complicated patch, owing to the churn from adding extra
parameters.

I've had a go splitting this patch in half.  First to adjust the
parameters, and second the bugfix. 
https://gitlab.com/xen-project/hardware/xen-staging/-/commits/andrew/roger-claims

I think the result is nicer to follow.  Thoughts?

As to the logical part of the change, the extra parameters are ugly but
I can't see a better option.

> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 1f67b88a8933..b73f6e264514 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1071,6 +1049,28 @@ static struct page_info *alloc_heap_pages(
>      total_avail_pages -= request;
>      ASSERT(total_avail_pages >= 0);
>  
> +    if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> +    {
> +        /*
> +         * Adjust claims in the same locked region where total_avail_pages is
> +         * adjusted, not doing so would lead to a window where the amount of
> +         * free memory (avail - claimed) would be incorrect.
> +         *
> +         * Note that by adjusting the claimed amount here it's possible for
> +         * pages to fail to be assigned to the claiming domain while already
> +         * having been subtracted from d->outstanding_pages.  Such claimed
> +         * amount is then lost, as the pages that fail to be assigned to the
> +         * domain are freed without replenishing the claim.
> +         */
> +        unsigned int outstanding = min(d->outstanding_pages, request);
> +
> +        BUG_ON(outstanding > outstanding_claims);
> +        outstanding_claims -= outstanding;
> +        d->outstanding_pages -= outstanding;
> +        ASSERT(claimed);
> +        *claimed = outstanding;

Something that's not explicitly called out is that it is invalid to pass
claims=NULL for a non-NULL d.

There are only 3 callers, and two of them are updated in this way (the
3rd passing NULL for both), but it caught me a little off-guard.

In principle alloc_heap_pages() could return {pg, claimed} via a struct
which avoids this entanglement, but is unergonomic to express.

> @@ -1553,6 +1554,12 @@ static void free_heap_pages(
>  
>      avail[node][zone] += 1 << order;
>      total_avail_pages += 1 << order;
> +    if ( d && claim )
> +    {
> +        outstanding_claims += claim;
> +        d->outstanding_pages += claim;
> +    }

In the rework, I added a comment here:

+    if ( d && claim )
+    {
+        /*
+         * An allocation can consume part of a claim and subsequently
+         * fail.  In such a case, the claim needs re-crediting.
+         */
+        outstanding_claims += claim;
+        d->outstanding_pages += claim;
+    }


because it's very non-intuitive that `claim` is only non-zero in the
case of a failed domheap allocation.  Calling the parameter rebate (or
claim_rebate) would be clearer, but I'm not sure if it's a common enough
world to be terribly meaningful to non-native speakers.

It really is inconvenient that you can deplete the claim with part of a
single allocation.  The intended use doesn't care for this corner case;
you're supposed to claim what you need and have it drop exactly to zero
as construction progresses (anything else is a toolstack error).

However, taking such a simplification to the claims behaviour doesn't
help here AFAICT; you still need something to distinguish the rebate
case from others, so you can't drop the parameter entirely.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.