
Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 25 Mar 2026 17:26:01 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 25 Mar 2026 16:26:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
> On 25.03.2026 11:08, Roger Pau Monne wrote:
> > ---
> > I've attempted various different ways to solve this, but they all ended up
> > being impossible.
> > 
> >  * Prevent non-scrubbed pages from getting extra refcounts (iow: make
> >    get_page() fail for them).  This seemed nice, but the cleanup using
> >    put_page_alloc_ref() was impossible as non-scrubbed pages would return
> >    failure in get_page(), and so I couldn't take the extra reference ahead
> >    of calling put_page_alloc_ref().
> 
> A special-case variant of get_page() could be introduced, but maybe that
> would still be overly fragile.

It seemed like too much complexity (and risk) just to deal with this
scenario.

> When we discussed this, what I had proposed didn't require use of get_page()
> though. assign_pages() would install two general references (plus one type
> ref for PGT_writable) in this special case. To free, you'd call
> put_page_alloc_ref() followed by put_page_and_type().

Doesn't that risk underflowing the page counter if there's a parallel
call to decrease_reservation() against this MFN beforehand?

How would the freeing done in populate_physmap() (in case of
concurrent calls) know whether already scrubbed pages have had their
PGC_allocated bit dropped?

> That said, the patch here is still less intrusive than I feared, so I'm not
> asking to re-work this again.

Thanks.

> >  * Disallow XENMEM_decrease_reservation until the domain has finished
> >    creation would fix the issue of pages being freed while pending scrub,
> >    but it's not clear there might be other usages that would be problematic,
> >    as get_page() on non-scrubbed pages would still return success.
> 
> I agree this is of concern.
> 
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
> >                              goto out;
> >                          }
> >                      }
> > +
> > +                    if ( assign_page(page, a->extent_order, d, memflags) )
> > +                    {
> > +                        free_domheap_pages(page, a->extent_order);
> 
> The pages don't have an owner set yet, so that function will go straight
> to free_heap_pages(), needlessly passing "true" as last argument. Correct,
> but (for large pages, which the stashing is about) highly inefficient.

My bad, I was sure I was using the same freeing function as
alloc_domheap_pages() on failure to assign, but I clearly wasn't.  I
will switch to using free_heap_pages().

> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2699,7 +2699,13 @@ struct page_info *alloc_domheap_pages(
> >                                    memflags, d)) == NULL)) )
> >           return NULL;
> >  
> > -    if ( d && !(memflags & MEMF_no_owner) )
> > +    /*
> > +     * Don't add pages with the PGC_need_scrub bit set to the domain, the
> > +     * caller must clean the bit and then manually call assign_pages().
> > +     * Otherwise pages with the PGC_need_scrub would be reachable using
> > +     * get_page().
> > +     */
> 
> How about replacing the latter "with the PGC_need_scrub" by "still subject
> to scrubbing"?

Sure.

> > +    if ( d && !(memflags & MEMF_no_owner) && !(memflags & MEMF_keep_scrub) )
> >      {
> >          if ( memflags & MEMF_no_refcount )
> >          {
> 
> This no-refcount code isn't repeated at the new call site of assign_page().
> It's not needed there, yes, but wouldn't we better allow this to be taken
> care of right here, moving the MEMF_keep_scrub check immediately ahead of
> the call to assign_page()?
> 
> Otherwise should we reject (much earlier) MEMF_no_refcount used together
> with MEMF_keep_scrub?

hm, I was likely too focused on the specific use-case of
populate_physmap(), which is the only user of MEMF_keep_scrub.  I'd
noticed the MEMF_no_refcount bits here, but since populate_physmap()
never uses that flag I just left them as-is.

Will see about adjusting it.

Thanks, Roger.



 

