[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/x86: Change stub page freeing to fix smt=0


  • To: Jason Andryuk <jason.andryuk@xxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 3 Jun 2026 16:16:47 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=gMqwwI4Xx1eopbTzifhcDbu7vCOUBDHJO3sykFHN+H8=; b=YUiTolb81irzLLPv1kuHz9QiEG7D2lYYdnMf8NkNnZ04qnnrBwqVnSb/qsuZgQKtkJ+rDUjpS17paYOwxsfEZbZhMNM2IAut6NR9FanC0NGzCMHko4BQlldCvSJOeQpSFYVTfvy+QfSzTAohG0x5wd6PTI2E000l1uaSEOF/KvMITB93VHSkqONeNRYR+soU9NH1pRl8zVsSfAwZLgvHy5w/474pgcOTNQHyJMSOFTFicy3DN8aCERNXwQGSuWHYbmSVC+DGQeB7rtPMoVDA+KxhrpCXsdFxfVal5XL4j7pzHRQCD0P8teH8psChKEcsRZ/+9pxuYKEKTzp6Ljgcow==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=w48plQxxsIBr9bgZQx36C2i8j09fM7XpiLzJorqx/qIfIvtGkPCdS2D76rJ8qsKk8MqkbFsBvEuTzRTt8yNbPspFc6+2b4x2+EtB1OT2MVHJ3Bftw/IoeyGVGKrvSWZIN8tAImbRkIprS8OZXRWptDI9S+N/Ve2OtyEQKNIRMU7N4NvS2SB/GQ1YJUsJu8/keH5tpYibQRrpRuRPzNuNtPQr5K5DoREVKrkDJ8trYpCn/gc+sBh8RbObaVTIzAIgWaGy95op0F+HgZ29qpByBXWiNW3fJHkPPwcbsJa6dgGILLzC+ohAK8CCfqpeG1MfQ2yV9Ke5KkYSZtlM2osXCg==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=citrix.com header.i="@citrix.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>
  • Delivery-date: Wed, 03 Jun 2026 14:16:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 03, 2026 at 10:03:53AM -0400, Jason Andryuk wrote:
> On 2026-06-02 03:01, Roger Pau Monné wrote:
> > On Mon, Jun 01, 2026 at 05:07:52PM -0400, Jason Andryuk wrote:
> > > On 2026-06-01 13:00, Roger Pau Monné wrote:
> > > > On Tue, May 26, 2026 at 04:31:14PM -0400, Jason Andryuk wrote:
> > > > > A single stubs page is initialized with 0xcc and re-used, with 
> > > > > multiple
> > > > > CPUs each using a portion of the shared page.  In cpu_smpboot_free(),
> > > > > each stubs area is checked against 0xcc.  When all are set to 0xcc, 
> > > > > the
> > > > > page is freed.
> > > > > 
> > > > > Booting a system with smt=0, CPU0 is initially setup, allocating the
> > > > > stubs page and initializing to 0xcc.  When more CPUs are brought up,
> > > > > CPU1 is initialized and then immediately brough offline as it is the
> > > > > sibling of CPU0.  Since the page was initially memset with 0xcc,
> > > > > cpu_smpboot_free() finds all stubs as 0xcc and frees the page.
> > > > > However, the page is still assigned to CPU0 and continues to be 
> > > > > assigned
> > > > > to other CPUs.
> > > > > 
> > > > > Meanwhile the page can be reallocated, which can lead to misbehavior.
> > > > > The particular instance was the stubs page re-used as a page table 
> > > > > which
> > > > > later faulted when the entry was all 0xcc.
> > > > > 
> > > > > Change to initializing the page as 0xd6/STUB_BUF_FREE, and 
> > > > > initializing
> > > > > individual stubs as 0xcc/STUB_BUF_USED.  0xd6 now indicates unused, 
> > > > > and
> > > > > 0xcc indicates used/assigned.  When freeing a CPU, the stub is set to
> > > > > 0xd6, and the page is freed if all stubs are 0xd6.  Initializing with
> > > > > STUB_BUF_FREE lets cpu_smpboot_free() a page that was only ever
> > > > > partially used.
> > > > > 
> > > > > 0xd6/UDB is a 1 byte invalid opcode, which is similar to the existing
> > > > > use of 0xcc.  0xd6 is used to identify bug frames, but the stub addr
> > > > > (e.g. 0xffff82d07fffe000) fails the is_active_kernel_text() check.  It
> > > > > should be okay to use here.
> > > > > 
> > > > > Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
> > > > > Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
> > > > > ---
> > > > > It would be nice to use get_page()/put_page() to let count_info handle
> > > > > reference counting, but they require an owning domain.
> > > > > 
> > > > > The listed Fixes introduced the use of 0xcc, but the smt commit may 
> > > > > have
> > > > > made it more problematic.
> > > > > Fixes: d8f974f1a646 ("x86: command line option to avoid use of 
> > > > > secondary hyper-threads")
> > > > 
> > > > Speaking with Andrew, we believe it might be easier to simply forego
> > > > the freeing of the page, possibly something like:
> > > > 
> > > > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> > > > index ff05955bae40..62c6cbf4b561 100644
> > > > --- a/xen/arch/x86/smpboot.c
> > > > +++ b/xen/arch/x86/smpboot.c
> > > > @@ -990,19 +990,12 @@ static void cpu_smpboot_free(unsigned int cpu, 
> > > > bool remove)
> > > >        {
> > > >            mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
> > > >            unsigned char *stub_page = map_domain_page(mfn);
> > > > -        unsigned int i;
> > > >            memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc, 
> > > > STUB_BUF_SIZE);
> > > > -        for ( i = 0; i < STUBS_PER_PAGE; ++i )
> > > > -            if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
> > > > -                break;
> > > >            unmap_domain_page(stub_page);
> > > >            destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
> > > >                                 (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) 
> > > > + 1);
> > > >            per_cpu(stubs.addr, cpu) = 0;
> > > > -        per_cpu(stubs.mfn, cpu) = 0;
> > > > -        if ( i == STUBS_PER_PAGE )
> > > > -            free_domheap_page(mfn_to_page(mfn));
> > > >        }
> > > >        if ( IS_ENABLED(CONFIG_PV32) )
> > 
> > I think I've made an oversight in the code above: if all 32 CPUs
> > sharing the same stubs page are offlined, the reference to the stubs
> > page is possibly lost (if CPUs are not parked) and a new stubs page
> > would be allocated if any of those CPUs is brought back online, thus
> > leaking the previous allocation.  The simplest way to solve this would
> > be to introduce an array that indexes the stub pages, and replace the
> > logic in cpu_smpboot_alloc() that figures out whether stubs.mfn is set
> > for adjacent CPUs.
> 
> Right, but I thought Andrew's point was that offlining 32 CPUs is
> unrealistic, so don't even bother tracking.  If CPUs are offlined (and you
> somehow keep running), you can leak the page.

I thin we should avoid freeing the page and ensure it's always reused,
rather than possibly leaking it.  It's also possible there's a single
trailing CPU using the last stubs page alone, and offlining and
onlining it would trigger such a page leak, without requiring a block
of 32 CPUs going offline.

Entering an ACPI sleep state causes all APs to be offlined (see the
disable_nonboot_cpus() call in enter_state()), and it would be
undesirable that putting a system to sleep causes page leaking.

> > > > (there might be further cleanup possible if the page is not freed, the
> > > > above chunk is untested).
> > > > 
> > > > It's a single page shared between 32 CPUs, and offlining 32 adjacent
> > > > CPUs seems very unlikely.  IMO the extra complexity of having to deal
> > > > with the freeing overshadows the very small memory gain we get from
> > > > it.
> > > 
> > > Hi Roger,
> > > 
> > > Yes, I made and tested the same change locally last week.  Well, I 
> > > retained:
> > >       per_cpu(stubs.mfn, cpu) = 0;
> > > 
> > > Maybe it would be good to save the mfn in case the CPU returns?  But I
> > > thought per-cpu vars are cleared, so it wouldn't be available anyway?
> > 
> > Depends on whether the CPUs are parked or not (see park_offline_cpus).
> > I think leaving stubs.mfn is fine, in the parked case we avoid part of
> > the setup logic by already having the mfn cached (no big deal either
> > way).
> 
> Right.
> 
> > > Also, I was waiting to see if anyone chimed in with other ideas.
> > 
> > Maybe you could assign the page to dom_xen and then use
> > {get,put}_page(), but again it seems overly complicated.
> 
> Code-wise this doesn't look bad, but it blows up:
> 
> (XEN) d[IDLE]v0 Over-allocation for d[XEN]: 1 > 0
> 
> I don't think we should pursue that.

Hm, I see, yes, you will need to use MEMF_no_refcount to skip the
domain allocation accounting, but as you say it might not be a very
wise idea.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.