[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/x86: Change stub page freeing to fix smt=0


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jason Andryuk <jason.andryuk@xxxxxxx>
  • Date: Wed, 3 Jun 2026 10:03:53 -0400
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=citrix.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=GtUQ7dI8GYYjwXRow9TBh9EOBLUJfW3qGKPhhNc4IXk=; b=jgi4sPUdI04BMDVV4Z2sXfo8CBUggsSXj04MFD+23oL6kNfI+orwHa67L73uLsfhjYTUQp+5WZyP2vAQig+O8M1xiVB+B0p3VU32o262u1DEoMWbqu7DOmdkim1BLP2t9a1AkUMpvLQ0c+s0idbNJIAzFoWtU3MA791qKO/mhDegOrWSu7QmBoV5qdqAqmTgp9243qQbSIPmvpIjH+fq45D7MuX6JakeSoc1XxRvlbBlE5hGILbkD+uMLiX5K68xh909MuEm1u6kfTMuAWrREKYdeTTjlyjCYVQuuqbAC+L9PlX+UNGpwVIOIMsBnEPHm5x4FVOcTFL382q1zIdjNA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=EHIl9oYmS4DKY/4SzFtBhlMM8JQT47zLP+qnyFNUETUjZ6hDLqKsI0CLfpuDUC1DwYW8ldL/XgK4Bd5T6HR+PTshDxiF98Oqt39+ia8s1WG3zKJxv261WYijVrA8XaHkUEbH3Ra2trAjA9QjBX5C8ZT5IcDYP8yx3Sk8Zqa3bUmG6BXUwCgjEoBZ1H/cUnyb+6L5NwjtHL+g4vd/d1e/lBfkoH3K2aIOPaPA7o5j4rcmMwr2hgiMUJDNdqUjWPCAhUPzmjLKdVBMLDfu49POfMXnf2QACpkXkRnwhVQH91PFEYPODAAVLqcV6Uur0+RoPsA98ZdJNyHEMBY8SyLcxA==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=amd.com header.i="@amd.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>
  • Delivery-date: Wed, 03 Jun 2026 14:04:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 2026-06-02 03:01, Roger Pau Monné wrote:
On Mon, Jun 01, 2026 at 05:07:52PM -0400, Jason Andryuk wrote:
On 2026-06-01 13:00, Roger Pau Monné wrote:
On Tue, May 26, 2026 at 04:31:14PM -0400, Jason Andryuk wrote:
A single stubs page is initialized with 0xcc and re-used, with multiple
CPUs each using a portion of the shared page.  In cpu_smpboot_free(),
each stubs area is checked against 0xcc.  When all are set to 0xcc, the
page is freed.

Booting a system with smt=0, CPU0 is initially setup, allocating the
stubs page and initializing to 0xcc.  When more CPUs are brought up,
CPU1 is initialized and then immediately brough offline as it is the
sibling of CPU0.  Since the page was initially memset with 0xcc,
cpu_smpboot_free() finds all stubs as 0xcc and frees the page.
However, the page is still assigned to CPU0 and continues to be assigned
to other CPUs.

Meanwhile the page can be reallocated, which can lead to misbehavior.
The particular instance was the stubs page re-used as a page table which
later faulted when the entry was all 0xcc.

Change to initializing the page as 0xd6/STUB_BUF_FREE, and initializing
individual stubs as 0xcc/STUB_BUF_USED.  0xd6 now indicates unused, and
0xcc indicates used/assigned.  When freeing a CPU, the stub is set to
0xd6, and the page is freed if all stubs are 0xd6.  Initializing with
STUB_BUF_FREE lets cpu_smpboot_free() a page that was only ever
partially used.

0xd6/UDB is a 1 byte invalid opcode, which is similar to the existing
use of 0xcc.  0xd6 is used to identify bug frames, but the stub addr
(e.g. 0xffff82d07fffe000) fails the is_active_kernel_text() check.  It
should be okay to use here.

Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
---
It would be nice to use get_page()/put_page() to let count_info handle
reference counting, but they require an owning domain.

The listed Fixes introduced the use of 0xcc, but the smt commit may have
made it more problematic.
Fixes: d8f974f1a646 ("x86: command line option to avoid use of secondary 
hyper-threads")

Speaking with Andrew, we believe it might be easier to simply forego
the freeing of the page, possibly something like:

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index ff05955bae40..62c6cbf4b561 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -990,19 +990,12 @@ static void cpu_smpboot_free(unsigned int cpu, bool 
remove)
       {
           mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
           unsigned char *stub_page = map_domain_page(mfn);
-        unsigned int i;
           memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc, STUB_BUF_SIZE);
-        for ( i = 0; i < STUBS_PER_PAGE; ++i )
-            if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
-                break;
           unmap_domain_page(stub_page);
           destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
                                (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
           per_cpu(stubs.addr, cpu) = 0;
-        per_cpu(stubs.mfn, cpu) = 0;
-        if ( i == STUBS_PER_PAGE )
-            free_domheap_page(mfn_to_page(mfn));
       }
       if ( IS_ENABLED(CONFIG_PV32) )

I think I've made an oversight in the code above: if all 32 CPUs
sharing the same stubs page are offlined, the reference to the stubs
page is possibly lost (if CPUs are not parked) and a new stubs page
would be allocated if any of those CPUs is brought back online, thus
leaking the previous allocation.  The simplest way to solve this would
be to introduce an array that indexes the stub pages, and replace the
logic in cpu_smpboot_alloc() that figures out whether stubs.mfn is set
for adjacent CPUs.

Right, but I thought Andrew's point was that offlining 32 CPUs is unrealistic, so don't even bother tracking. If CPUs are offlined (and you somehow keep running), you can leak the page.

(there might be further cleanup possible if the page is not freed, the
above chunk is untested).

It's a single page shared between 32 CPUs, and offlining 32 adjacent
CPUs seems very unlikely.  IMO the extra complexity of having to deal
with the freeing overshadows the very small memory gain we get from
it.

Hi Roger,

Yes, I made and tested the same change locally last week.  Well, I retained:
      per_cpu(stubs.mfn, cpu) = 0;

Maybe it would be good to save the mfn in case the CPU returns?  But I
thought per-cpu vars are cleared, so it wouldn't be available anyway?

Depends on whether the CPUs are parked or not (see park_offline_cpus).
I think leaving stubs.mfn is fine, in the parked case we avoid part of
the setup logic by already having the mfn cached (no big deal either
way).

Right.

Also, I was waiting to see if anyone chimed in with other ideas.

Maybe you could assign the page to dom_xen and then use
{get,put}_page(), but again it seems overly complicated.

Code-wise this doesn't look bad, but it blows up:

(XEN) d[IDLE]v0 Over-allocation for d[XEN]: 1 > 0

I don't think we should pursue that.

Regards,
Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.