|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/x86: Change stub page freeing to fix smt=0
On Wed, Jun 03, 2026 at 10:03:53AM -0400, Jason Andryuk wrote:
> On 2026-06-02 03:01, Roger Pau Monné wrote:
> > On Mon, Jun 01, 2026 at 05:07:52PM -0400, Jason Andryuk wrote:
> > > On 2026-06-01 13:00, Roger Pau Monné wrote:
> > > > On Tue, May 26, 2026 at 04:31:14PM -0400, Jason Andryuk wrote:
> > > > > A single stubs page is initialized with 0xcc and re-used, with
> > > > > multiple
> > > > > CPUs each using a portion of the shared page. In cpu_smpboot_free(),
> > > > > each stubs area is checked against 0xcc. When all are set to 0xcc,
> > > > > the
> > > > > page is freed.
> > > > >
> > > > > Booting a system with smt=0, CPU0 is initially setup, allocating the
> > > > > stubs page and initializing to 0xcc. When more CPUs are brought up,
> > > > > CPU1 is initialized and then immediately brough offline as it is the
> > > > > sibling of CPU0. Since the page was initially memset with 0xcc,
> > > > > cpu_smpboot_free() finds all stubs as 0xcc and frees the page.
> > > > > However, the page is still assigned to CPU0 and continues to be
> > > > > assigned
> > > > > to other CPUs.
> > > > >
> > > > > Meanwhile the page can be reallocated, which can lead to misbehavior.
> > > > > The particular instance was the stubs page re-used as a page table
> > > > > which
> > > > > later faulted when the entry was all 0xcc.
> > > > >
> > > > > Change to initializing the page as 0xd6/STUB_BUF_FREE, and
> > > > > initializing
> > > > > individual stubs as 0xcc/STUB_BUF_USED. 0xd6 now indicates unused,
> > > > > and
> > > > > 0xcc indicates used/assigned. When freeing a CPU, the stub is set to
> > > > > 0xd6, and the page is freed if all stubs are 0xd6. Initializing with
> > > > > STUB_BUF_FREE lets cpu_smpboot_free() a page that was only ever
> > > > > partially used.
> > > > >
> > > > > 0xd6/UDB is a 1 byte invalid opcode, which is similar to the existing
> > > > > use of 0xcc. 0xd6 is used to identify bug frames, but the stub addr
> > > > > (e.g. 0xffff82d07fffe000) fails the is_active_kernel_text() check. It
> > > > > should be okay to use here.
> > > > >
> > > > > Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
> > > > > Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
> > > > > ---
> > > > > It would be nice to use get_page()/put_page() to let count_info handle
> > > > > reference counting, but they require an owning domain.
> > > > >
> > > > > The listed Fixes introduced the use of 0xcc, but the smt commit may
> > > > > have
> > > > > made it more problematic.
> > > > > Fixes: d8f974f1a646 ("x86: command line option to avoid use of
> > > > > secondary hyper-threads")
> > > >
> > > > Speaking with Andrew, we believe it might be easier to simply forego
> > > > the freeing of the page, possibly something like:
> > > >
> > > > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> > > > index ff05955bae40..62c6cbf4b561 100644
> > > > --- a/xen/arch/x86/smpboot.c
> > > > +++ b/xen/arch/x86/smpboot.c
> > > > @@ -990,19 +990,12 @@ static void cpu_smpboot_free(unsigned int cpu,
> > > > bool remove)
> > > > {
> > > > mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
> > > > unsigned char *stub_page = map_domain_page(mfn);
> > > > - unsigned int i;
> > > > memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc,
> > > > STUB_BUF_SIZE);
> > > > - for ( i = 0; i < STUBS_PER_PAGE; ++i )
> > > > - if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
> > > > - break;
> > > > unmap_domain_page(stub_page);
> > > > destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
> > > > (per_cpu(stubs.addr, cpu) | ~PAGE_MASK)
> > > > + 1);
> > > > per_cpu(stubs.addr, cpu) = 0;
> > > > - per_cpu(stubs.mfn, cpu) = 0;
> > > > - if ( i == STUBS_PER_PAGE )
> > > > - free_domheap_page(mfn_to_page(mfn));
> > > > }
> > > > if ( IS_ENABLED(CONFIG_PV32) )
> >
> > I think I've made an oversight in the code above: if all 32 CPUs
> > sharing the same stubs page are offlined, the reference to the stubs
> > page is possibly lost (if CPUs are not parked) and a new stubs page
> > would be allocated if any of those CPUs is brought back online, thus
> > leaking the previous allocation. The simplest way to solve this would
> > be to introduce an array that indexes the stub pages, and replace the
> > logic in cpu_smpboot_alloc() that figures out whether stubs.mfn is set
> > for adjacent CPUs.
>
> Right, but I thought Andrew's point was that offlining 32 CPUs is
> unrealistic, so don't even bother tracking. If CPUs are offlined (and you
> somehow keep running), you can leak the page.
I thin we should avoid freeing the page and ensure it's always reused,
rather than possibly leaking it. It's also possible there's a single
trailing CPU using the last stubs page alone, and offlining and
onlining it would trigger such a page leak, without requiring a block
of 32 CPUs going offline.
Entering an ACPI sleep state causes all APs to be offlined (see the
disable_nonboot_cpus() call in enter_state()), and it would be
undesirable that putting a system to sleep causes page leaking.
> > > > (there might be further cleanup possible if the page is not freed, the
> > > > above chunk is untested).
> > > >
> > > > It's a single page shared between 32 CPUs, and offlining 32 adjacent
> > > > CPUs seems very unlikely. IMO the extra complexity of having to deal
> > > > with the freeing overshadows the very small memory gain we get from
> > > > it.
> > >
> > > Hi Roger,
> > >
> > > Yes, I made and tested the same change locally last week. Well, I
> > > retained:
> > > per_cpu(stubs.mfn, cpu) = 0;
> > >
> > > Maybe it would be good to save the mfn in case the CPU returns? But I
> > > thought per-cpu vars are cleared, so it wouldn't be available anyway?
> >
> > Depends on whether the CPUs are parked or not (see park_offline_cpus).
> > I think leaving stubs.mfn is fine, in the parked case we avoid part of
> > the setup logic by already having the mfn cached (no big deal either
> > way).
>
> Right.
>
> > > Also, I was waiting to see if anyone chimed in with other ideas.
> >
> > Maybe you could assign the page to dom_xen and then use
> > {get,put}_page(), but again it seems overly complicated.
>
> Code-wise this doesn't look bad, but it blows up:
>
> (XEN) d[IDLE]v0 Over-allocation for d[XEN]: 1 > 0
>
> I don't think we should pursue that.
Hm, I see, yes, you will need to use MEMF_no_refcount to skip the
domain allocation accounting, but as you say it might not be a very
wise idea.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |