
Re: [PATCH] x86/S3: restore MCE (APs) and add MTRR (BSP) init


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Date: Mon, 23 Mar 2026 12:26:03 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 23 Mar 2026 11:26:17 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 23, 2026 at 12:21:46PM +0100, Jan Beulich wrote:
> On 04.03.2026 16:00, Marek Marczykowski wrote:
> > On Wed, Mar 04, 2026 at 03:47:14PM +0100, Jan Beulich wrote:
> >> On 04.03.2026 15:36, Marek Marczykowski wrote:
> >>> On Wed, Mar 04, 2026 at 02:39:01PM +0100, Jan Beulich wrote:
> >>>> MCE init for APs was broken when CPU feature re-checking was added. MTRR
> >>>> (re)init for the BSP looks to never have been there on the resume path.
> >>>>
> >>>> Fixes: bb502a8ca592 ("x86: check feature flags after resume")
> >>>> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> >>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>> ---
> >>>> Sadly we need to go by CPU number (zero vs non-zero) here. See the call
> >>>> site of recheck_cpu_features() in enter_state().
> >>>
> >>> With this patch, I now see the "Thermal monitoring enabled" on resume
> >>> also for AP.
> >>> And then, the "Temperature above threshold" + "Running in modulated
> >>> clock mode" for AP too. But, I don't see matching "Temperature/speed
> >>> normal" for any of them...
> >>
> >> Which would imply that for each CPU you see at most one such message after
> >> resume. Can you confirm this? 
> > 
> > For the current test, yes. I got the messages for CPUs 16, 6, 18, 4, 2 -
> > in this order. Not for 0, 8-15 or 20-21. Not sure about CPU0, but for
> > others it kinda looks like I got it for P cores, but not E cores? But
> > I'm not sure how to reliably distinguish them - I base it on the holes
> > in numbering due to smt=off. Specifically I have online CPUs:
> > 0,2,4,6,8-16,18,20-21 (yeah, weird ordering...).
> 
> I wonder, btw, if this is good enough to translate into a Tested-by: for
> this patch. Thoughts?

I think so. It clearly fixes the reporting issue.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
