[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/nmi: ensure Global Performance Counter Control is setup correctly



On Wed, Jan 10, 2024 at 05:41:41PM +0000, Andrew Cooper wrote:
> On 10/01/2024 4:58 pm, Roger Pau Monné wrote:
> > On Wed, Jan 10, 2024 at 03:52:49PM +0000, Andrew Cooper wrote:
> >> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
> >>> When Architectural Performance Monitoring is available, the 
> >>> PERF_GLOBAL_CTRL
> >>> MSR contains per-counter enable bits that is ANDed with the enable bit in 
> >>> the
> >>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
> >>>
> >>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL 
> >>> enable
> >>> bits being set by default, but at least on some Intel Sapphire and Emerald
> >>> Rapids this is no longer the case, and Xen reports:
> >>>
> >>> Testing NMI watchdog on all CPUs: 0 40 stuck
> >>>
> >>> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so 
> >>> PMC0
> >>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> >>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
> >>>
> >>> Fix by detecting when Architectural Performance Monitoring is available 
> >>> and
> >>> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
> >>>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >>> ---
> >>> The fact that it's only the first CPU on each socket that's started with
> >>> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case 
> >>> making
> >>> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
> >> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
> >> probably worth saying that we're raising it with Intel, but this bug is
> >> out in production firmware for SPR and EMR.
> >>
> >>> ---
> >>>  xen/arch/x86/nmi.c | 13 +++++++++++++
> >>>  1 file changed, 13 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> >>> index dc79c25e3ffd..7a6601c4fd31 100644
> >>> --- a/xen/arch/x86/nmi.c
> >>> +++ b/xen/arch/x86/nmi.c
> >>> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
> >>>           nmi_p6_event_width > BITS_PER_LONG )
> >>>          return;
> >>>  
> >>> +    if ( cpu_has_arch_perfmon )
> >>> +    {
> >>> +        uint64_t global_ctrl;
> >>> +
> >>> +        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> >>> +        /*
> >>> +         * Make sure PMC0 is enabled in global control, as the enable 
> >>> bit in
> >>> +         * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
> >>> +         */
> >>> +        if ( !(global_ctrl & 1) )
> >>> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
> >> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
> >> If nothing else because that will make debugging using other counters
> >> more reliable too.
> > Hm, yes, I was borderline on enabling all possible counters in
> > PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].
> >
> > But then wondered if it was going too far, as for the purposes here we
> > just care about PMC1.
> >
> > My reasoning for not doing it would be that such wide setup of
> > PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled,
> > usages of other counters apart from PMC0 will be gated on the watchdog
> > being enabled.  It seems more reliable to me to either do the setting
> > of PERF_GLOBAL_CTRL as part of CPU initialization, or defer to each
> > user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.
> 
> It is buggy that each socket-BSP is handed over with ctl=0 rather than 0xff.
> 
> But we're exasperating the bug by not returning each socket-BSP to the
> default behaviour.
> 
> 
> It makes a practical difference if a developer wants to hand-code up
> PCR2.

I'm afraid I'm lost at what the PCR2 acronym references to here, as I
cannot find any instance of it in SDM vols 2, 3 or 4.

> It also makes a practical difference to what a guest sees when it
> executes RDPMC in guests, because right now the perf counter values leak
> in (there's another oustanding patch series of mine trying to stem this
> leak).

But RDPMC just fetches the contents of the counters, so it has no
visibility on the value of PERF_GLOBAL_CTRL.  Albeit the settings in
PERF_GLOBAL_CTRL will affect whether the counters are enabled or not,
I'm not sure a guest without access to vPMU should expect to get any
kind of consistent results out of RDPMC.

> 
> The fixup we're performing here isn't "because we're using one
> counter".  It's to get state back to default.

I'm certainly not opposed to that, but as said in my previous reply,
the adjustment should then be done somewhere else and not in
setup_p6_watchdog().  Unless there are further objections I will send
a patch to enable all general purpose PMCs in PERF_GLOBAL_CTRL at
init_intel().

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.