[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/idle: prevent entering C6 with in service interrupts on Intel



On Fri, May 08, 2020 at 03:46:08PM +0200, Jan Beulich wrote:
> On 07.05.2020 15:22, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> > index b83446e77d..5023fea148 100644
> > --- a/xen/arch/x86/acpi/cpu_idle.c
> > +++ b/xen/arch/x86/acpi/cpu_idle.c
> > @@ -573,6 +573,25 @@ static bool errata_c6_eoi_workaround(void)
> >      return (fix_needed && cpu_has_pending_apic_eoi());
> >  }
> >  
> > +static int8_t __read_mostly disable_c6_isr = -1;
> > +boolean_param("disable-c6-isr", disable_c6_isr);
> > +
> > +/*
> > + * Errata CLX30: A Pending Fixed Interrupt May Be Dispatched Before an
> > + * Interrupt of The Same Priority Completes.
> > + *
> > + * Prevent entering C6 if there are pending lapic interrupts, or else the
> > + * processor might dispatch further pending interrupts before the first 
> > one has
> > + * been completed.
> > + */
> > +bool errata_c6_isr_workaround(void)
> > +{
> > +    if ( unlikely(disable_c6_isr == -1) )
> > +        disable_c6_isr = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
> > +
> > +    return disable_c6_isr && cpu_has_pending_apic_eoi();
> 
> This check being the same as in errata_c6_eoi_workaround(),
> why don't you simply extend that function? (See also below.)
> Also both variable and command line option descriptor could
> go inside the function, to limit their scopes.

Since this is actually a superset (as it covers all Intel CPUs), I
should delete the previous (more restricted) workaround matching
logic.

> > @@ -676,7 +695,8 @@ static void acpi_processor_idle(void)
> >          return;
> >      }
> >  
> > -    if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() )
> > +    if ( (cx->type == ACPI_STATE_C3) &&
> > +         (errata_c6_eoi_workaround() || errata_c6_isr_workaround()) )
> >          cx = power->safe_state;
> 
> I realize you only add to existing code, but I'm afraid this
> existing code cannot be safely added to. Already prior to
> your change there was a curious mismatch of C3 and c6 on this
> line, and I don't see how ACPI_STATE_C3 correlates with
> "core C6" state. Now this may have been the convention for
> Nehalem/Westmere systems, but already the mwait-idle entries
> for these CPU models have 4 entries (albeit such that they
> match this scheme). As a result I think this at the very
> least needs to be >=, not ==, even more so now that you want
> to cover all Intel CPUs.

Hm, I think this is because AFAICT acpi_processor_idle only
understands up to ACPI_STATE_C3, passing a type > ACPI_STATE_C3 will
just cause it to fallback to C0. I've adjusted the comparison to use
>= instead, as a safety imporvement in case we add support for more
states to acpi_processor_idle.

> Obviously (I think) it is a mistake that mwait-idle doesn't
> also activate this workaround, which imo is another reason to
> stick to just errata_c6_eoi_workaround().

I assumed the previous workaround didn't apply to any of the CPUs
supported by the mwait-idle driver, since it seems to only support
recent-ish models.

> > --- a/xen/arch/x86/cpu/mwait-idle.c
> > +++ b/xen/arch/x86/cpu/mwait-idle.c
> > @@ -770,6 +770,9 @@ static void mwait_idle(void)
> >             return;
> >     }
> >  
> > +   if (cx->type == ACPI_STATE_C3 && errata_c6_isr_workaround())
> > +           cx = power->safe_state;
> 
> Here it becomes even more relevant I think that == not be
> used, as the static tables list deeper C-states; it's just
> that the SKX table, which also gets used for CLX afaict, has
> no entries beyond C6-SKX. Others with deeper ones presumably
> have the deeper C-states similarly affected (or not) by this
> erratum.
> 
> For reference, mwait_idle_cpu_init() has
> 
>               hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
>               state = MWAIT_HINT2CSTATE(hint) + 1;
>                 ...
>               cx->type = state;
> 
> i.e. the value you compare against is derived from the static
> table entries. For Nehalem/Westmere this means that what goes
> under ACPI_STATE_C3 is indeed C6-NHM, and this looks to match
> for all the non-Atoms, but for none of the Atoms. Now Atoms
> could easily be unaffected, but (just to take an example) if
> C6-SNB was affected, surely C7-SNB would be, too.

Yes, I've adjusted this to use cx->type >= 3 instead. I have to admit
I'm confused by the name of the states being C6-* while the cstate
value reported by Xen will be 3 (I would instead expect the type to be
6 in order to match the name).

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.