[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/idle: prevent entering C6 with in service interrupts on Intel



On 07.05.2020 15:22, Roger Pau Monne wrote:
> Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt
> May Be Dispatched Before an Interrupt of The Same Priority Completes".
> 
> It's not clear which models are affected, as the errata is listed in
> the "Second Generation Intel Xeon Scalable Processors" specification
> update, but the issue has been seen as far back as Nehalem processors.
> Apply the workaround to all Intel processors, the condition can be
> relaxed later.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
>  docs/misc/xen-command-line.pandoc |  8 ++++++++
>  xen/arch/x86/acpi/cpu_idle.c      | 22 +++++++++++++++++++++-
>  xen/arch/x86/cpu/mwait-idle.c     |  3 +++
>  xen/include/asm-x86/cpuidle.h     |  1 +
>  4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc 
> b/docs/misc/xen-command-line.pandoc
> index ee12b0f53f..6e868a2185 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -652,6 +652,14 @@ Specify the size of the console debug trace buffer. By 
> specifying `cpu:`
>  additionally a trace buffer of the specified size is allocated per cpu.
>  The debug trace feature is only enabled in debugging builds of Xen.
>  
> +### disable-c6-isr
> +> `= <boolean>`
> +
> +> Default: `true for Intel CPUs`
> +
> +Workaround for Intel errata CLX30. Prevent entering C6 idle states with in
> +service local APIC interrupts. Enabled by default for all Intel CPUs.
> +
>  ### dma_bits
>  > `= <integer>`
>  
> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> index b83446e77d..5023fea148 100644
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -573,6 +573,25 @@ static bool errata_c6_eoi_workaround(void)
>      return (fix_needed && cpu_has_pending_apic_eoi());
>  }
>  
> +static int8_t __read_mostly disable_c6_isr = -1;
> +boolean_param("disable-c6-isr", disable_c6_isr);
> +
> +/*
> + * Errata CLX30: A Pending Fixed Interrupt May Be Dispatched Before an
> + * Interrupt of The Same Priority Completes.
> + *
> + * Prevent entering C6 if there are pending lapic interrupts, or else the
> + * processor might dispatch further pending interrupts before the first one 
> has
> + * been completed.
> + */
> +bool errata_c6_isr_workaround(void)
> +{
> +    if ( unlikely(disable_c6_isr == -1) )
> +        disable_c6_isr = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
> +
> +    return disable_c6_isr && cpu_has_pending_apic_eoi();

This check being the same as in errata_c6_eoi_workaround(),
why don't you simply extend that function? (See also below.)
Also both variable and command line option descriptor could
go inside the function, to limit their scopes.

> @@ -676,7 +695,8 @@ static void acpi_processor_idle(void)
>          return;
>      }
>  
> -    if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() )
> +    if ( (cx->type == ACPI_STATE_C3) &&
> +         (errata_c6_eoi_workaround() || errata_c6_isr_workaround()) )
>          cx = power->safe_state;

I realize you only add to existing code, but I'm afraid this
existing code cannot be safely added to. Already prior to
your change there was a curious mismatch of C3 and c6 on this
line, and I don't see how ACPI_STATE_C3 correlates with
"core C6" state. Now this may have been the convention for
Nehalem/Westmere systems, but already the mwait-idle entries
for these CPU models have 4 entries (albeit such that they
match this scheme). As a result I think this at the very
least needs to be >=, not ==, even more so now that you want
to cover all Intel CPUs.

Obviously (I think) it is a mistake that mwait-idle doesn't
also activate this workaround, which imo is another reason to
stick to just errata_c6_eoi_workaround().

> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -770,6 +770,9 @@ static void mwait_idle(void)
>               return;
>       }
>  
> +     if (cx->type == ACPI_STATE_C3 && errata_c6_isr_workaround())
> +             cx = power->safe_state;

Here it becomes even more relevant I think that == not be
used, as the static tables list deeper C-states; it's just
that the SKX table, which also gets used for CLX afaict, has
no entries beyond C6-SKX. Others with deeper ones presumably
have the deeper C-states similarly affected (or not) by this
erratum.

For reference, mwait_idle_cpu_init() has

                hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
                state = MWAIT_HINT2CSTATE(hint) + 1;
                ...
                cx->type = state;

i.e. the value you compare against is derived from the static
table entries. For Nehalem/Westmere this means that what goes
under ACPI_STATE_C3 is indeed C6-NHM, and this looks to match
for all the non-Atoms, but for none of the Atoms. Now Atoms
could easily be unaffected, but (just to take an example) if
C6-SNB was affected, surely C7-SNB would be, too.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.