| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0
 On 23.10.2023 14:46, Roger Pau Monne wrote:
> Sporadically we have seen the following during AP bringup on AMD platforms
> only:
> 
> microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = 
> 2023-05-17
> microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = 
> 2023-05-17
> CPU60: No irq handler for vector 27 (IRQ -2147483648)
> microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = 
> 2023-05-17
> 
> This is similar to the issue raised on Linux commit 36e9e1eab777e, where they
> observed i8259 (active) vectors getting delivered to CPUs different than 0.
> 
> On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt
> descriptors to contain all possible CPUs, so that APs will reserve the vector
> at startup if any legacy IRQ is still delivered through the i8259.  Note that
> if the IO-APIC takes over those interrupt descriptors the CPU mask will be
> reset.
> 
> Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected even
> when all i8259 pins are masked, and hence would need to be handled on all 
> CPUs.
> 
> Do not reserve the PIC spurious vectors on all CPUs, but do check for such
> spurious interrupts on all CPUs if the vendor is AMD or Hygon.
The first half of this sentence reads as if it was describing a change,
but aiui there's no change to existing behavior in this regard anymore.
Maybe use something like "continue to reserve PIC vectors on CPU0 only"?
>  Note that once
> the vectors get used by devices detecting PIC spurious interrupts will no
> longer be possible, however the device should be able to cope with spurious
> interrupt.  Such PIC spurious interrupts occurring when the vector is in use 
> by
> a local APIC routed source will lead to an extra EOI, which might
> unintentionally clear a different vector from ISR.  Note this is already the
> current behavior, so assume it's infrequent enough to not cause real issues.
> 
> Finally, adjust the printed message to display the CPU where the spurious
> interrupt has been received, so it looks like:
> 
> microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = 
> 2023-05-17
> cpu1: spurious 8259A interrupt: IRQ7
> microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = 
> 2023-05-17
> 
> Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs')
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Changes since v1:
>  - Do not reserved spurious PIC vectors on APs, but still check for spurious
>    PIC interrupts.
>  - Reword commit message.
> ---
> Not sure if the Fixes tag is the most appropriate here, since AFAICT this is a
> hardware glitch, but it makes it easier to see to which versions the fix 
> should
> be backported, because Xen previous behavior was to reserve all legacy vectors
> on all CPUs.
I'm inclined to suggest to (informally) invent an Amends: tag.
> --- a/xen/arch/x86/i8259.c
> +++ b/xen/arch/x86/i8259.c
> @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
>  
>  bool bogus_8259A_irq(unsigned int irq)
>  {
> +    if ( smp_processor_id() &&
> +         !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +        /*
> +         * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, as 
> it
> +         * has been observed that during unknown circumstances spurious PIC
> +         * interrupts have been delivered to CPUs different than the BSP.
> +         */
> +        return false;
> +
>      return !_mask_and_ack_8259A_irq(irq);
>  }
I don't think this function should be changed; imo the adjustment belongs
at the call site.
> @@ -349,7 +359,22 @@ void __init init_IRQ(void)
>              continue;
>          desc->handler = &i8259A_irq_type;
>          per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
> -        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
> +
> +        /*
> +         * The interrupt affinity logic never targets interrupts to offline
> +         * CPUs, hence it's safe to use cpumask_all here.
> +         *
> +         * Legacy PIC interrupts are only targeted to CPU0, but depending on
> +         * the platform they can be distributed to any online CPU in 
> hardware.
> +         * Note this behavior has only been observed on AMD hardware. In 
> order
> +         * to cope install all active legacy vectors on all CPUs.
> +         *
> +         * IO-APIC will change the destination mask if/when taking ownership 
> of
> +         * the interrupt.
> +         */
> +        cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor &
> +                                          (X86_VENDOR_AMD | 
> X86_VENDOR_HYGON) ?
> +                                          &cpumask_all : cpumask_of(cpu));
Nit: Imo this would better be wrapped as
        cpumask_copy(desc->arch.cpu_mask,
                     boot_cpu_data.x86_vendor &
                     (X86_VENDOR_AMD | X86_VENDOR_HYGON) ?
                     &cpumask_all : cpumask_of(cpu));
or (considering how we often prefer to wrap conditional operators)
        cpumask_copy(desc->arch.cpu_mask,
                     boot_cpu_data.x86_vendor &
                     (X86_VENDOR_AMD | X86_VENDOR_HYGON)
                     ? &cpumask_all : cpumask_of(cpu));
or
        cpumask_copy(desc->arch.cpu_mask,
                     boot_cpu_data.x86_vendor &
                     (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
                                                         : cpumask_of(cpu));
, possibly with the argument spanning three lines additionally
parenthesized as a whole.
(Of course an amd_like() construct like we have in the emulator would
further help readability here, but the way that works it cannot be
easily generalized for use outside of the emulator.)
Jan
 
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |