[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0
On 23.10.2023 14:46, Roger Pau Monne wrote: > Sporadically we have seen the following during AP bringup on AMD platforms > only: > > microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = > 2023-05-17 > microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = > 2023-05-17 > CPU60: No irq handler for vector 27 (IRQ -2147483648) > microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = > 2023-05-17 > > This is similar to the issue raised on Linux commit 36e9e1eab777e, where they > observed i8259 (active) vectors getting delivered to CPUs different than 0. > > On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt > descriptors to contain all possible CPUs, so that APs will reserve the vector > at startup if any legacy IRQ is still delivered through the i8259. Note that > if the IO-APIC takes over those interrupt descriptors the CPU mask will be > reset. > > Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected even > when all i8259 pins are masked, and hence would need to be handled on all > CPUs. > > Do not reserve the PIC spurious vectors on all CPUs, but do check for such > spurious interrupts on all CPUs if the vendor is AMD or Hygon. The first half of this sentence reads as if it was describing a change, but aiui there's no change to existing behavior in this regard anymore. Maybe use something like "continue to reserve PIC vectors on CPU0 only"? > Note that once > the vectors get used by devices detecting PIC spurious interrupts will no > longer be possible, however the device should be able to cope with spurious > interrupt. Such PIC spurious interrupts occurring when the vector is in use > by > a local APIC routed source will lead to an extra EOI, which might > unintentionally clear a different vector from ISR. Note this is already the > current behavior, so assume it's infrequent enough to not cause real issues. > > Finally, adjust the printed message to display the CPU where the spurious > interrupt has been received, so it looks like: > > microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = > 2023-05-17 > cpu1: spurious 8259A interrupt: IRQ7 > microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = > 2023-05-17 > > Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs') > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > --- > Changes since v1: > - Do not reserved spurious PIC vectors on APs, but still check for spurious > PIC interrupts. > - Reword commit message. > --- > Not sure if the Fixes tag is the most appropriate here, since AFAICT this is a > hardware glitch, but it makes it easier to see to which versions the fix > should > be backported, because Xen previous behavior was to reserve all legacy vectors > on all CPUs. I'm inclined to suggest to (informally) invent an Amends: tag. > --- a/xen/arch/x86/i8259.c > +++ b/xen/arch/x86/i8259.c > @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq); > > bool bogus_8259A_irq(unsigned int irq) > { > + if ( smp_processor_id() && > + !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > + /* > + * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, as > it > + * has been observed that during unknown circumstances spurious PIC > + * interrupts have been delivered to CPUs different than the BSP. > + */ > + return false; > + > return !_mask_and_ack_8259A_irq(irq); > } I don't think this function should be changed; imo the adjustment belongs at the call site. > @@ -349,7 +359,22 @@ void __init init_IRQ(void) > continue; > desc->handler = &i8259A_irq_type; > per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq; > - cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu)); > + > + /* > + * The interrupt affinity logic never targets interrupts to offline > + * CPUs, hence it's safe to use cpumask_all here. > + * > + * Legacy PIC interrupts are only targeted to CPU0, but depending on > + * the platform they can be distributed to any online CPU in > hardware. > + * Note this behavior has only been observed on AMD hardware. In > order > + * to cope install all active legacy vectors on all CPUs. > + * > + * IO-APIC will change the destination mask if/when taking ownership > of > + * the interrupt. > + */ > + cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor & > + (X86_VENDOR_AMD | > X86_VENDOR_HYGON) ? > + &cpumask_all : cpumask_of(cpu)); Nit: Imo this would better be wrapped as cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all : cpumask_of(cpu)); or (considering how we often prefer to wrap conditional operators) cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all : cpumask_of(cpu)); or cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all : cpumask_of(cpu)); , possibly with the argument spanning three lines additionally parenthesized as a whole. (Of course an amd_like() construct like we have in the emulator would further help readability here, but the way that works it cannot be easily generalized for use outside of the emulator.) Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |