[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/i8259: do not assume interrupts always target CPU0


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 24 Oct 2023 12:14:00 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=B4pDAbM2NuNwAOp5cZ4BqnSbLnPSw7QgU4Gks1MLb/o=; b=igsk1tySomFBjtD2KSUg6GnGA+vPZQF1lTgtOOXoS6vJP4hricbQNQfdTJ0CG4w1jbNeC9skOJlpumPqnu4aLzXO6nxMIXOeqQN+4t3fNE1AHRxlTuCVtV1hckPaZcXtb2b0k4g1o0+gqwZh/5dB8+dZpCEdDD8uz2XkgchihY/qZJmyzzI/9SEK1+/CeoXlf8AXaumBxCHPzG9j5dJabrsoq3vCiWDxmA/v8u4KFwFj/yDUy9bZtXSFbe6slYLLU95b1muBh9MwQ4rVBHXNez8+Z7YGKyWGJ08svBO1KVAqGRhBt2TtNRjQDe6xDx9VeGfTUElJiwZI2xc+Ft227w==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fYMkjT4lT6oZFQakKXtaPgEqy3PQeSbgI/jg9zKFHKGMoHP7FkvvIOUSWixImX0QlxGFcyxbRYie5lP5SgFaeNApYx54NPXVRw9PixBfpKQ+4931JswWNXqGTrn5IRD4kCqhi9ARWf8rUxzuf+TIqodR/tWrg1QQrJfuqRr6TigI7L88xQFSxVyoV1lZ35noVXbEB9iGwNbn5H+kDZ5WFXW0wwC0R6gxIVQdOSdwoSzNsMlxgqpEQ0h6+1u6rsmlHt1mYe+x3v4gIgT/8PSaxovsnP3/V0/3efMPwUmE/ZLWlz8iSDcWhr8xg3qVhOI1iFhdEgHFCbifjXPSoDA9Aw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 24 Oct 2023 10:14:18 +0000
  • Ironport-data: A9a23:b2/Oq6OEaiN45W/vrR2ClsFynXyQoLVcMsEvi/4bfWQNrUp01jQPx 2JMWG/TPPbcY2Wjfo13YImyoEsBusPSnN41SAto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CQ6jefQAOOkVIYoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGjxSs/nrRC9H5qyo42pA5ABmPJingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0sZQOWZK7 OUIEz1OdR2nttKRnuLnFuY506zPLOGzVG8ekldJ6GmDSNwAGNXESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PNxvzC7IA9ZidABNPLPfdOHX4NNl1uwr WPa5WXpRBodMbRzzBLcqCz937aWzXuTtIQ6T76qp6R2kn6q9HVDOS9RUlGYjaa9lRvrMz5YA wlOksY0loAw/kG2Stj2XzWjvWWJ+BUbXrJ4DOkS+AyLjK3O7G6xFmUCCzJMdtEinMs3XiAxk E+EmcvzAj5iu6HTTmiSnop4thu3MCkRaGUEPCkNSFNf58G5+N1ryBXSUtxkDai5yMXvHi39y CyLqy54gKgPickM1OOw+lWvby+Qm6UlhzUdvm3/Nl9JJCsgDGJ5T+REMWTm0Ms=
  • Ironport-hdrordr: A9a23:JLNmu6gapOQxlxiaCtWH7/9D4nBQXh4ji2hC6mlwRA09TyX5ra 2TdZUgpHrJYVMqMk3I9uruBEDtex3hHP1OkOss1NWZPDUO0VHARO1fBOPZqAEIcBeOldK1u5 0AT0B/YueAd2STj6zBkXSF+wBL+qj6zEiq792usEuEVWtRGsVdB58SMHfiLqVxLjM2YqYRJd 6nyedsgSGvQngTZtTTPAh/YwCSz+e78q4PeHQ9dmca1DU=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Oct 24, 2023 at 11:33:21AM +0200, Jan Beulich wrote:
> On 23.10.2023 14:46, Roger Pau Monne wrote:
> > Sporadically we have seen the following during AP bringup on AMD platforms
> > only:
> > 
> > microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = 
> > 2023-05-17
> > microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = 
> > 2023-05-17
> > CPU60: No irq handler for vector 27 (IRQ -2147483648)
> > microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = 
> > 2023-05-17
> > 
> > This is similar to the issue raised on Linux commit 36e9e1eab777e, where 
> > they
> > observed i8259 (active) vectors getting delivered to CPUs different than 0.
> > 
> > On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt
> > descriptors to contain all possible CPUs, so that APs will reserve the 
> > vector
> > at startup if any legacy IRQ is still delivered through the i8259.  Note 
> > that
> > if the IO-APIC takes over those interrupt descriptors the CPU mask will be
> > reset.
> > 
> > Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected 
> > even
> > when all i8259 pins are masked, and hence would need to be handled on all 
> > CPUs.
> > 
> > Do not reserve the PIC spurious vectors on all CPUs, but do check for such
> > spurious interrupts on all CPUs if the vendor is AMD or Hygon.
> 
> The first half of this sentence reads as if it was describing a change,
> but aiui there's no change to existing behavior in this regard anymore.
> Maybe use something like "continue to reserve PIC vectors on CPU0 only"?
> 
> >  Note that once
> > the vectors get used by devices detecting PIC spurious interrupts will no
> > longer be possible, however the device should be able to cope with spurious
> > interrupt.  Such PIC spurious interrupts occurring when the vector is in 
> > use by
> > a local APIC routed source will lead to an extra EOI, which might
> > unintentionally clear a different vector from ISR.  Note this is already the
> > current behavior, so assume it's infrequent enough to not cause real issues.
> > 
> > Finally, adjust the printed message to display the CPU where the spurious
> > interrupt has been received, so it looks like:
> > 
> > microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = 
> > 2023-05-17
> > cpu1: spurious 8259A interrupt: IRQ7
> > microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = 
> > 2023-05-17
> > 
> > Fixes: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs')
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v1:
> >  - Do not reserved spurious PIC vectors on APs, but still check for spurious
> >    PIC interrupts.
> >  - Reword commit message.
> > ---
> > Not sure if the Fixes tag is the most appropriate here, since AFAICT this 
> > is a
> > hardware glitch, but it makes it easier to see to which versions the fix 
> > should
> > be backported, because Xen previous behavior was to reserve all legacy 
> > vectors
> > on all CPUs.
> 
> I'm inclined to suggest to (informally) invent an Amends: tag.
> 
> > --- a/xen/arch/x86/i8259.c
> > +++ b/xen/arch/x86/i8259.c
> > @@ -37,6 +37,15 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
> >  
> >  bool bogus_8259A_irq(unsigned int irq)
> >  {
> > +    if ( smp_processor_id() &&
> > +         !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) 
> > )
> > +        /*
> > +         * For AMD/Hygon do spurious PIC interrupt detection on all CPUs, 
> > as it
> > +         * has been observed that during unknown circumstances spurious PIC
> > +         * interrupts have been delivered to CPUs different than the BSP.
> > +         */
> > +        return false;
> > +
> >      return !_mask_and_ack_8259A_irq(irq);
> >  }
> 
> I don't think this function should be changed; imo the adjustment belongs
> at the call site.

It makes the call site much more complex to follow, in fact I was
considering pulling the PIC vector range checks into
bogus_8259A_irq().  Would that convince you into placing the check here
rather than in the caller context?

> > @@ -349,7 +359,22 @@ void __init init_IRQ(void)
> >              continue;
> >          desc->handler = &i8259A_irq_type;
> >          per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
> > -        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
> > +
> > +        /*
> > +         * The interrupt affinity logic never targets interrupts to offline
> > +         * CPUs, hence it's safe to use cpumask_all here.
> > +         *
> > +         * Legacy PIC interrupts are only targeted to CPU0, but depending 
> > on
> > +         * the platform they can be distributed to any online CPU in 
> > hardware.
> > +         * Note this behavior has only been observed on AMD hardware. In 
> > order
> > +         * to cope install all active legacy vectors on all CPUs.
> > +         *
> > +         * IO-APIC will change the destination mask if/when taking 
> > ownership of
> > +         * the interrupt.
> > +         */
> > +        cpumask_copy(desc->arch.cpu_mask, boot_cpu_data.x86_vendor &
> > +                                          (X86_VENDOR_AMD | 
> > X86_VENDOR_HYGON) ?
> > +                                          &cpumask_all : cpumask_of(cpu));
> 
> Nit: Imo this would better be wrapped as
> 
>         cpumask_copy(desc->arch.cpu_mask,
>                      boot_cpu_data.x86_vendor &
>                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ?
>                      &cpumask_all : cpumask_of(cpu));
> 
> or (considering how we often prefer to wrap conditional operators)
> 
>         cpumask_copy(desc->arch.cpu_mask,
>                      boot_cpu_data.x86_vendor &
>                      (X86_VENDOR_AMD | X86_VENDOR_HYGON)
>                      ? &cpumask_all : cpumask_of(cpu));
> 
> or
> 
>         cpumask_copy(desc->arch.cpu_mask,
>                      boot_cpu_data.x86_vendor &
>                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
>                                                          : cpumask_of(cpu));
> 
> , possibly with the argument spanning three lines additionally
> parenthesized as a whole.
> 
> (Of course an amd_like() construct like we have in the emulator would
> further help readability here, but the way that works it cannot be
> easily generalized for use outside of the emulator.)

I think the last one could be my preferred indentation, will adjust to
that.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.