
Re: [PATCH] x86/io-apic: fix directed EOI when using AMD-Vi interrupt remapping



On Mon, Oct 21, 2024 at 12:10:14PM +0100, Andrew Cooper wrote:
> On 18/10/2024 9:08 am, Roger Pau Monne wrote:
> > When using AMD-Vi interrupt remapping the vector field in the IO-APIC RTE is
> > repurposed to contain part of the offset into the remapping table.  Prior to
> > 2ca9fbd739b8 Xen had logic so that the offset into the interrupt remapping
> > table would match the vector.  Such logic was mandatory for end of interrupt
> > to work, since the vector field (even when not containing a vector) is used
> > by the IO-APIC to find the pin for which the EOI must be performed.
> >
> > Introduce a table to store the EOI handlers when using interrupt remapping, so
> > that the IO-APIC driver can translate pins into EOI handlers without having to
> > read the IO-APIC RTE entry.  Note that to simplify the logic such a table is
> > used unconditionally when interrupt remapping is enabled, even if strictly it
> > would only be required for AMD-Vi.
> >
> > Reported-by: Willi Junga <xenproject@xxxxxx>
> > Suggested-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> > Fixes: 2ca9fbd739b8 ('AMD IOMMU: allocate IRTE entries instead of using a static mapping')
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
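The mechanism described in the commit message can be modelled in a few lines. This is a simplified user-space sketch, not the Xen code: the arrays, sizes and names (pin_eoi, write_rte_field, pin_eoi_value, etc.) are hypothetical stand-ins for apic_pin_eoi, __ioapic_write_entry() and __io_apic_eoi(), and only the interrupt-remapping case is modelled. It shows the two properties the patch relies on: the value written to the EOI register comes from the per-pin cache rather than from the CPU vector, and a cache miss falls back to reading the RTE once.

```c
#include <assert.h>

#define SKETCH_NR_IOAPICS 2   /* hypothetical sizes for the sketch */
#define SKETCH_NR_PINS    24
/* Mirrors Xen's IRQ_VECTOR_UNASSIGNED (-1) stored in an unsigned int. */
#define SKETCH_VECTOR_UNKNOWN ((unsigned int)-1)

/* Simulated RTE vector field; under AMD-Vi it holds an IRTE offset. */
static unsigned int rte_field[SKETCH_NR_IOAPICS][SKETCH_NR_PINS];
/* Per-pin cache of the value the EOI register expects. */
static unsigned int pin_eoi[SKETCH_NR_IOAPICS][SKETCH_NR_PINS];
/* Counts simulated (slow) RTE reads, to show the cache working. */
static unsigned int rte_reads;

static void pin_eoi_init(void)
{
    for (unsigned int a = 0; a < SKETCH_NR_IOAPICS; a++)
        for (unsigned int p = 0; p < SKETCH_NR_PINS; p++)
            pin_eoi[a][p] = SKETCH_VECTOR_UNKNOWN;
}

/* Writing an RTE also refreshes the cached EOI value for that pin. */
static void write_rte_field(unsigned int apic, unsigned int pin,
                            unsigned int field)
{
    assert(apic < SKETCH_NR_IOAPICS && pin < SKETCH_NR_PINS);
    rte_field[apic][pin] = field;
    pin_eoi[apic][pin] = field;
}

static unsigned int read_rte_field(unsigned int apic, unsigned int pin)
{
    rte_reads++;
    return rte_field[apic][pin];
}

/*
 * Value to write to the EOI register for a pin.  The CPU vector is
 * deliberately ignored: the IO-APIC matches on the RTE field contents.
 */
static unsigned int pin_eoi_value(unsigned int apic, unsigned int pin,
                                  unsigned int cpu_vector)
{
    unsigned int v;

    (void)cpu_vector;
    assert(apic < SKETCH_NR_IOAPICS && pin < SKETCH_NR_PINS);
    v = pin_eoi[apic][pin];
    if (v == SKETCH_VECTOR_UNKNOWN) {
        /* Cache miss: fetch from the RTE once, then remember it. */
        v = read_rte_field(apic, pin);
        pin_eoi[apic][pin] = v;
    }
    return v;
}
```

Setting rte_field[][] directly (bypassing write_rte_field()) models an RTE written before the cache existed: the first EOI for that pin reads the RTE, later ones hit the cache.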
> 
> Yet more fallout from the multi-MSI work.  That really has been a giant
> source of bugs.
> 
> > ---
> >  xen/arch/x86/io_apic.c | 47 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> >
> > diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> > index e40d2f7dbd75..8856eb29d275 100644
> > --- a/xen/arch/x86/io_apic.c
> > +++ b/xen/arch/x86/io_apic.c
> > @@ -71,6 +71,22 @@ static int apic_pin_2_gsi_irq(int apic, int pin);
> >  
> >  static vmask_t *__read_mostly vector_map[MAX_IO_APICS];
> >  
> > +/*
> > + * Store the EOI handle when using interrupt remapping.
> > + *
> > + * If using AMD-Vi interrupt remapping the IO-APIC redirection entry remapped
> > + * format repurposes the vector field to store the offset into the Interrupt
> > + * Remap table.  This causes directed EOI to no longer work, as the CPU vector
> > + * no longer matches the contents of the RTE vector field.  Add a translation
> > + * table so that directed EOI uses the value in the RTE vector field when
> > + * interrupt remapping is enabled.
> > + *
> > + * Note Intel VT-d Xen code still stores the CPU vector in the RTE vector field
> > + * when using the remapped format, but use the translation table uniformly in
> > + * order to avoid extra logic to differentiate between VT-d and AMD-Vi.
> > + */
> > +static unsigned int **apic_pin_eoi;
> 
> I think we can get away with this being uint8_t rather than unsigned
> int, especially as we're allocating memory when not strictly necessary.
> 
> The only sentinel value we use is IRQ_VECTOR_UNASSIGNED which is -1.
> 
> Vector 0xff is strictly SPIV and not allocated for anything else, so can
> be reused as a suitable sentinel here.

The coding style explicitly discourages using fixed-width types unless
strictly necessary; I assume the usage here would be covered by Xen
caching the value of a hardware register field that has a fixed-width
size.
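Andrew's uint8_t suggestion can be sketched as follows. This is a hypothetical illustration, not Xen code: SKETCH_VECTOR_UNASSIGNED stands in for IRQ_VECTOR_UNASSIGNED, and pin_eoi_alloc()/pin_eoi_known() are invented helpers. It shows that narrowing the existing -1 sentinel to uint8_t yields 0xff, which per the review is the spurious-interrupt vector and never allocated, and that the per-pin table shrinks to one byte per pin, matching the 8-bit width of the RTE vector field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Xen's IRQ_VECTOR_UNASSIGNED. */
#define SKETCH_VECTOR_UNASSIGNED (-1)
/*
 * Narrowing -1 to uint8_t yields 0xff, the spurious-interrupt vector,
 * which (per the review) is never allocated to a device.
 */
#define SKETCH_EOI_UNKNOWN ((uint8_t)SKETCH_VECTOR_UNASSIGNED)

/* One byte per pin: the RTE vector field is 8 bits wide in hardware. */
static uint8_t *pin_eoi_alloc(unsigned int nr_pins)
{
    uint8_t *tbl = malloc(nr_pins);

    if (tbl)
        memset(tbl, SKETCH_EOI_UNKNOWN, nr_pins);
    return tbl;
}

/* True once a real EOI value has been cached for the pin. */
static int pin_eoi_known(const uint8_t *tbl, unsigned int pin)
{
    return tbl[pin] != SKETCH_EOI_UNKNOWN;
}
```

For a 24-pin IO-APIC this is 24 bytes instead of 96 with unsigned int, which is the saving Andrew alludes to given the table is allocated even where it is not strictly needed.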

> > +
> >  static void share_vector_maps(unsigned int src, unsigned int dst)
> >  {
> >      unsigned int pin;
> > @@ -273,6 +289,13 @@ void __ioapic_write_entry(
> >      {
> >          __io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
> >          __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
> > +        /*
> > +         * Might be called before apic_pin_eoi is allocated.  Entry will be
> > +         * updated once the array is allocated and there's an EOI or write
> > +         * against the pin.
> > +         */
> 
> Is this for the xAPIC path where we turn on interrupts before the IOMMU?

It's for iommu_setup() -> iommu_hardware_setup() saving and restoring
the IO-APIC entries around enabling of interrupt remapping.  This is
done just ahead of smp_prepare_cpus() which is where
setup_IO_APIC_irqs() gets called.

> > +        if ( apic_pin_eoi )
> > +            apic_pin_eoi[apic][pin] = e.vector;
> >      }
> >      else
> >          iommu_update_ire_from_apic(apic, pin, e.raw);
> > @@ -298,9 +321,17 @@ static void __io_apic_eoi(unsigned int apic, unsigned int vector, unsigned int p
> >      /* Prefer the use of the EOI register if available */
> >      if ( ioapic_has_eoi_reg(apic) )
> >      {
> > +        if ( apic_pin_eoi )
> > +            vector = apic_pin_eoi[apic][pin];
> > +
> >          /* If vector is unknown, read it from the IO-APIC */
> >          if ( vector == IRQ_VECTOR_UNASSIGNED )
> > +        {
> >              vector = __ioapic_read_entry(apic, pin, true).vector;
> > +            if ( apic_pin_eoi )
> > +                /* Update cached value so further EOIs don't need to fetch it. */
> > +                apic_pin_eoi[apic][pin] = vector;
> > +        }
> >  
> >          *(IO_APIC_BASE(apic)+16) = vector;
> >      }
> > @@ -1022,7 +1053,23 @@ static void __init setup_IO_APIC_irqs(void)
> >  
> >      apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
> >  
> > +    if ( iommu_intremap )
> 
> MISRA requires this to be iommu_intremap != iommu_intremap_off.
> 
> But is this safe on older hardware?  iommu_intremap defaults to on
> (full), and is then turned off later on boot for various reasons.

I think it's fine because setup_IO_APIC_irqs() is strictly called
after iommu_setup(), so the value of iommu_intremap by that point
should reflect whether IR is enabled.

> We do all memory allocations in setup_IO_APIC_irqs() so at least we get
> to see a consistent view of iommu_intremap.
> 
> I suppose there's nothing wrong with having an extra cache of the vector
> in the way when not using interrupt remapping, so maybe it's fine?
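The MISRA point (presumably Rule 14.4, which requires an essentially Boolean controlling expression) amounts to comparing the enum against its "off" value explicitly. The sketch below is hypothetical: it assumes iommu_intremap is an enum whose first enumerator means off, and uses its own enumerator names rather than claiming Xen's exact definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed shape of the iommu_intremap setting; names are illustrative. */
enum sketch_intremap {
    sketch_intremap_off,
    sketch_intremap_restricted,
    sketch_intremap_full,
};

/*
 * Compare against the "off" enumerator explicitly instead of relying
 * on it being zero, so the if () condition is essentially Boolean.
 */
static bool need_pin_eoi_table(enum sketch_intremap mode)
{
    return mode != sketch_intremap_off;
}
```

As Roger notes, the check is only meaningful after iommu_setup() has had the chance to turn interrupt remapping off, which holds because setup_IO_APIC_irqs() runs later.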
> 
> > +    {
> > +        apic_pin_eoi = xzalloc_array(typeof(*apic_pin_eoi), nr_ioapics);
> > +        BUG_ON(!apic_pin_eoi);
> > +    }
> > +
> >      for (apic = 0; apic < nr_ioapics; apic++) {
> > +        if ( iommu_intremap )
> > +        {
> > +            apic_pin_eoi[apic] = xmalloc_array(typeof(**apic_pin_eoi),
> > +                                               nr_ioapic_entries[apic]);
> > +            BUG_ON(!apic_pin_eoi[apic]);
> > +
> > +            for ( pin = 0; pin < nr_ioapic_entries[apic]; pin++ )
> > +                apic_pin_eoi[apic][pin] = IRQ_VECTOR_UNASSIGNED;
> > +        }
> 
> This logic will be better if you pull nr_ioapic_entries[apic] out into a
> loop-local variable.
> 
> It should also allow the optimiser to turn the for loop into a memset(),
> which it can't now because of possible pointer aliasing with the
> induction variable.

Oh, OK, I can send a v2 with that adjusted.
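The adjustment Andrew asks for can be sketched in isolation. This is a hypothetical stand-alone version of the allocation loop (pin_eoi_alloc_all() and its parameters are invented for illustration; the patch itself uses xzalloc_array()/xmalloc_array() and BUG_ON()). The loop bound is hoisted into a local, nr_pins, whose address is never taken, so stores through tbl[apic] cannot modify it and the compiler is free to replace the fill loop with a memset().

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Same bit pattern as Xen's IRQ_VECTOR_UNASSIGNED (-1) in an unsigned int. */
#define SKETCH_VECTOR_UNASSIGNED UINT_MAX

static unsigned int **pin_eoi_alloc_all(const unsigned int *nr_ioapic_entries,
                                        unsigned int nr_ioapics)
{
    unsigned int **tbl = calloc(nr_ioapics, sizeof(*tbl));

    if (!tbl)
        return NULL;

    for (unsigned int apic = 0; apic < nr_ioapics; apic++) {
        /* Loop-local bound: cannot alias the stores below. */
        unsigned int nr_pins = nr_ioapic_entries[apic];

        tbl[apic] = malloc(nr_pins * sizeof(**tbl));
        if (!tbl[apic])
            abort();   /* the patch uses BUG_ON() here */

        for (unsigned int pin = 0; pin < nr_pins; pin++)
            tbl[apic][pin] = SKETCH_VECTOR_UNASSIGNED;
    }
    return tbl;
}
```

Because every byte of (unsigned int)-1 is 0xff, the fill loop is equivalent to memset(tbl[apic], 0xff, nr_pins * sizeof(**tbl)), which is the transformation the optimiser can make once the bound is known not to change.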

> But overall, the patch looks broadly ok to me.

Thanks, Roger.



 

