|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] x86/io-apic: fix directed EOI when using AMD-Vi interrupt remapping
On Tue, Oct 29, 2024 at 05:43:24PM +0100, Jan Beulich wrote:
> On 29.10.2024 12:03, Roger Pau Monne wrote:
> > When using AMD-Vi interrupt remapping the vector field in the IO-APIC RTE is
> > repurposed to contain part of the offset into the remapping table.
> > Previous to
> > 2ca9fbd739b8 Xen had logic so that the offset into the interrupt remapping
> > table would match the vector. Such logic was mandatory for end of
> > interrupt to
> > work, since the vector field (even when not containing a vector) is used by
> > the
> > IO-APIC to find for which pin the EOI must be performed.
> >
> > Introduce a table to store the EOI handlers when using interrupt remapping,
> > so
> > that the IO-APIC driver can translate pins into EOI handlers without having
> > to
> > read the IO-APIC RTE entry. Note that to simplify the logic such table is
> > used
> > unconditionally when interrupt remapping is enabled, even if strictly it
> > would
> > only be required for AMD-Vi.
>
> In here I think you mean "handle" when you use "handler"?
Indeed.
> Plus with what you said
> earlier about vector vs EOI handle, and with the code using "vector" all over
> the
> place, their (non-)relationship could also do with clarifying (perhaps better
> in
> a code comment in __io_apic_eoi()).
I've attempted to clarify the relation between vector vs EOI handle in
the first paragraph, and how that applies to AMD-Vi. I can move
(part?) of that into the comment in __ioapic_write_entry(), maybe:
/*
* Might be called before io_apic_pin_eoi is allocated. Entry will be
* updated once the array is allocated and there's a write against the
* pin.
*
* Note that the vector field is only cached for raw RTE writes when
* using IR. In that case the vector field might have been repurposed
* to store something different than the target vector, and hence need
* to be cached for performing EOI.
*/
> > @@ -273,6 +293,13 @@ void __ioapic_write_entry(
> > {
> > __io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
> > __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
> > + /*
> > + * Called in clear_IO_APIC_pin() before io_apic_pin_eoi is
> > allocated.
> > + * Entry will be updated once the array is allocated and there's a
> > + * write against the pin.
> > + */
> > + if ( io_apic_pin_eoi )
> > + io_apic_pin_eoi[apic][pin] = e.vector;
>
> The comment here looks a little misleading to me. clear_IO_APIC_pin() calls
> here to, in particular, set the mask bit. With the mask bit the vector isn't
> meaningful anyway (and indeed clear_IO_APIC_pin() sets it to zero, at which
> point recording IRQ_VECTOR_UNASSIGNED might be better than the bogus vector
> 0x00).
Note that clear_IO_APIC_pin() performs the call to
__ioapic_write_entry() with raw == false, at which point
__ioapic_write_entry() will call iommu_update_ire_from_apic() if IOMMU
IR is enabled. The cached 'vector' value will be the IOMMU entry
offset for the AMD-Vi case, as the IOMMU code will perform the call to
__ioapic_write_entry() with raw == true.
What matters is that the cached value matches what's written in the
IO-APIC RTE, and the current logic ensures this.
What's the benefit of using IRQ_VECTOR_UNASSIGNED if the result is
reading the RTE and finding that vector == 0?
Looking at clear_IO_APIC_pin() - I think the function is slightly
bogus. If entry.trigger is not set, the logic to switch the entry to
level triggered will fetch the entry contents without requesting a
raw RTE, at which point the entry.vector field can not be used as
the EOI handle since it will contain the vector, not the IR table
offset. I will need to make a further patch to fix this corner
case.
> > @@ -298,9 +325,17 @@ static void __io_apic_eoi(unsigned int apic, unsigned
> > int vector, unsigned int p
> > /* Prefer the use of the EOI register if available */
> > if ( ioapic_has_eoi_reg(apic) )
> > {
> > + if ( io_apic_pin_eoi )
> > + vector = io_apic_pin_eoi[apic][pin];
> > +
> > /* If vector is unknown, read it from the IO-APIC */
> > if ( vector == IRQ_VECTOR_UNASSIGNED )
> > + {
> > vector = __ioapic_read_entry(apic, pin, true).vector;
>
> Related to my comment higher up regarding vector vs EOI handle: Here we're
> doing a raw read, i.e. we don't really fetch the vector but the EOI handle
> in the AMD case. Why is it that this isn't sufficient for directed EOI to
> work (perhaps with the conditional adjusted)?
It is enough, but we don't want to be doing such read for each EOI,
hence why we cache it in io_apic_pin_eoi.
> Then again - are we ever taking this path? Certainly not when coming from
> clear_IO_APIC_pin(), hence ...
>
> > + if ( io_apic_pin_eoi )
>
> ... I'm unconvinced this conditional is needed.
Hm, maybe. I can adjust but seems more fragile to trigger a
dereference for the extra cost of a conditional in what should be a
non-common path anyway.
> > + /* Update cached value so further EOI don't need to fetch
> > it. */
> > + io_apic_pin_eoi[apic][pin] = vector;
> > + }
> >
> > *(IO_APIC_BASE(apic)+16) = vector;
> > }
> > @@ -1022,8 +1057,27 @@ static void __init setup_IO_APIC_irqs(void)
> >
> > apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
> >
> > + if ( iommu_intremap )
> > + {
> > + io_apic_pin_eoi = xmalloc_array(typeof(*io_apic_pin_eoi),
> > nr_ioapics);
>
> Nit: Strictly speaking this and ...
>
> > + BUG_ON(!io_apic_pin_eoi);
> > + }
> > +
> > for (apic = 0; apic < nr_ioapics; apic++) {
> > - for (pin = 0; pin < nr_ioapic_entries[apic]; pin++) {
> > + const unsigned int nr_entries = nr_ioapic_entries[apic];
> > +
> > + if ( iommu_intremap )
> > + {
> > + io_apic_pin_eoi[apic] =
> > xmalloc_array(typeof(**io_apic_pin_eoi),
> > + nr_entries);
>
> ... and this should be xvmalloc_array() in new code.
Sorry, didn't notice we have that now.
> Also this 2nd conditional may better use io_apic_pin_eoi, such that the two
> conditionals don't need keeping in sync. Note also how Andrew previously
> pointed out that both conditionals aren't Misra-compliant right now.
Oh, yes, completely forgot to adjust the MISRA comment from Andrew,
sorry.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |