[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] x86/io-apic: fix directed EOI when using AMD-Vi interrupt remapping
On Tue, Oct 29, 2024 at 05:43:24PM +0100, Jan Beulich wrote: > On 29.10.2024 12:03, Roger Pau Monne wrote: > > When using AMD-Vi interrupt remapping the vector field in the IO-APIC RTE is > > repurposed to contain part of the offset into the remapping table. > > Previous to > > 2ca9fbd739b8 Xen had logic so that the offset into the interrupt remapping > > table would match the vector. Such logic was mandatory for end of > > interrupt to > > work, since the vector field (even when not containing a vector) is used by > > the > > IO-APIC to find for which pin the EOI must be performed. > > > > Introduce a table to store the EOI handlers when using interrupt remapping, > > so > > that the IO-APIC driver can translate pins into EOI handlers without having > > to > > read the IO-APIC RTE entry. Note that to simplify the logic such table is > > used > > unconditionally when interrupt remapping is enabled, even if strictly it > > would > > only be required for AMD-Vi. > > In here I think you mean "handle" when you use "handler"? Indeed. > Plus with what you said > earlier about vector vs EOI handle, and with the code using "vector" all over > the > place, their (non-)relationship could also do with clarifying (perhaps better > in > a code comment in __io_apic_eoi()). I've attempted to clarify the relation between vector vs EOI handle in the first paragraph, and how that applies to AMD-Vi. I can move (part?) of that into the comment in __ioapic_write_entry(), maybe: /* * Might be called before io_apic_pin_eoi is allocated. Entry will be * updated once the array is allocated and there's a write against the * pin. * * Note that the vector field is only cached for raw RTE writes when * using IR. In that case the vector field might have been repurposed * to store something different than the target vector, and hence need * to be cached for performing EOI. */ > > @@ -273,6 +293,13 @@ void __ioapic_write_entry( > > { > > __io_apic_write(apic, 0x11 + 2 * pin, eu.w2); > > __io_apic_write(apic, 0x10 + 2 * pin, eu.w1); > > + /* > > + * Called in clear_IO_APIC_pin() before io_apic_pin_eoi is > > allocated. > > + * Entry will be updated once the array is allocated and there's a > > + * write against the pin. > > + */ > > + if ( io_apic_pin_eoi ) > > + io_apic_pin_eoi[apic][pin] = e.vector; > > The comment here looks a little misleading to me. clear_IO_APIC_pin() calls > here to, in particular, set the mask bit. With the mask bit the vector isn't > meaningful anyway (and indeed clear_IO_APIC_pin() sets it to zero, at which > point recording IRQ_VECTOR_UNASSIGNED might be better than the bogus vector > 0x00). Note that clear_IO_APIC_pin() performs the call to __ioapic_write_entry() with raw == false, at which point __ioapic_write_entry() will call iommu_update_ire_from_apic() if IOMMU IR is enabled. The cached 'vector' value will be the IOMMU entry offset for the AMD-Vi case, as the IOMMU code will perform the call to __ioapic_write_entry() with raw == true. What matters is that the cached value matches what's written in the IO-APIC RTE, and the current logic ensures this. What's the benefit of using IRQ_VECTOR_UNASSIGNED if the result is reading the RTE and finding that vector == 0? Looking at clear_IO_APIC_pin() - I think the function is slightly bogus. If entry.trigger is not set, the logic to switch the entry to level triggered will fetch the entry contents without requesting a raw RTE, at which point the entry.vector field can not be used as the EOI handle since it will contain the vector, not the IR table offset. I will need to make a further patch to fix this corner case. > > @@ -298,9 +325,17 @@ static void __io_apic_eoi(unsigned int apic, unsigned > > int vector, unsigned int p > > /* Prefer the use of the EOI register if available */ > > if ( ioapic_has_eoi_reg(apic) ) > > { > > + if ( io_apic_pin_eoi ) > > + vector = io_apic_pin_eoi[apic][pin]; > > + > > /* If vector is unknown, read it from the IO-APIC */ > > if ( vector == IRQ_VECTOR_UNASSIGNED ) > > + { > > vector = __ioapic_read_entry(apic, pin, true).vector; > > Related to my comment higher up regarding vector vs EOI handle: Here we're > doing a raw read, i.e. we don't really fetch the vector but the EOI handle > in the AMD case. Why is it that this isn't sufficient for directed EOI to > work (perhaps with the conditional adjusted)? It is enough, but we don't want to be doing such read for each EOI, hence why we cache it in io_apic_pin_eoi. > Then again - are we ever taking this path? Certainly not when coming from > clear_IO_APIC_pin(), hence ... > > > + if ( io_apic_pin_eoi ) > > ... I'm unconvinced this conditional is needed. Hm, maybe. I can adjust but seems more fragile to trigger a dereference for the extra cost of a conditional in what should be a non-common path anyway. > > + /* Update cached value so further EOI don't need to fetch > > it. */ > > + io_apic_pin_eoi[apic][pin] = vector; > > + } > > > > *(IO_APIC_BASE(apic)+16) = vector; > > } > > @@ -1022,8 +1057,27 @@ static void __init setup_IO_APIC_irqs(void) > > > > apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n"); > > > > + if ( iommu_intremap ) > > + { > > + io_apic_pin_eoi = xmalloc_array(typeof(*io_apic_pin_eoi), > > nr_ioapics); > > Nit: Strictly speaking this and ... > > > + BUG_ON(!io_apic_pin_eoi); > > + } > > + > > for (apic = 0; apic < nr_ioapics; apic++) { > > - for (pin = 0; pin < nr_ioapic_entries[apic]; pin++) { > > + const unsigned int nr_entries = nr_ioapic_entries[apic]; > > + > > + if ( iommu_intremap ) > > + { > > + io_apic_pin_eoi[apic] = > > xmalloc_array(typeof(**io_apic_pin_eoi), > > + nr_entries); > > ... and this should be xvmalloc_array() in new code. Sorry, didn't notice we have that now. > Also this 2nd conditional may better use io_apic_pin_eoi, such that the two > conditionals don't need keeping in sync. Note also how Andrew previously > pointed out that both conditionals aren't Misra-compliant right now. Oh, yes, completely forgot to adjust the MISRA comment from Andrew, sorry. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |