Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > > >> So yes, it then all boils down to that Linux-internal question.
> > > > > >
> > > > > > Excellent question but we'll have to wait for Ray as he is the
> > > > > > one with access to the hardware. But I have this data I can
> > > > > > share in the meantime:
> > > > > >
> > > > > > [ 1.260378] IRQ to pin mappings:
> > > > > > [ 1.260387] IRQ1 -> 0:1
> > > > > > [ 1.260395] IRQ2 -> 0:2
> > > > > > [ 1.260403] IRQ3 -> 0:3
> > > > > > [ 1.260410] IRQ4 -> 0:4
> > > > > > [ 1.260418] IRQ5 -> 0:5
> > > > > > [ 1.260425] IRQ6 -> 0:6
> > > > > > [ 1.260432] IRQ7 -> 0:7
> > > > > > [ 1.260440] IRQ8 -> 0:8
> > > > > > [ 1.260447] IRQ9 -> 0:9
> > > > > > [ 1.260455] IRQ10 -> 0:10
> > > > > > [ 1.260462] IRQ11 -> 0:11
> > > > > > [ 1.260470] IRQ12 -> 0:12
> > > > > > [ 1.260478] IRQ13 -> 0:13
> > > > > > [ 1.260485] IRQ14 -> 0:14
> > > > > > [ 1.260493] IRQ15 -> 0:15
> > > > > > [ 1.260505] IRQ106 -> 1:8
> > > > > > [ 1.260513] IRQ112 -> 1:4
> > > > > > [ 1.260521] IRQ116 -> 1:13
> > > > > > [ 1.260529] IRQ117 -> 1:14
> > > > > > [ 1.260537] IRQ118 -> 1:15
> > > > > > [ 1.260544] .................................... done.
> > > > >
> > > > > And what does Linux think are IRQs 16 ... 105? Have you compared
> > > > > with Linux running baremetal on the same hardware?
> > > >
> > > > So I have some emails from Ray from the time he was looking into
> > > > this, and on Linux dom0 PVH dmesg there is:
> > > >
> > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > > >
> > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > > consistent.
> > > >
> > > > > > And I think Ray traced the point in Linux where Linux gives us
> > > > > > an IRQ == 112 (which is the one causing issues):
> > > > > >
> > > > > > __acpi_register_gsi->
> > > > > >     acpi_register_gsi_ioapic->
> > > > > >         mp_map_gsi_to_irq->
> > > > > >             mp_map_pin_to_irq->
> > > > > >                 __irq_resolve_mapping()
> > > > > >
> > > > > > if (likely(data)) {
> > > > > >     desc = irq_data_to_desc(data);
> > > > > >     if (irq)
> > > > > >         *irq = data->irq;
> > > > > >     /* this IRQ is 112, IO-APIC-34 domain */
> > > > > > }
> > > >
> > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a
> > > > different __acpi_register_gsi hook is installed for PVH in order
> > > > to setup GSIs using PHYSDEV ops instead of doing it natively from
> > > > the IO-APIC?
> > > >
> > > > FWIW, the introduced function in that patch
> > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > > registered, which might lead to multiple IRQs being allocated for
> > > > the same underlying GSI?
> > >
> > > I understand this point and I think it needs investigating.
> > >
> > > > As I commented there, I think that approach is wrong. If the GSI
> > > > has not been mapped in Xen (because dom0 hasn't unmasked the
> > > > respective IO-APIC pin) we should add some logic in the toolstack
> > > > to map it before attempting to bind.
> > >
> > > But this statement confuses me. The toolstack doesn't get involved
> > > in IRQ setup for PCI devices for HVM guests?
> >
> > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that
> > could be removed (maybe for qemu-trad only?) or it's also required by
> > QEMU upstream, I would have to investigate more.
>
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for
> PV guests, not for HVM guests. I am not sure. But I can see the call
> to xc_physdev_map_pirq you were referring to now.
>
> > It's my understanding it's in pci_add_dm_done() where Ray was getting
> > the mismatched IRQ vs GSI number.
>
> I think the mismatch was actually caused by the xc_physdev_map_pirq
> call from QEMU, which makes sense because in any case it should happen
> before the same call done by pci_add_dm_done (pci_add_dm_done is
> called after sending the pci passthrough QMP command to QEMU). So the
> first to hit the IRQ!=GSI problem would be QEMU.

I've been thinking about this a bit, and I think one of the possible
issues with the current handling of GSIs in a PVH dom0 is that GSIs
don't get registered until/unless they are unmasked. I could see this
as a problem when doing passthrough: it's possible for a GSI (iow:
vIO-APIC pin) to never get unmasked on dom0, because the device
driver(s) are using MSI(-X) interrupts instead. However, the IO-APIC
pin must be configured for it to be able to be mapped into a domU.

A possible solution is to propagate the vIO-APIC pin configuration
(trigger/polarity) when dom0 writes the low part of the redirection
table entry.

The patch below enables the usage of PHYSDEVOP_{un,}map_pirq from PVH
domains (I need to assert this is secure even for domUs) and also
propagates the vIO-APIC pin trigger/polarity mode on writes to the low
part of the RTE.
Such propagation leads to the following interrupt setup in Xen:

IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 2 vec:a8 IO-APIC-edge status=000 aff:{0-7}/{0-7} no_action()
IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0-7}/{0} mapped, unbound
IRQ: 8 vec:60 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 8(-M-)
IRQ: 9 vec:68 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 9(-M-)
IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 16 vec:b0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-)
IRQ: 17 vec:b8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 18 vec:c0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 19 vec:c8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 20 vec:d0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-)
IRQ: 21 vec:d8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 22 vec:e0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 23 vec:e8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound

Note how now all GSIs on my box are setup, even when not bound to dom0
anymore.
The output without this patch looks like:

IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0}/{0} mapped, unbound
IRQ: 8 vec:d0 IO-APIC-edge status=010 aff:{6}/{0-7} in-flight=0 d0: 8(-M-)
IRQ: 9 vec:a8 IO-APIC-level status=010 aff:{2}/{0-7} in-flight=0 d0: 9(-M-)
IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 16 vec:e0 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-)
IRQ: 20 vec:d8 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-)

Legacy IRQs (below 16) are always registered.

With the patch above I seem to be able to do PCI passthrough to an HVM
domU from a PVH dom0.

Regards, Roger.
---
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..50e23a093c 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                 gsi, ret);
@@ -244,12 +242,18 @@ static void vioapic_write_redirent(
     }
     else
     {
+        int ret;
+
         unmasked = ent.fields.mask;
         /* Remote IRR and Delivery Status are read-only. */
         ent.bits = ((ent.bits >> 32) << 32) | val;
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+        if ( ret && ret != -EEXIST )
+            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
+                    gsi, ret);
     }

     *pent = ent;