Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 20 Mar 2023 16:16:19 +0100
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, Huang Rui <ray.huang@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Alex Deucher <alexander.deucher@xxxxxxx>, Christian König <christian.koenig@xxxxxxx>, Stewart Hildebrand <Stewart.Hildebrand@xxxxxxx>, Xenia Ragiadakou <burzalodowa@xxxxxxxxx>, Honglei Huang <honglei1.huang@xxxxxxx>, Julia Zhang <julia.zhang@xxxxxxx>, Chen Jiqian <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Mon, 20 Mar 2023 15:17:14 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > > >> So yes, it then all boils down to that Linux-internal question.
> > > > > > 
> > > > > > Excellent question but we'll have to wait for Ray as he is the one
> > > > > > with access to the hardware. But I have this data I can share in
> > > > > > the meantime:
> > > > > > 
> > > > > > [    1.260378] IRQ to pin mappings:
> > > > > > [    1.260387] IRQ1 -> 0:1
> > > > > > [    1.260395] IRQ2 -> 0:2
> > > > > > [    1.260403] IRQ3 -> 0:3
> > > > > > [    1.260410] IRQ4 -> 0:4
> > > > > > [    1.260418] IRQ5 -> 0:5
> > > > > > [    1.260425] IRQ6 -> 0:6
> > > > > > [    1.260432] IRQ7 -> 0:7
> > > > > > [    1.260440] IRQ8 -> 0:8
> > > > > > [    1.260447] IRQ9 -> 0:9
> > > > > > [    1.260455] IRQ10 -> 0:10
> > > > > > [    1.260462] IRQ11 -> 0:11
> > > > > > [    1.260470] IRQ12 -> 0:12
> > > > > > [    1.260478] IRQ13 -> 0:13
> > > > > > [    1.260485] IRQ14 -> 0:14
> > > > > > [    1.260493] IRQ15 -> 0:15
> > > > > > [    1.260505] IRQ106 -> 1:8
> > > > > > [    1.260513] IRQ112 -> 1:4
> > > > > > [    1.260521] IRQ116 -> 1:13
> > > > > > [    1.260529] IRQ117 -> 1:14
> > > > > > [    1.260537] IRQ118 -> 1:15
> > > > > > [    1.260544] .................................... done.
> > > > > 
> > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > > > Linux running baremetal on the same hardware?
> > > > 
> > > > So I have some emails from Ray from the time he was looking into this,
> > > > and in the Linux PVH dom0 dmesg there is:
> > > > 
> > > > [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > > > [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > > > 
> > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > > consistent.
> > > >  
> > > > > > And I think Ray traced the point in Linux where Linux gives us an
> > > > > > IRQ == 112 (which is the one causing issues):
> > > > > > 
> > > > > > __acpi_register_gsi->
> > > > > >         acpi_register_gsi_ioapic->
> > > > > >                 mp_map_gsi_to_irq->
> > > > > >                         mp_map_pin_to_irq->
> > > > > >                                 __irq_resolve_mapping()
> > > > > > 
> > > > > >         if (likely(data)) {
> > > > > >                 desc = irq_data_to_desc(data);
> > > > > >                 if (irq)
> > > > > >                         *irq = data->irq;
> > > > > >                 /* this IRQ is 112, IO-APIC-34 domain */
> > > > > >         }
> > > > 
> > > > 
> > > > Could this all be a result of patch 4/5 in the Linux series
> > > > ("[RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a
> > > > different __acpi_register_gsi hook is installed for PVH in order to
> > > > set up GSIs using PHYSDEV ops instead of doing it natively from the
> > > > IO-APIC?
> > > > 
> > > > FWIW, the introduced function in that patch
> > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > > registered, which might lead to multiple IRQs being allocated for the
> > > > same underlying GSI?
> > > 
> > > I understand this point and I think it needs investigating.
> > > 
> > > 
> > > > As I commented there, I think that approach is wrong.  If the GSI has
> > > > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > > > IO-APIC pin) we should add some logic in the toolstack to map it
> > > > before attempting to bind.
> > > 
> > > But this statement confuses me. The toolstack doesn't get involved in
> > > IRQ setup for PCI devices for HVM guests?
> > 
> > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> > to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
> > could be removed (maybe for qemu-trad only?) or it's also required by
> > upstream QEMU; I would have to investigate more.
> 
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> guests, not for HVM guests. I am not sure. But I can see the call to
> xc_physdev_map_pirq you were referring to now.
> 
> 
> > It's my understanding it's in pci_add_dm_done() where Ray was getting
> > the mismatched IRQ vs GSI number.
> 
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.
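
For reference, the mapping request under discussion (both the one QEMU
makes and the one in pci_add_dm_done()) boils down to the libxc call
below, where the index passed in must be the GSI.  This is a simplified
sketch to illustrate the interface, not the actual libxl/QEMU code:

#include <xenctrl.h>

/*
 * Minimal illustration: map a host GSI into a guest as a pirq before
 * binding it.  Passing the Linux IRQ number as the index only works
 * while IRQ == GSI, which is the assumption that breaks on boxes with
 * several IO-APICs.
 */
static int map_gsi_to_guest(xc_interface *xch, uint32_t domid, int gsi)
{
    int pirq = gsi; /* in: requested pirq, out: pirq actually assigned */

    return xc_physdev_map_pirq(xch, domid, gsi, &pirq);
}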

I've been thinking about this a bit, and I think one of the possible
issues with the current handling of GSIs in a PVH dom0 is that GSIs
don't get registered until/unless they are unmasked.  I could see this
being a problem when doing passthrough: it's possible for a GSI (iow: a
vIO-APIC pin) to never get unmasked on dom0, because the device
driver(s) use MSI(-X) interrupts instead.  However, the IO-APIC pin
must be configured before it can be mapped into a domU.
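
The current behaviour can be summarized roughly like this (a conceptual
sketch of the gating around vioapic_hwdom_map_gsi(), not verbatim Xen
code):

/*
 * The GSI only gets registered when the hardware domain unmasks the
 * corresponding vIO-APIC pin, so a pin whose driver only ever uses
 * MSI(-X) never reaches mp_register_gsi() and hence cannot be mapped
 * into a domU later on.
 */
if ( is_hardware_domain(d) && unmasked )
    ret = vioapic_hwdom_map_gsi(gsi, ent.fields.trig_mode,
                                ent.fields.polarity);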

A possible solution is to propagate the vIO-APIC pin configuration
(trigger mode and polarity) when dom0 writes the low part of the
redirection table entry.
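
Both of those fields live in the low 32 bits of an RTE, which is why
hooking writes to the low dword is enough to pick them up.  Roughly
(field names as used by the vioapic code; a sketch, not a header copied
from Xen):

#include <stdint.h>

/* Low dword of an IO-APIC redirection table entry. */
union rte_low {
    uint32_t raw;
    struct {
        uint32_t vector:8;
        uint32_t delivery_mode:3;
        uint32_t dest_mode:1;
        uint32_t delivery_status:1; /* read-only */
        uint32_t polarity:1;
        uint32_t remote_irr:1;      /* read-only */
        uint32_t trig_mode:1;
        uint32_t mask:1;
        uint32_t reserved:15;
    } fields;
};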

The patch below enables the use of PHYSDEVOP_{un,}map_pirq from PVH
domains (I need to assert this is secure even for domUs) and also
propagates the vIO-APIC pin trigger/polarity mode on writes to the low
part of the RTE.  Such propagation leads to the following interrupt
setup in Xen:

IRQ:   0 vec:f0 IO-APIC-edge    status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ:   1 vec:38 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   2 vec:a8 IO-APIC-edge    status=000 aff:{0-7}/{0-7} no_action()
IRQ:   3 vec:f1 IO-APIC-edge    status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ:   4 vec:40 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   5 vec:48 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   6 vec:50 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   7 vec:58 IO-APIC-edge    status=006 aff:{0-7}/{0} mapped, unbound
IRQ:   8 vec:60 IO-APIC-edge    status=010 aff:{0}/{0} in-flight=0 d0:  8(-M-)
IRQ:   9 vec:68 IO-APIC-edge    status=010 aff:{0}/{0} in-flight=0 d0:  9(-M-)
IRQ:  10 vec:70 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  11 vec:78 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  12 vec:88 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  13 vec:90 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  14 vec:98 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  15 vec:a0 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  16 vec:b0 IO-APIC-edge    status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-)
IRQ:  17 vec:b8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  18 vec:c0 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  19 vec:c8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  20 vec:d0 IO-APIC-edge    status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-)
IRQ:  21 vec:d8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  22 vec:e0 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  23 vec:e8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound

Note how all GSIs on my box are now set up, even those that are not
bound to dom0.  The output without this patch looks like:

IRQ:   0 vec:f0 IO-APIC-edge    status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ:   1 vec:38 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   3 vec:f1 IO-APIC-edge    status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ:   4 vec:40 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   5 vec:48 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   6 vec:50 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   7 vec:58 IO-APIC-edge    status=006 aff:{0}/{0} mapped, unbound
IRQ:   8 vec:d0 IO-APIC-edge    status=010 aff:{6}/{0-7} in-flight=0 d0:  8(-M-)
IRQ:   9 vec:a8 IO-APIC-level   status=010 aff:{2}/{0-7} in-flight=0 d0:  9(-M-)
IRQ:  10 vec:70 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  11 vec:78 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  12 vec:88 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  13 vec:90 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  14 vec:98 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  15 vec:a0 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  16 vec:e0 IO-APIC-level   status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-)
IRQ:  20 vec:d8 IO-APIC-level   status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-)

Legacy IRQs (below 16) are always registered.

With this patch applied I seem to be able to do PCI passthrough to an
HVM domU from a PVH dom0.

Regards, Roger.

---
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..50e23a093c 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
 
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                  gsi, ret);
@@ -244,12 +242,18 @@ static void vioapic_write_redirent(
     }
     else
     {
+        int ret;
+
         unmasked = ent.fields.mask;
         /* Remote IRR and Delivery Status are read-only. */
         ent.bits = ((ent.bits >> 32) << 32) | val;
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+        if ( ret && ret != -EEXIST )
+            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
+                    gsi, ret);
     }
 
     *pent = ent;