
Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev



On Fri, Feb 09, 2024 at 03:05:49PM -0600, Bjorn Helgaas wrote:
> On Thu, Feb 01, 2024 at 09:39:49AM +0100, Roger Pau Monné wrote:
> > On Wed, Jan 31, 2024 at 01:00:14PM -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 31, 2024 at 09:58:19AM +0100, Roger Pau Monné wrote:
> > > > On Tue, Jan 30, 2024 at 02:44:03PM -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Jan 30, 2024 at 10:07:36AM +0100, Roger Pau Monné wrote:
> > > > > > On Mon, Jan 29, 2024 at 04:01:13PM -0600, Bjorn Helgaas wrote:
> > > > > > > On Thu, Jan 25, 2024 at 07:17:24AM +0000, Chen, Jiqian wrote:
> > > > > > > > On 2024/1/24 00:02, Bjorn Helgaas wrote:
> > > > > > > > > On Tue, Jan 23, 2024 at 10:13:52AM +0000, Chen, Jiqian wrote:
> > > > > > > > >> On 2024/1/23 07:37, Bjorn Helgaas wrote:
> > > > > > > > >>> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
> > > > > > > > >>>> Some scenarios need a gsi sysfs node. For example, when
> > > > > > > > >>>> Xen passes a device through to a DomU, it uses the GSI to
> > > > > > > > >>>> map a pirq, but currently userspace can't get the GSI
> > > > > > > > >>>> number. So, add a gsi sysfs node for that and for other
> > > > > > > > >>>> potential scenarios.
> > > > > > > > >> ...
> > > > > > > > > 
> > > > > > > > >>> I don't know enough about Xen to know why it needs the GSI
> > > > > > > > >>> in userspace.  Is this passthrough brand new functionality
> > > > > > > > >>> that can't be done today because we don't expose the GSI yet?
> > > > > > > 
> > > > > > > I assume this must be new functionality, i.e., this kind of
> > > > > > > passthrough does not work today, right?
> > > > > > > 
> > > > > > > > >> has ACPI support and is responsible for detecting and
> > > > > > > > >> controlling the hardware; it also performs privileged
> > > > > > > > >> operations such as the creation of normal (unprivileged)
> > > > > > > > >> domains (DomUs). When we give a DomU direct access to a
> > > > > > > > >> device, we also need to route the physical interrupts to the
> > > > > > > > >> DomU. In order to do so, Xen needs to set up and map the
> > > > > > > > >> interrupts appropriately.
> > > > > > > > > 
> > > > > > > > > What kernel interfaces are used for this setup and mapping?
> > > > > > > >
> > > > > > > > For passthrough devices, the setup and mapping of routing
> > > > > > > > physical interrupts to DomU are done on the Xen hypervisor
> > > > > > > > side; the hypervisor only needs userspace to provide the GSI
> > > > > > > > info. See the Xen code: xc_physdev_map_pirq requires the GSI
> > > > > > > > and then issues a hypercall to pass the GSI into the
> > > > > > > > hypervisor, which does the mapping and routing. The kernel
> > > > > > > > doesn't do the setup and mapping.
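A minimal sketch of the userspace side of this flow, assuming the attribute proposed by this patch shows up as a single decimal number in a file such as /sys/bus/pci/devices/<BDF>/gsi. That path and format are assumptions based on the RFC, not an existing ABI, and read_gsi() is a hypothetical helper. The value it returns is what a toolstack would then hand to xc_physdev_map_pirq().

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a decimal GSI from a sysfs-style attribute.
 * The attribute path (e.g. /sys/bus/pci/devices/0000:00:1f.3/gsi) is an
 * assumption taken from the RFC under discussion, not a stable ABI.
 * Returns the GSI (>= 0) on success, or -1 on any error.
 */
static int read_gsi(const char *path)
{
    char buf[32];
    char *end;
    long val;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;                      /* attribute missing or unreadable */
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;                      /* empty file or read error */
    }
    fclose(f);

    errno = 0;
    val = strtol(buf, &end, 10);
    /* Reject non-numeric input, trailing garbage (other than '\n') and overflow. */
    if (end == buf || (*end != '\0' && *end != '\n') || errno != 0 ||
        val < 0 || val > INT_MAX)
        return -1;
    return (int)val;
}
```

Returning -1 rather than a partial value keeps the caller from forwarding a bogus GSI to the hypervisor.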
> > > > > > > 
> > > > > > > So we have to expose the GSI to userspace not because userspace
> > > > > > > itself uses it, but so userspace can turn around and pass it back
> > > > > > > into the kernel?
> > > > > > 
> > > > > > No, the point is to pass it back to Xen, which doesn't know the
> > > > > > mapping between GSIs and PCI devices because it can't execute the
> > > > > > ACPI AML resource methods that provide such information.
> > > > > > 
> > > > > > The (Linux) kernel is just a proxy that forwards the hypercalls from
> > > > > > user-space tools into Xen.
> > > > > 
> > > > > But I guess Xen knows how to interpret a GSI even though it doesn't
> > > > > have access to AML?
> > > > 
> > > > On x86 Xen does know how to map a GSI into an IO-APIC pin, in order
> > > > to configure the RTE as requested.
> > > 
> > > IIUC, mapping a GSI to an IO-APIC pin requires information from the
> > > MADT.  So I guess Xen does use the static ACPI tables, but not the AML
> > > _PRT methods that would connect a GSI with a PCI device?
> > 
> > Yes, Xen can parse the static tables, and knows the base GSI of
> > IO-APICs from the MADT.
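As an illustration of why the MADT is enough for Xen once it has a GSI: each I/O APIC serves a contiguous GSI range starting at its GSI base, so the pin is simply gsi - gsi_base. This sketch assumes the number of pins per I/O APIC is already known (on real hardware it comes from the I/O APIC version register, not from the MADT); the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-I/O-APIC record: MADT GSI base plus its pin count. */
struct ioapic {
    uint8_t  id;
    uint32_t gsi_base;   /* Global System Interrupt Base from the MADT entry */
    uint32_t nr_pins;    /* max redirection entry + 1 (from the version register) */
};

/*
 * Find the I/O APIC that serves @gsi and the input pin within it.
 * Returns the index into @ioapics and stores the pin in *pin,
 * or returns -1 if no I/O APIC covers this GSI.
 */
static int gsi_to_ioapic_pin(const struct ioapic *ioapics, size_t n,
                             uint32_t gsi, uint32_t *pin)
{
    for (size_t i = 0; i < n; i++) {
        if (gsi >= ioapics[i].gsi_base &&
            gsi - ioapics[i].gsi_base < ioapics[i].nr_pins) {
            *pin = gsi - ioapics[i].gsi_base;
            return (int)i;
        }
    }
    return -1;
}
```

For example, with I/O APICs at GSI bases 0 (24 pins) and 24 (32 pins), GSI 30 resolves to pin 6 of the second one. Nothing here needs AML; only the device to GSI step does.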
> > 
> > > I guess this means Xen would not be able to deal with _MAT methods,
> > > which also contain MADT entries?  I don't know the implications of
> > > this -- maybe it means Xen might not be able to deal with hot-added
> > > devices?
> > 
> > It's my understanding that _MAT will only be present on some very
> > specific devices (IO-APIC or CPU objects).  Xen doesn't support hotplug
> > of IO-APICs, but hotplug of CPUs should in principle be supported with
> > cooperation from the control domain OS (albeit it's not something that
> > we test in our CI).  I don't expect, however, that a CPU object _MAT
> > method will return IO-APIC entries.
> > 
> > > The tables (including DSDT and SSDTs that contain the AML) are exposed
> > > to userspace via /sys/firmware/acpi/tables/, but of course that
> > > doesn't mean Xen knows how to interpret the AML, and even if it did,
> > > Xen probably wouldn't be able to *evaluate* it since that could
> > > conflict with the host kernel's use of AML.
> > 
> > Indeed, there can only be a single OSPM, and that's the dom0 OS (Linux
> > in our context).
> > 
> > Getting back to our context though, what would be a suitable place for
> > exposing the GSI assigned to each device?
> 
> IIUC, the Xen hypervisor:
> 
>   - Interprets /sys/firmware/acpi/tables/APIC (or gets this via
>     something running on the Dom0 kernel) to find the physical base
>     address and GSI base, e.g., from I/O APIC, I/O SAPIC.

No, Xen parses the MADT directly from memory, before starting dom0.
That's a static table so it's fine for Xen to parse it and obtain the
I/O APIC GSI base.
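For reference, a rough sketch of pulling the I/O APIC entries out of raw MADT bytes (the same bytes Linux exposes as /sys/firmware/acpi/tables/APIC). It follows the ACPI layout: a 36-byte standard header, then the local APIC address and flags, then variable-length entries, where type 1 is an I/O APIC entry carrying the ID, MMIO address and GSI base. Checksum validation and all other entry types are deliberately skipped, and the names are hypothetical; this is not how Xen's own parser is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MADT_ENTRIES_OFFSET 44  /* 36-byte ACPI header + LAPIC address + flags */
#define MADT_TYPE_IOAPIC    1   /* I/O APIC structure, 12 bytes long */

/* Hypothetical decoded form of one MADT I/O APIC entry. */
struct madt_ioapic {
    uint8_t  id;
    uint32_t addr;      /* physical MMIO base of the I/O APIC */
    uint32_t gsi_base;  /* first GSI served by this I/O APIC */
};

/* ACPI tables are little-endian. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Walk a raw MADT and store up to @max I/O APIC entries in @out.
 * Returns the total number of I/O APIC entries found, or -1 if the
 * table is malformed.  The checksum is not verified.
 */
static int madt_find_ioapics(const uint8_t *tbl, size_t size,
                             struct madt_ioapic *out, int max)
{
    uint32_t len;
    size_t off = MADT_ENTRIES_OFFSET;
    int found = 0;

    if (size < MADT_ENTRIES_OFFSET || memcmp(tbl, "APIC", 4) != 0)
        return -1;
    len = get_le32(tbl + 4);            /* total length from the header */
    if (len < MADT_ENTRIES_OFFSET || len > size)
        return -1;

    while (off + 2 <= len) {
        uint8_t type = tbl[off];
        uint8_t elen = tbl[off + 1];

        if (elen < 2 || off + elen > len)
            return -1;                  /* zero-length or truncated entry */
        if (type == MADT_TYPE_IOAPIC && elen >= 12) {
            if (found < max) {
                out[found].id = tbl[off + 2];
                out[found].addr = get_le32(tbl + off + 4);
                out[found].gsi_base = get_le32(tbl + off + 8);
            }
            found++;
        }
        off += elen;
    }
    return found;
}
```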

>   - Needs the GSI to locate the APIC and pin within the APIC.  The
>     Dom0 kernel is the OSPM, so only it can evaluate the AML _PRT to
>     learn the PCI device -> GSI mapping.

Yes, Xen doesn't know the PCI device -> GSI mapping.  Dom0 needs to
parse the ACPI methods and signal Xen to configure a GSI with a
given trigger and polarity.
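To make the division of labour concrete, here is a simplified model of what Dom0 learns from an already-evaluated _PRT. It only handles hard-wired entries (Source is 0, so Source Index is the GSI); entries that name a link device need that device's _CRS, which only the OSPM can evaluate. PCI INTx lines are level-triggered and active-low, which is the trigger and polarity Dom0 would report to Xen. The structures and prt_lookup() are hypothetical stand-ins for the real ACPICA objects.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, already-evaluated _PRT entry.  In AML each entry is a
 * Package { Address, Pin, Source, SourceIndex }:
 *   Address:     (device << 16) | 0xFFFF  (any function of that device)
 *   Pin:         0 = INTA#, 1 = INTB#, 2 = INTC#, 3 = INTD#
 *   Source:      a link device, or 0 when the interrupt is hard-wired
 *   SourceIndex: the GSI itself when Source is 0
 */
struct prt_entry {
    uint32_t address;
    uint8_t  pin;
    int      source_is_zero;   /* 1 if Source is the integer 0 */
    uint32_t source_index;
};

enum trigger  { TRIG_EDGE, TRIG_LEVEL };
enum polarity { POL_ACTIVE_HIGH, POL_ACTIVE_LOW };

/*
 * Look up the GSI for (@device, @pin), where @pin is 0-based as in _PRT
 * (i.e. the config-space Interrupt Pin register value minus 1).
 * Returns 0 and fills the outputs on success, -1 otherwise.
 */
static int prt_lookup(const struct prt_entry *prt, size_t n,
                      uint8_t device, uint8_t pin, uint32_t *gsi,
                      enum trigger *trig, enum polarity *pol)
{
    for (size_t i = 0; i < n; i++) {
        if ((prt[i].address >> 16) != device || prt[i].pin != pin)
            continue;
        if (!prt[i].source_is_zero)
            return -1;          /* link device: needs its _CRS, not modelled */
        *gsi = prt[i].source_index;
        *trig = TRIG_LEVEL;     /* PCI INTx: level-triggered ... */
        *pol = POL_ACTIVE_LOW;  /* ... and active-low */
        return 0;
    }
    return -1;                  /* no matching entry */
}
```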

>   - Has direct access to the APIC physical base address to program the
>     Redirection Table.

Yes, the hardware (native) I/O APIC is owned by Xen, and not directly
accessible by dom0.
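Once Dom0 has reported the trigger mode and polarity, Xen can build the Redirection Table entry itself. A hedged sketch of the classic (non-remapped) 82093AA-style RTE layout: vector in bits 0-7, polarity in bit 13, trigger mode in bit 15, mask in bit 16 and the physical destination APIC ID in bits 56-63; each pin's entry sits at indirect register indices 0x10 + 2*pin (low dword) and 0x11 + 2*pin (high dword). With interrupt remapping enabled the format differs, and the helper names here are hypothetical, not Xen's.

```c
#include <stdint.h>

/* RTE bits (classic I/O APIC layout, no interrupt remapping). */
#define RTE_POLARITY_LOW   (1ull << 13)  /* 1 = active low */
#define RTE_TRIGGER_LEVEL  (1ull << 15)  /* 1 = level-triggered */
#define RTE_MASKED         (1ull << 16)  /* 1 = interrupt masked */

/* Indirect register indices of pin @pin's RTE (valid for pin < 120). */
static inline uint8_t rte_index_low(unsigned int pin)  { return (uint8_t)(0x10 + 2 * pin); }
static inline uint8_t rte_index_high(unsigned int pin) { return (uint8_t)(0x11 + 2 * pin); }

/* Build a fixed-delivery, physical-destination-mode RTE. */
static uint64_t make_rte(uint8_t vector, int level, int active_low,
                         uint8_t dest_apic_id, int masked)
{
    uint64_t rte = vector;               /* bits 0-7; delivery mode 000 = fixed */

    if (active_low)
        rte |= RTE_POLARITY_LOW;
    if (level)
        rte |= RTE_TRIGGER_LEVEL;
    if (masked)
        rte |= RTE_MASKED;
    rte |= (uint64_t)dest_apic_id << 56; /* bits 56-63: destination APIC ID */
    return rte;
}
```

For a PCI INTx GSI routed through pin 9 this gives a level, active-low entry written via registers 0x22 and 0x23, which is exactly the information Xen cannot derive without Dom0's _PRT evaluation.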

> The patch seems a little messy to me because the PCI core has to keep
> track of the GSI even though it doesn't need it itself.  And the
> current patch exposes it on all arches, even non-ACPI ones or when
> ACPI is disabled (easily fixable).
> 
> We only call acpi_pci_irq_enable() in the pci_enable_device() path, so
> we don't know the GSI unless a Dom0 driver has claimed the device and
> called pci_enable_device() for it, which seems like it might not be
> desirable.

I think that's always the case, as on dom0, devices to be passed
through are handled by pciback, which does enable them.

I agree it might be best not to tie exposing the node to
pci_enable_device() having been called.  Is _PRT only evaluated as
part of acpi_pci_irq_enable() (or pci_enable_device())?

> I was hoping we could put it in /sys/firmware/acpi/interrupts, but
> that looks like it's only for SCI statistics.  I guess we could moot a
> new /sys/firmware/acpi/gsi/ directory, but then each file there would
> have to identify a device, which might not be as convenient as the
> /sys/devices/ directory that already exists.  I guess there may be
> GSIs for things other than PCI devices; will you ever care about any
> of those?

We only support passthrough of PCI devices so far, but I guess if any
such non-PCI devices ever appear, use a GSI, and Xen supports
passthrough for them, then yes, we would need to fetch such GSIs
somehow.

Thanks, Roger.
