[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers



On Wed, Aug 02, 2017 at 09:07:54AM -0600, Jan Beulich wrote:
> >>> Roger Pau Monne <roger.pau@xxxxxxxxxx> 06/30/17 5:01 PM >>>
> >Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
> >BIR are not trapped by Xen at the moment.
> 
> They're mandated r/o by the spec anyway.

> 
> >@@ -113,6 +148,35 @@ static int vpci_modify_bar(struct domain *d, const 
> >struct vpci_bar *bar,
> >if ( IS_ERR(mem) )
> >return -PTR_ERR(mem);
>  >
> >+    /*
> >+     * Make sure the MSI-X regions of the BAR are not mapped into the domain
> >+     * p2m, or else the MSI-X handlers are useless. Only do this when 
> >mapping,
> >+     * since that's when the memory decoding on the device is enabled.
> >+     */
> >+    for ( i = 0; map && i < ARRAY_SIZE(bar->msix); i++ )
> >+    {
> >+        struct vpci_msix_mem *msix = bar->msix[i];
> >+
> >+        if ( !msix || msix->addr == INVALID_PADDR )
> >+            continue;
> >+
> >+        rc = vpci_unmap_msix(d, msix);
> 
> Why do you need this, instead of being able to simply rely on the rangeset
> based (un)mapping?

This is because the series that I've sent called: "x86/pvh: implement
iommu_inclusive_mapping for PVH Dom0" will map the MSI-X memory areas
into the guest, and thus we need to make sure they are not mapped
here for the emulation path to work.

https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg02849.html

> >@@ -405,7 +475,20 @@ static int vpci_init_bars(struct pci_dev *pdev)
> >continue;
> >}
>  >
> >-        bars[i].addr = (cmd & PCI_COMMAND_MEMORY) ? addr : INVALID_PADDR;
> >+        if ( cmd & PCI_COMMAND_MEMORY )
> >+        {
> >+            unsigned int j;
> >+
> >+            bars[i].addr = addr;
> >+
> >+            for ( j = 0; j < ARRAY_SIZE(bars[i].msix); j++ )
> >+                if ( bars[i].msix[j] )
> >+                    bars[i].msix[j]->addr = bars[i].addr +
> >+                                            bars[i].msix[j]->offset;
> >+        }
> >+        else
> >+            bars[i].addr = INVALID_PADDR;
> 
> As (I think) mentioned elsewhere already, this init-time special case looks
> dangerous (and unnecessary) to me (or else I'd expect you to also zap
> the field when the memory decode bit is being cleared).

OK, so I'm simply going to set this to addr + offset, regardless of
whether the BAR has memory decoding enabled of not. If the BAR is not
yet positioned Dom0 will have to position it anyway before enabling
memory decoding.

> >+        for ( i = 0; i < msix->max_entries; i++ )
> >+        {
> >+            if ( msix->entries[i].masked )
> >+                continue;
> 
> Why is the mask bit relevant here, but not the mask-all one?

Not taking the mask-all into account here is wrong, since setting
mask-all from 1 to 0 should force a recalculation of all the entries
address and data fields. I will fix this in the next version.

> >+            rc = vpci_msix_arch_enable(&msix->entries[i].arch, pdev,
> >+                                       msix->entries[i].addr,
> >+                                       msix->entries[i].data,
> >+                                       msix->entries[i].nr, table_base);
> >+            if ( rc )
> >+            {
> >+                gdprintk(XENLOG_ERR,
> <+                         "%04x:%02x:%02x.%u: unable to update entry %u: 
> %d\n",
> >+                         seg, bus, slot, func, i, rc);
> >+                return;
> >+            }
> >+
> >+            vpci_msix_arch_mask(&msix->entries[i].arch, pdev, false);
> 
> Same question here.

This is needed because after a vpci_msix_arch_enable the pirq is still
masked, and hence needs to be unmasked to match the guest's view.

> >+        }
> >+    }
> >+    else if ( msix->enabled && !new_enabled )
> >+    {
> >+        /* MSI-X disabled. */
> >+        for ( i = 0; i < msix->max_entries; i++ )
> >+        {
> >+            rc = vpci_msix_arch_disable(&msix->entries[i].arch, pdev);
> >+            if ( rc )
> >+            {
> >+                gdprintk(XENLOG_ERR,
> >+                         "%04x:%02x:%02x.%u: unable to disable entry %u: 
> >%d\n",
> >+                         seg, bus, slot, func, i, rc);
> >+                return;
> >+            }
> >+        }
> >+    }
> >+
> >+    if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
> >+         pci_msi_conf_write_intercept(pdev, reg, 2, &val.u32) >= 0 )
> >+        pci_conf_write16(seg, bus, slot, func, reg, val.u32);
> 
> DYM val.u16 here?

Now this is simply val, since the union has been removed.

> >+static struct vpci_msix *vpci_msix_find(struct domain *d, unsigned long 
> >addr)
> >+{
> >+    struct vpci_msix *msix;
> >+
> >+    ASSERT(vpci_locked(d));
> >+    list_for_each_entry ( msix,  &d->arch.hvm_domain.msix_tables, next )
> >+    {
> >+        uint8_t seg = msix->pdev->seg, bus = msix->pdev->bus;
> >+        uint8_t slot = PCI_SLOT(msix->pdev->devfn);
> >+        uint8_t func = PCI_FUNC(msix->pdev->devfn);
> >+        uint16_t cmd = pci_conf_read16(seg, bus, slot, func, PCI_COMMAND);
> 
> Perhaps better to keep a cached copy of the command register value?

I'm now using the enabled field of the vpci_bar struct instead of
checking the command register.

> >+        break;
> >+    case PCI_MSIX_ENTRY_DATA_OFFSET:
> >+        /*
> >+         * 8 byte writes to the msg data and vector control fields are
> >+         * only allowed if the entry is masked.
> >+         */
> >+        if ( len == 8 && !entry->masked && !msix->masked && msix->enabled )
> >+        {
> >+            vpci_unlock(d);
> >+            return X86EMUL_OKAY;
> >+        }
> 
> I don't think this is correct - iirc such writes simply don't take effect 
> immediately
> (but I then seem to recall this to apply to the address field and 32-bit 
> writes to
> the data field as well). They'd instead take effect the next time the entry 
> is being
> unmasked (or some such). A while ago I did fix the qemu code to behave in this
> way.

There's an Implementation Note called "Special Considerations for QWORD
Accesses" in the MSI-X section of the PCI 3.0 spec that states:

If a given entry is currently masked (via its Mask bit or the Function
Mask bit), software is permitted to fill in the Message Data and
Vector Control fields with a single QWORD write, taking advantage of
the fact the Message Data field is guaranteed to become visible to
hardware no later than the Vector Control field.

So I think the above chunk is correct. The specification also states
that:

Software must not modify the Address or Data fields of an entry while
it is unmasked. Refer to Section 6.8.3.5 for details.

AFAICT this is not enforced by QEMU, and you can write to the
address/data fields while the message is not masked. The update will
only take effect once the message is masked and unmasked.

> >+static int vpci_init_msix(struct pci_dev *pdev)
> >+{
> >+    struct domain *d = pdev->domain;
> >+    uint8_t seg = pdev->seg, bus = pdev->bus;
> >+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> >+    struct vpci_msix *msix;
> >+    unsigned int msix_offset, i, max_entries;
> >+    struct vpci_bar *table_bar, *pba_bar;
> >+    uint16_t control;
> >+    int rc;
> >+
> >+    msix_offset = pci_find_cap_offset(seg, bus, slot, func, 
> >PCI_CAP_ID_MSIX);
> >+    if ( !msix_offset )
> >+        return 0;
> >+
> >+    control = pci_conf_read16(seg, bus, slot, func,
> >+                              msix_control_reg(msix_offset));
> >+
> >+    /* Get the maximum number of vectors the device supports. */
> >+    max_entries = msix_table_size(control);
> >+
> >+    msix = xzalloc_bytes(MSIX_SIZE(max_entries));
> >+    if ( !msix )
> >+        return -ENOMEM;
> >+
> >+    msix->max_entries = max_entries;
> >+    msix->pdev = pdev;
> >+
> >+    /* Find the MSI-X table address. */
> >+    msix->table.offset = pci_conf_read32(seg, bus, slot, func,
> >+                                         
> >msix_table_offset_reg(msix_offset));
> >+    msix->table.bir = msix->table.offset & PCI_MSIX_BIRMASK;
> >+    msix->table.offset &= ~PCI_MSIX_BIRMASK;
> >+    msix->table.size = msix->max_entries * PCI_MSIX_ENTRY_SIZE;
> >+    msix->table.addr = INVALID_PADDR;
> >+
> >+    /* Find the MSI-X pba address. */
> >+    msix->pba.offset = pci_conf_read32(seg, bus, slot, func,
> >+                                       msix_pba_offset_reg(msix_offset));
> >+    msix->pba.bir = msix->pba.offset & PCI_MSIX_BIRMASK;
> >+    msix->pba.offset &= ~PCI_MSIX_BIRMASK;
> >+    msix->pba.size = DIV_ROUND_UP(msix->max_entries, 8);
> 
> I think you want to round up to at least the next 32-bit boundary; the
> spec talking about bits 63..00 even suggests a 64-bit boundary. The
> table addresses being required to be qword aligned also supports this.

The spec mentions that the last QWORD of the PBA doesn't need to be
fully populated, so yes, I assume this needs to be rounded up to a
64-bit boundary.

> >+void vpci_dump_msix(void)
> >+{
> >+    struct domain *d;
> >+    struct pci_dev *pdev;
> 
> const for all pointers in dump handlers, as far as possible.
> 
> >+    for_each_domain ( d )
> >+    {
> >+        if ( !has_vpci(d) )
> >+            continue;
> >+
> >+        printk("vPCI MSI-X information for guest %u\n", d->domain_id);
> 
> Wouldn't it be better (more useful) to dump the MSI and MSI-X data for a
> domain next to each other?

Possibly yes, and printing the MSI and MSI-X data of each device
together would be even better IMHO.

> Apart from the comments here the ones give for the MSI patch apply
> respectively.

I've added the MSI-X dumping to vpci_dump_msi instead.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.