[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 for-next 12/12] vpci/msix: add MSI-X handlers



>>> On 18.10.17 at 13:40, <roger.pau@xxxxxxxxxx> wrote:
> Changes since v6:
>  - Reduce the output of the debug keys.
>  - Fix comments and code to match in vpci_msix_control_write.
>  - Optimize size of the MSIX structure.
>  - Convert vpci_msix_mem to a uint32_t in order to reduce the size of
>    vpci_msix. Introduce some macros to make it easier to get the MSIX
>    tables related data.

This is misleading - there's no vpci_msix_mem in the patch. I think
you mean the tables[] member of struct vpci_msix?

> +void vpci_msix_arch_print_entry(const struct vpci_msix_entry *entry)
> +{
> +    printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu mask=%u pirq: %d\n",

Aiui %#02x is pointless as the 0x already occupies both characters.
Please drop the #; imo vectors make no sense to be logged in decimal
anyway.

> @@ -254,6 +255,20 @@ static void modify_bars(const struct pci_dev *pdev, bool 
> map, bool rom)
>          }
>      }
>  
> +    /* Remove any MSIX regions if present. */
> +    for ( i = 0; msix && i < ARRAY_SIZE(msix->tables); i++ )
> +    {
> +        paddr_t start = VMSIX_TABLE_ADDR(pdev->vpci, i);
> +
> +        rc = rangeset_remove_range(mem, PFN_DOWN(start), PFN_DOWN(start +
> +                                   VMSIX_TABLE_SIZE(pdev->vpci, i) - 1));

Bad indentation. Also if the second PFN_DOWN() is really intended
this way (rather than PFN_UP()), please add a comment explaining
why that is.

> +        if ( rc )
> +        {
> +            rangeset_destroy(mem);
> +            return;

Silent discarding of an error also needs an explanation in a comment.

> @@ -325,6 +329,22 @@ void vpci_dump_msi(void)
>                  vpci_msi_arch_print(msi);
>              }
>  
> +            if ( msix )
> +            {
> +                unsigned int i;
> +
> +                printk(" MSI-X\n");
> +
> +                printk("  entries: %u maskall: %d enabled: %d\n",
> +                       msix->max_entries, msix->masked, msix->enabled);
> +
> +                for ( i = 0; i < msix->max_entries; i++ )
> +                {
> +                    printk("  %4u ", i);
> +                    vpci_msix_arch_print_entry(&msix->entries[i]);
> +                }

Printing a partial line (i.e. w/o newline) and then calling an arch-
dependent function is not ideal. Perhaps bets to also hand the
index to the function, so it can decide how best to print
everything. Also, btw, "  %4" can be had with less .rodata
consumption by using "%6u".

> +static void control_write(const struct pci_dev *pdev, unsigned int reg,
> +                          uint32_t val, void *data)
> +{
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    struct vpci_msix *msix = data;
> +    bool new_masked = val & PCI_MSIX_FLAGS_MASKALL;
> +    bool new_enabled = val & PCI_MSIX_FLAGS_ENABLE;
> +    unsigned int i;
> +    int rc;
> +
> +    if ( new_masked == msix->masked && new_enabled == msix->enabled )
> +        return;
> +
> +    /*
> +     * According to the PCI 3.0 specification, switching the enable bit to 
> 1
> +     * or the function mask bit to 0 should cause all the cached addresses
> +     * and data fields to be recalculated.
> +     *
> +     * In order to avoid the overhead of disabling and enabling all the
> +     * entries every time the guest sets the maskall bit, Xen will only
> +     * perform the disable and enable sequence when the guest has written to
> +     * the entry.
> +     */
> +    if ( new_enabled && !new_masked && (!msix->enabled || msix->masked) )
> +    {
> +        for ( i = 0; i < msix->max_entries; i++ )
> +        {
> +            if ( msix->entries[i].masked || !msix->entries[i].updated )
> +                continue;
> +
> +            rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
> +            /* Ignore ENOENT, it means the entry wasn't setup. */
> +            if ( rc && rc != -ENOENT )
> +            {
> +                gprintk(XENLOG_WARNING,
> +                        "%04x:%02x:%02x.%u: unable to disable entry %u: 
> %d\n",
> +                        pdev->seg, pdev->bus, slot, func, i, rc);

In a log from e.g. a customer, how am I to tell this message from ...

> +                return;
> +            }
> +
> +            rc = vpci_msix_arch_enable_entry(&msix->entries[i], pdev,
> +                                             VMSIX_TABLE_BASE(pdev->vpci,
> +                                                              
> VPCI_MSIX_TABLE));
> +            if ( rc )
> +            {
> +                gprintk(XENLOG_WARNING,
> +                        "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
> +                        pdev->seg, pdev->bus, slot, func, i, rc);
> +                /* Entry is likely not properly configured, skip it. */
> +                continue;
> +            }
> +
> +            /*
> +             * At this point the PIRQ is still masked. Unmask it, or else the
> +             * guest won't receive interrupts. This is due to the
> +             * disable/enable sequence performed above.
> +             */
> +            vpci_msix_arch_mask_entry(&msix->entries[i], pdev, false);
> +
> +            msix->entries[i].updated = false;
> +        }
> +    }
> +    else if ( !new_enabled && msix->enabled )
> +    {
> +        /* Guest has disabled MSIX, disable all entries. */
> +        for ( i = 0; i < msix->max_entries; i++ )
> +        {
> +            /*
> +             * NB: vpci_msix_arch_disable can be called for entries that are
> +             * not setup, it will return -ENOENT in that case.
> +             */
> +            rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
> +            switch ( rc )
> +            {
> +            case 0:
> +                /*
> +                 * Mark the entry successfully disabled as updated, so that 
> on
> +                 * the next enable the entry is properly setup. This is done
> +                 * so that the following flow works correctly:
> +                 *
> +                 * mask entry -> disable MSIX -> enable MSIX -> unmask entry
> +                 *
> +                 * Without setting 'updated', the 'unmask entry' step will 
> fail
> +                 * because the entry has not been updated, so it would not be
> +                 * mapped/bound at all.
> +                 */
> +                msix->entries[i].updated = true;
> +                break;
> +            case -ENOENT:
> +                /* Ignore non-present entry. */
> +                break;
> +            default:
> +                gprintk(XENLOG_WARNING,
> +                        "%04x:%02x:%02x.%u: unable to disable entry %u: 
> %d\n",
> +                        pdev->seg, pdev->bus, slot, func, i, rc);

... this one?

> +static bool access_allowed(const struct pci_dev *pdev, unsigned long addr,
> +                           unsigned int len)
> +{
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +
> +    /* Only allow 32/64b accesses. */
> +    if ( len != 4 && len != 8 )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "%04x:%02x:%02x.%u: invalid MSI-X table access size: %u\n",
> +                pdev->seg, pdev->bus, slot, func, len);
> +        return false;
> +    }
> +
> +    /* Only allow aligned accesses. */
> +    if ( (addr & (len - 1)) != 0 )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "%04x:%02x:%02x.%u: MSI-X only allows aligned accesses\n",
> +                pdev->seg, pdev->bus, slot, func);
> +        return false;
> +    }

Perhaps fold both error paths and messages (I'm not convinced we
need all of them [also considering others you add] anyway)?

> +static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
> +                      unsigned long data)
> +{
> +    const struct domain *d = v->domain;
> +    struct vpci_msix *msix = msix_find(d, addr);
> +    struct vpci_msix_entry *entry;
> +    unsigned int offset;
> +
> +    if ( !msix )
> +        return X86EMUL_RETRY;
> +
> +    if ( !access_allowed(msix->pdev, addr, len) )
> +        return X86EMUL_OKAY;
> +
> +    if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
> +    {
> +        /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
> +        if ( is_hardware_domain(d) )
> +        {
> +            switch ( len )
> +            {
> +            case 4:
> +                writel(data, addr);
> +                break;
> +            case 8:
> +                writeq(data, addr);
> +                break;
> +            default:
> +                ASSERT_UNREACHABLE();
> +                break;
> +            }
> +        }
> +
> +        return X86EMUL_OKAY;
> +    }
> +
> +    spin_lock(&msix->pdev->vpci->lock);
> +    entry = get_entry(msix, addr);
> +    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
> +
> +    /*
> +     * NB: Xen allows writes to the data/address registers with the entry
> +     * unmasked. The specification says this is undefined behavior, and Xen
> +     * implements it as storing the written value, which will be made 
> effective
> +     * in the next mask/unmask cycle. This also mimics the implementation in
> +     * QEMU.
> +     */
> +    switch ( offset )
> +    {
> +    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
> +        entry->updated = true;
> +        if ( len == 8 )
> +        {
> +            entry->addr = data;
> +            break;
> +        }
> +        entry->addr &= ~0xffffffff;
> +        entry->addr |= data;
> +        break;
> +    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:

This switch() is too large to not have blank lines between non-fall-
through case blocks.

> +        entry->updated = true;
> +        entry->addr &= 0xffffffff;
> +        entry->addr |= (uint64_t)data << 32;
> +        break;
> +    case PCI_MSIX_ENTRY_DATA_OFFSET:
> +        entry->updated = true;
> +        entry->data = data;
> +
> +        if ( len == 4 )
> +            break;
> +
> +        data >>= 32;
> +        /* fallthrough */
> +    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
> +    {
> +        bool new_masked = data & PCI_MSIX_VECTOR_BITMASK;
> +        const struct pci_dev *pdev = msix->pdev;
> +        int rc;
> +
> +        if ( entry->masked == new_masked )
> +            /* No change in the mask bit, nothing to do. */
> +            break;
> +
> +        if ( !new_masked && msix->enabled && !msix->masked && entry->updated 
> )
> +        {
> +            /*
> +             * If MSI-X is enabled, the function mask is not active, the 
> entry
> +             * is being unmasked and there have been changes to the address 
> or
> +             * data fields Xen needs to disable and enable the entry in order
> +             * to pick up the changes.
> +             */
> +            rc = vpci_msix_arch_disable_entry(entry, pdev);
> +            if ( rc && rc != -ENOENT )
> +            {
> +                gprintk(XENLOG_WARNING,
> +                        "%04x:%02x:%02x.%u: unable to disable entry %u: 
> %d\n",
> +                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                        PCI_FUNC(pdev->devfn), VMSIX_ENTRY_NR(msix, entry), 
> rc);
> +                break;
> +            }
> +
> +            rc = vpci_msix_arch_enable_entry(entry, pdev,
> +                                             VMSIX_TABLE_BASE(pdev->vpci,
> +                                                              
> VPCI_MSIX_TABLE));
> +            if ( rc )
> +            {
> +                gprintk(XENLOG_WARNING,
> +                        "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
> +                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                        PCI_FUNC(pdev->devfn), VMSIX_ENTRY_NR(msix, entry), 
> rc);
> +                break;
> +            }

This very much looks lie it is identical to the earlier disable/enable
code sequence - use a helper function?

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -105,6 +105,32 @@ struct vpci {
>          /* Arch-specific data. */
>          struct vpci_arch_msi arch;
>      } *msi;
> +
> +    /* MSI-X data. */
> +    struct vpci_msix {
> +        struct pci_dev *pdev;
> +        /* List link. */
> +        struct list_head next;
> +        /* Table information. */
> +#define VPCI_MSIX_TABLE     0
> +#define VPCI_MSIX_PBA       1
> +#define VPCI_MSIX_MEM_NUM   2
> +        uint32_t tables[VPCI_MSIX_MEM_NUM];
> +        /* Maximum number of vectors supported by the device. */
> +        uint16_t max_entries : 11;

Isn't this one bit too few (if there are exactly 2k entries)?

> @@ -127,6 +153,41 @@ int __must_check vpci_msi_arch_enable(struct vpci_msi 
> *msi,
>  void vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev);
>  void vpci_msi_arch_init(struct vpci_msi *msi);
>  void vpci_msi_arch_print(const struct vpci_msi *msi);
> +
> +/* Arch-specific vPCI MSI-X helpers. */
> +void vpci_msix_arch_mask_entry(struct vpci_msix_entry *entry,
> +                               const struct pci_dev *pdev, bool mask);
> +int __must_check vpci_msix_arch_enable_entry(struct vpci_msix_entry *entry,
> +                                             const struct pci_dev *pdev,
> +                                             paddr_t table_base);
> +int __must_check vpci_msix_arch_disable_entry(struct vpci_msix_entry *entry,
> +                                              const struct pci_dev *pdev);
> +void vpci_msix_arch_init_entry(struct vpci_msix_entry *entry);
> +void vpci_msix_arch_print_entry(const struct vpci_msix_entry *entry);
> +
> +/*
> + * Helper functions to fetch MSIX related data. They are used by both the
> + * emulated MSIX code and the BAR handlers.
> + */
> +#define VMSIX_TABLE_BASE(vpci, nr)                                        \
> +    ((vpci)->header.bars[(vpci)->msix->tables[nr] & PCI_MSIX_BIRMASK].addr)
> +#define VMSIX_TABLE_ADDR(vpci, nr)                                        \
> +    (VMSIX_TABLE_BASE(vpci, nr) +                                         \
> +     ((vpci)->msix->tables[nr] & ~PCI_MSIX_BIRMASK))
> +
> +/*
> + * Note regarding the size calculation of the PBA: the spec mentions "The 
> last
> + * QWORD will not necessarily be fully populated", so it implies that the PBA
> + * size is 64-bit aligned.
> + */
> +#define VMSIX_TABLE_SIZE(vpci, nr)                                           
>   \
> +    ((nr == VPCI_MSIX_TABLE) ? (vpci)->msix->max_entries * 
> PCI_MSIX_ENTRY_SIZE \

(nr)

Anyway I wonder whether some or all of the macros couldn't, like the
earlier comment say, be (inline) functions.

> +                             : 
> ROUNDUP(DIV_ROUND_UP((vpci)->msix->max_entries, \
> +                                                    8), 8))
> +
> +#define VMSIX_ENTRY_NR(msix, entry)                                       \
> +    (unsigned int)((entry) - (msix)->entries)

Missing outer parentheses.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.