[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] x86/msi: Allow writes to registers on the same page as MSI-X table


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 17 Nov 2022 17:34:36 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=UKBTGsj+MVzS9O6jND3KpysaYaMH3oCjLdcjdJhFBy8=; b=eebEEIIBKvjsOlDJCqi2kBsIjNcNtQWhZPooGnYbTnu8sCGUv9ydd9SgGMNilmKLwIdkpRzzfC25IvKUACO85QSs9uL6RCrb2CfGO9NLR0Cle4/KF+iuBezCkWzfJEIR+Dnrcf1cIPhpvxFKDOmTQ07yM69879mYxmLhSNGqWbA0UYS7safvE/Dz4wNolj7TdNWdrzTOtsT2MOthCg8t0U+ynX//O2WFsoelI/eDp/7hTbdJpoA61EjBSh0txd3RcUaKQ4o5tThY438wL3+RwRIJJLySJzcR1tKSg3JsnQe48QGXvV8dohVEr7kAne/aYOLHHYY/al9gYawpkqTX4g==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=A6kLP7lS2IuAHEP7WHtXt1G+awjYY/ct1xcCQIiIVAtvgmeVKkyjBm3D6toZKq8HHjaLgDPZziiOT9xZxFchjtjtra+ZpKhWWg1TFLVDoFkIyK2hIG+SZtY8nk2ZyC5NdHqxcGlA1xhr1vpK/+ZLmYZGlidKiqz+BjOfGxLwNNvZTA/ZA5ZI5/J0H5xQFdGmTygBQswEKyMr5DcWUhPuiDmx4hClbohmUSOp7tBGYAT5UdHEU7TIRHL5SuH4DZSiX+mDIyxXWQd/kagI8OmZKb9P50UMSvoYMafW1YxQS21C+1JvTI5PMG5ECK195aYTxk0UF+G9ZjNNrdLMryrOiQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 17 Nov 2022 16:34:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.11.2022 20:21, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on mmio_ro_ranges list). Instead, add internal ioreq
> server that handle those writes.
> 
> This may be also used to read Pending Bit Array, if it lives on the same

"may" sounds as if this would be future work, yet ...

> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, it can be
> (and will be) mapped to the guest directly.
> If PBA lives on the same page, forbid writes.

... here you say you actually handle the case (because otherwise you
wouldn't need to distinguish writes from reads). It is only ...

> Technically, writes outside
> of PBA could be allowed, but at this moment the precise location of PBA
> isn't saved.

... this part which right now you don't handle. I have to admit that I'm
not convinced we should take such a partial implementation, especially
if there's nothing recorded in the log (making it harder to tell whether
something not working is because of this implementation restriction or
some other issue).

> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -428,6 +428,133 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
>      .write = _msixtbl_write,
>  };
>  
> +static void __iomem *msixtbl_page_handler_get_hwaddr(
> +        const struct vcpu *v,
> +        uint64_t address,
> +        bool write)

These want to be indented just like ...

> +{
> +    struct domain *d = v->domain;
> +    struct pci_dev *pdev = NULL;
> +    struct msixtbl_entry *entry;
> +    void __iomem *ret = NULL;
> +    uint64_t table_end_addr;

... function scope local variables.

Also: Pointer-to-const for the first three local variables? And maybe
omit "ret", which is effectively used just once (as you could use
"return" at the point where you assign to it). Also you don't further
use v afaics, so maybe have the callers pass in const struct domain *
right away?

> +    rcu_read_lock(&msixtbl_rcu_lock);
> +    /*
> +     * Check if it's on the same page as the end of the MSI-X table, but
> +     * outside of the table itself.
> +     */
> +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) 
> &&
> +             address >= entry->gtable + entry->table_len )
> +        {
> +            pdev = entry->pdev;
> +            break;
> +        }
> +    rcu_read_unlock(&msixtbl_rcu_lock);
> +
> +    if ( !pdev )
> +        return NULL;
> +
> +    ASSERT( pdev->msix );

Style: ASSERT is not a (pseudo-)keyword and hence should not have
blanks immediately inside the parentheses. (More instances further
down.)

> +    table_end_addr = (pdev->msix->table.first << PAGE_SHIFT) +
> +        pdev->msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> +    ASSERT( PFN_DOWN(table_end_addr) == pdev->msix->table.last );

What are you trying to catch here? I ask because the local variable
exists just for this checking afaics.

> +    /* If PBA lives on the same page too, forbid writes. */
> +    if ( write && pdev->msix->table.last == pdev->msix->pba.first )
> +        return NULL;
> +
> +    if ( pdev->msix->last_table_page )
> +        ret = pdev->msix->last_table_page + (address & (PAGE_SIZE - 1));
> +    else
> +        gdprintk(XENLOG_WARNING,
> +                 "MSI-X last_table_page not initialized for 
> %04x:%02x:%02x.%u\n",
> +                 pdev->seg,
> +                 pdev->bus,
> +                 PCI_SLOT(pdev->devfn),
> +                 PCI_FUNC(pdev->devfn));
> +

Please use %pp.

> +static bool cf_check msixtbl_page_accept(
> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> +{
> +    unsigned long addr = r->addr;

Any particular reason for having this local variable, which is used ...

> +    ASSERT( r->type == IOREQ_TYPE_COPY );
> +
> +    return msixtbl_page_handler_get_hwaddr(
> +            current, addr, r->dir == IOREQ_WRITE);
> +}

... exactly once?

> +static int cf_check msixtbl_page_read(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t *pval)
> +{
> +    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current, address, false);
> +
> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    switch ( len ) {

Style: Brace on its own line please and ...

> +        case 1:
> +            *pval = readb(hwaddr);
> +            break;
> +        case 2:
> +            *pval = readw(hwaddr);
> +            break;
> +        case 4:
> +            *pval = readl(hwaddr);
> +            break;
> +        case 8:
> +            *pval = readq(hwaddr);
> +            break;
> +        default:
> +            return X86EMUL_UNHANDLEABLE;

... the body un-indented by a level.

As to operation I'm unconvinced that carrying out misaligned accesses
here is generally safe. If we find devices really needing such, we
may need to think about ways to deal with them without putting at
risk everyone else. At the very least you need to make sure you don't
access beyond the end of the page.

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -961,6 +961,21 @@ static int msix_capability_init(struct pci_dev *dev,
>                  domain_crash(d);
>              /* XXX How to deal with existing mappings? */
>          }
> +
> +        /*
> +         * If the MSI-X table doesn't span full page(s), map the last page 
> for
> +         * passthrough accesses.
> +         */
> +        if ( (msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
> +        {
> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * 
> PCI_MSIX_ENTRY_SIZE;
> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->last_table_page = fix_to_virt(idx);
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: 
> %d\n", idx);
> +        }

Could we avoid the extra work if there's only less than one page's
worth of entries for a device? But then again maybe not worth any
extra code, as the same mapping will be re-used anyway due to the
refcounting that's being used.

Makes me think of another aspect though: Don't we also need to
handle stuff living on the same page as the start of the table, if
that doesn't start at a page boundary?

> @@ -1090,6 +1105,12 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>              WARN();
>          msix->table.first = 0;
>          msix->table.last = 0;
> +        if ( msix->last_table_page )
> +        {
> +            msix_put_fixmap(msix,
> +                            virt_to_fix((unsigned 
> long)msix->last_table_page));
> +            msix->last_table_page = 0;

To set a pointer please use NULL.

Overall it looks like you're dealing with the issue for HVM only.
You will want to express this in the title, perhaps by using x86/hvm:
as the prefix. But then the question of course is whether this couldn't
be dealt with in/from mmio_ro_emulated_write(), which handles both HVM
and PV. Which in turn raises the question: Do you need to handle reads
in the new code in the first place?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.