[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 28 Mar 2023 13:28:44 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=RDJmvBYR1N+sA6Ptq+8sGs75RXK+IWVqkBdflUBSLUQ=; b=Fzp0aJa276pyCzmzx8tAoTPC3EzojBsPxEbS29YaHhnvozgoq5pTIHVZQziVcr3K2LLUTGizk7/aPmBpjMCdClFMEjdAVye4FoXjEpIJb1XknUlCSsdrJ8DvXyVqAS9w2oUkwghbl4m78VPegAOFHBBmRPvFnYlYpn86sDacCQNodtKf42azAH4HSBijPyHJNnfGdkL28XiQOmkVk0Ncuvz/IbEoZDf9vP8Eyq3usZOAbhbQQRAu8iZCiw7BglIb66bu9yei0mJIrfTIdxA667SjKDJot00T+ZgB3fk8jbFovCmVAqlJGkUzoK+nK7lFHIy6/usSRDsuKMqxNBRnTA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=XuEeNh5vw1D6uwjsswnd0j75c1aXbEO5sLjvF9HXjzr96xoJ7VtiIWBpTo9c1yy/oOXrQmqMCvr0v7+nV1KEBdiaA+G+r/vPtt2Yf78ifl1gVvogif+0VsSwTQ+0bXR58p2vt4Tzf0rmgFjeUQKJLnD1X6fWO61Cls2P+RGpodpdt5xKcleLwQRq9ZpABI3RJcojQBRYXoxH1gooDDch9IVdVZnj4U6pbUBGX8XuzWC4dv0eXAQkXLQzx5nQ6JuPwpIQ+wFTSrgy9PmYwZa5yZPBYqeXr5kHsjFw5M0TVXnVg2NklWJPq1oJD04f6mqCDSQb60L76LSriibq3r/P0w==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jason Andryuk <jandryuk@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 28 Mar 2023 11:29:08 +0000
  • Ironport-data: A9a23:Tj7lDK//Sh7tkPDoIuUYDrUDXX+TJUtcMsCJ2f8bNWPcYEJGY0x3x jAXDz/QMqncZTajLYp1Oo628BgO78XUy4dqHARv+3g8E34SpcT7XtnIdU2Y0wF+jCHgZBk+s 5hBMImowOQcFCK0SsKFa+C5xZVE/fjUAOG6UKicYXoZqTZMEE8JkQhkl/MynrlmiN24BxLlk d7pqojUNUTNNwRcawr40Ire7kI/1BjOkGlA5AdmPqob5AW2e0Q9V/rzG4ngdxMUfaEMdgKKb 76r5K20+Grf4yAsBruN+losWhRXKlJ6FVHmZkt+A8BOsDAbzsAB+v9T2M4nQVVWk120c+VZk 72hg3ASpTABZcUgkMxFO/VR/roX0aduoNcrKlDn2SCfItGvn9IBDJyCAWlvVbD09NqbDklA2 sw7A283UCu73buRnKKactRVg+MKeZyD0IM34hmMzBn/JNN/GdXvZvuP4tVVmjAtmspJAPDSI dIDbiZiZwjBZBsJPUoLDJU5n6GjgXyXnz9w8QrJ4/ZopTWDilUpi9ABM/KMEjCObd9SkUuC4 HrP4kzyAw0ANczZwj2Amp6prraXwHygAdpPRdVU8NZphAGWlk1CMSEbbguhgsa+oFODcIJ2f hl8Fi0G6PJaGFaQZtz0RRixunOHlh8aRdtLEuc+5R2Ny6zb+AKQDC4PSTspQMwrsoo6SCIn0 neNnsj1Hnp/vbuNU3Wf+7yI6zSoNkA9L2YEaTUVUAgt7NzqoYV1hRXKJv5hGqOoitz+GRnr3 iuH6iM5gt07ksojx6i9u1fdjFqRSoPhSwc04kDbWzyj5wYgPIq9PdXwsh7c8OpKK5ufQh+Zp n8YlsOC7ecIS5aQiCiKR+ZLF7asjxqYDADhbZdUN8FJ31yQF7SLJ+i8PBkWyJ9VD/s5
  • Ironport-hdrordr: A9a23:zZ+Eca/vI6rTCNTzQJ1uk+AfI+orL9Y04lQ7vn2YSXRuA6qlfq GV/MjzsCWetN9/YhAdcLy7Scy9qBDnhOdICOsqTM+ftWDd0QPGQr2KhbGSuAEIcBeOktK1u5 0QFJRWOZncN3U/q+DQiTPVLz8n+rO62ZHtv8vli11Kai5LRZ1axzpYLCHeKGFKLTM2ZqYRJd 6S5s9KvTqydW5/VKmGL3MYRfXEo9HRtL+OW29lOyIa
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq
> server that handle those writes.
> 
> Doing this, requires correlating write location with guest view
> of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
> 
> This can be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, forbid writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> ---
> v2:
>  - adjust commit message
>  - pass struct domain to msixtbl_page_handler_get_hwaddr()
>  - reduce local variables used only once
>  - log a warning if write is forbidden if MSI-X and PBA lives on the same
>    page
>  - do not passthrough unaligned accesses
>  - handle accesses both before and after MSI-X table
> ---
>  xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
>  xen/arch/x86/include/asm/msi.h |   5 ++
>  xen/arch/x86/msi.c             |  38 ++++++++
>  3 files changed, 197 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 9c82bf9b4ec2..9293009a4075 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
>      .write = _msixtbl_write,
>  };
>  
> +static void __iomem *msixtbl_page_handler_get_hwaddr(
> +    const struct domain *d,
> +    uint64_t address,
> +    bool write)
> +{
> +    const struct pci_dev *pdev = NULL;
> +    const struct msixtbl_entry *entry;
> +    int adj_type;

unsigned AFAICT.

> +
> +    rcu_read_lock(&msixtbl_rcu_lock);
> +    /*
> +     * Check if it's on the same page as the end of the MSI-X table, but
> +     * outside of the table itself.
> +     */
> +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> +    {
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
> +             address < entry->gtable )
> +        {
> +            adj_type = ADJ_IDX_FIRST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) 
> &&

Should be entry->gtable + entry->table_len - 1, or else you are
off-by-one.

> +             address >= entry->gtable + entry->table_len )
> +        {
> +            adj_type = ADJ_IDX_LAST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +    }
> +    rcu_read_unlock(&msixtbl_rcu_lock);
> +
> +    if ( !pdev )
> +        return NULL;
> +
> +    ASSERT(pdev->msix);
> +
> +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "Page for adjacent MSI-X table access not initialized for 
> %pp\n",
> +                 pdev);
> +
> +        return NULL;
> +    }
> +
> +    /* If PBA lives on the same page too, forbid writes. */
> +    if ( write && (
> +        (adj_type == ADJ_IDX_LAST &&
> +         pdev->msix->table.last == pdev->msix->pba.first) ||
> +        (adj_type == ADJ_IDX_FIRST &&
> +         pdev->msix->table.first == pdev->msix->pba.last)) )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "MSI-X table and PBA of %pp live on the same page, "
> +                 "writing to other registers there is not implemented\n",
> +                 pdev);

Don't you actually need to check that the passed address falls into the
PBA array?  PBA can be sharing the same page as the MSI-X table, but
that doesn't imply there aren't holes also not used by either the PBA
or the MSI-X table in such page.

> +        return NULL;
> +    }
> +
> +    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
> +        (address & (PAGE_SIZE - 1));

You can use PAGE_OFFSET().

> +
> +}
> +
> +static bool cf_check msixtbl_page_accept(
> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> +{
> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> +
> +    return msixtbl_page_handler_get_hwaddr(
> +            current->domain, r->addr, r->dir == IOREQ_WRITE);

I think you want to accept it also if it's a write to the PBA, and
just drop it.  You should always pass write=false and then drop it in
msixtbl_page_write() if it falls in the PBA region (but still return
X86EMUL_OKAY).

> +}
> +
> +static int cf_check msixtbl_page_read(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t *pval)

Why use hvm_io_ops and not hvm_mmio_ops?  You only care about MMIO
accesses here.

> +{
> +    void __iomem *hwaddr;

const

I would also initialize *pval = ~0ul for safety.

> +
> +    if ( address & (len - 1) )
> +        return X86EMUL_UNHANDLEABLE;

You can use IS_ALIGNED for clarity.  In my fix for this for vPCI Jan
asked to split unaligned accesses into 1byte ones, but I think for
guests it's better to just drop them unless there's a specific case
that we need to handle.

Also you should return X86EMUL_OKAY here, as the address is handled
here, but the current way to handle it is to drop the access.

Printing a debug message can be helpful in case unaligned accesses
need to be implemented in order to support some device.

> +
> +    hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current->domain, address, false);
> +
> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;

Maybe X86EMUL_RETRY, since the page was there in the accept handler.

> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;

Nit: we usually add a newline after every break;

> +    default:
> +        return X86EMUL_UNHANDLEABLE;

I would rather use ASSERT_UNREACHABLE() here and fallthrough to the
return below.  Seeing such size is likely an indication of something
else gone very wrong, better to just terminate the access at once.

> +    }
> +    return X86EMUL_OKAY;
> +}
> +
> +static int cf_check msixtbl_page_write(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t val)
> +{
> +    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current->domain, address, true);
> +

You don't seem to check whether the access is aligned here?

Otherwise I have mostly the same comments as in msixtbl_page_read().

> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    switch ( len ) {
> +        case 1:
> +            writeb(val, hwaddr);
> +            break;
> +        case 2:
> +            writew(val, hwaddr);
> +            break;
> +        case 4:
> +            writel(val, hwaddr);
> +            break;
> +        case 8:
> +            writeq(val, hwaddr);
> +            break;
> +        default:
> +            return X86EMUL_UNHANDLEABLE;
> +    }
> +    return X86EMUL_OKAY;
> +
> +}
> +
> +static const struct hvm_io_ops msixtbl_mmio_page_ops = {
> +    .accept = msixtbl_page_accept,
> +    .read = msixtbl_page_read,
> +    .write = msixtbl_page_write,
> +};
> +
>  static void add_msixtbl_entry(struct domain *d,
>                                struct pci_dev *pdev,
>                                uint64_t gtable,
> @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
>          handler->type = IOREQ_TYPE_COPY;
>          handler->ops = &msixtbl_mmio_ops;
>      }
> +
> +    /* passthrough access to other registers on the same page */
> +    handler = hvm_next_io_handler(d);
> +    if ( handler )
> +    {
> +        handler->type = IOREQ_TYPE_COPY;
> +        handler->ops = &msixtbl_mmio_page_ops;
> +    }
>  }
>  
>  void msixtbl_pt_cleanup(struct domain *d)
> diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> index a53ade95c9ad..d13cf1c1f873 100644
> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
>                                         PCI_MSIX_ENTRY_SIZE + \
>                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>  
> +/* indexes in adj_access_table_idx[] below */
> +#define ADJ_IDX_FIRST 0
> +#define ADJ_IDX_LAST  1
> +
>  struct arch_msix {
>      unsigned int nr_entries, used_entries;
>      struct {
> @@ -214,6 +218,7 @@ struct arch_msix {
>      } table, pba;
>      int table_refcnt[MAX_MSIX_TABLE_PAGES];
>      int table_idx[MAX_MSIX_TABLE_PAGES];
> +    int adj_access_table_idx[2];
>      spinlock_t table_lock;
>      bool host_maskall, guest_maskall;
>      domid_t warned;
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index d0bf63df1def..680853f84685 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
>                  domain_crash(d);
>              /* XXX How to deal with existing mappings? */
>          }
> +
> +        /*
> +         * If the MSI-X table doesn't start at the page boundary, map the 
> first page for
> +         * passthrough accesses.
> +         */

I think you should initialize
msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?

> +        if ( table_paddr & (PAGE_SIZE - 1) )
> +        {
> +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: 
> %d\n", idx);
> +        }
> +        /*
> +         * If the MSI-X table doesn't span full page(s), map the last page 
> for
> +         * passthrough accesses.
> +         */
> +        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & 
> (PAGE_SIZE - 1) )
> +        {
> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * 
> PCI_MSIX_ENTRY_SIZE;
> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: 
> %d\n", idx);
> +        }
>      }
>      WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
>      ++msix->used_entries;
> @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>              WARN();
>          msix->table.first = 0;
>          msix->table.last = 0;
> +        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
> +            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
> +        }
> +        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
> +            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;

Isn't 0 a valid idx?  You should check for >= 0 and also set
to -1 once the fixmap entry has been put.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.