[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table



On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> > on the same page as MSI-X table. Device model (especially one in
> > stubdomain) cannot really handle those, as direct writes to that page is
> > refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq
> > server that handle those writes.
> > 
> > Doing this, requires correlating write location with guest view
> > of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> > for PV would need to be done separately.
> > 
> > This can be also used to read Pending Bit Array, if it lives on the same
> > page, making QEMU not needing /dev/mem access at all (especially helpful
> > with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> > map it to the guest directly.
> > If PBA lives on the same page, forbid writes and log a message.
> > Technically, writes outside of PBA could be allowed, but at this moment
> > the precise location of PBA isn't saved.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > ---
> > v2:
> >  - adjust commit message
> >  - pass struct domain to msixtbl_page_handler_get_hwaddr()
> >  - reduce local variables used only once
> >  - log a warning if write is forbidden if MSI-X and PBA lives on the same
> >    page
> >  - do not passthrough unaligned accesses
> >  - handle accesses both before and after MSI-X table
> > ---
> >  xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
> >  xen/arch/x86/include/asm/msi.h |   5 ++
> >  xen/arch/x86/msi.c             |  38 ++++++++
> >  3 files changed, 197 insertions(+)
> > 
> > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > index 9c82bf9b4ec2..9293009a4075 100644
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
> >      .write = _msixtbl_write,
> >  };
> >  
> > +static void __iomem *msixtbl_page_handler_get_hwaddr(
> > +    const struct domain *d,
> > +    uint64_t address,
> > +    bool write)
> > +{
> > +    const struct pci_dev *pdev = NULL;
> > +    const struct msixtbl_entry *entry;
> > +    int adj_type;
> 
> unsigned AFAICT.

Ok.

> > +
> > +    rcu_read_lock(&msixtbl_rcu_lock);
> > +    /*
> > +     * Check if it's on the same page as the end of the MSI-X table, but
> > +     * outside of the table itself.
> > +     */
> > +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> > +    {
> > +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
> > +             address < entry->gtable )
> > +        {
> > +            adj_type = ADJ_IDX_FIRST;
> > +            pdev = entry->pdev;
> > +            break;
> > +        }
> > +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + 
> > entry->table_len) &&
> 
> Should be entry->gtable + entry->table_len - 1, or else you are
> off-by-one.

Ok.

> > +             address >= entry->gtable + entry->table_len )
> > +        {
> > +            adj_type = ADJ_IDX_LAST;
> > +            pdev = entry->pdev;
> > +            break;
> > +        }
> > +    }
> > +    rcu_read_unlock(&msixtbl_rcu_lock);
> > +
> > +    if ( !pdev )
> > +        return NULL;
> > +
> > +    ASSERT(pdev->msix);
> > +
> > +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> > +    {
> > +        gdprintk(XENLOG_WARNING,
> > +                 "Page for adjacent MSI-X table access not initialized for 
> > %pp\n",
> > +                 pdev);
> > +
> > +        return NULL;
> > +    }
> > +
> > +    /* If PBA lives on the same page too, forbid writes. */
> > +    if ( write && (
> > +        (adj_type == ADJ_IDX_LAST &&
> > +         pdev->msix->table.last == pdev->msix->pba.first) ||
> > +        (adj_type == ADJ_IDX_FIRST &&
> > +         pdev->msix->table.first == pdev->msix->pba.last)) )
> > +    {
> > +        gdprintk(XENLOG_WARNING,
> > +                 "MSI-X table and PBA of %pp live on the same page, "
> > +                 "writing to other registers there is not implemented\n",
> > +                 pdev);
> 
> Don't you actually need to check that the passed address falls into the
> PBA array?  PBA can be sharing the same page as the MSI-X table, but
> that doesn't imply there aren't holes also not used by either the PBA
> or the MSI-X table in such page.

I don't know exact location of PBA, so I'm rejecting writes just in case
they do hit PBA (see commit message).

> > +        return NULL;
> > +    }
> > +
> > +    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
> > +        (address & (PAGE_SIZE - 1));
> 
> You can use PAGE_OFFSET().

Ok.

> > +
> > +}
> > +
> > +static bool cf_check msixtbl_page_accept(
> > +        const struct hvm_io_handler *handler, const ioreq_t *r)
> > +{
> > +    ASSERT(r->type == IOREQ_TYPE_COPY);
> > +
> > +    return msixtbl_page_handler_get_hwaddr(
> > +            current->domain, r->addr, r->dir == IOREQ_WRITE);
> 
> I think you want to accept it also if it's a write to the PBA, and
> just drop it.  You should always pass write=false and then drop it in
> msixtbl_page_write() if it falls in the PBA region (but still return
> X86EMUL_OKAY).

I don't want to interfere with msixtbl_mmio_page_ops, this handler is
only about accesses not hitting actual MSI-X structures.

> > +}
> > +
> > +static int cf_check msixtbl_page_read(
> > +        const struct hvm_io_handler *handler,
> > +        uint64_t address, uint32_t len, uint64_t *pval)
> 
> Why use hvm_io_ops and not hvm_mmio_ops?  You only care about MMIO
> accesses here.

I followed how msixtbl_mmio_ops are registered. Should that also be
changed to hvm_mmio_ops?

> 
> > +{
> > +    void __iomem *hwaddr;
> 
> const
> 
> I would also initialize *pval = ~0ul for safety.

When returning X86EMUL_OKAY, then I agree.

> > +
> > +    if ( address & (len - 1) )
> > +        return X86EMUL_UNHANDLEABLE;
> 
> You can use IS_ALIGNED for clarity.  In my fix for this for vPCI Jan
> asked to split unaligned accesses into 1byte ones, but I think for
> guests it's better to just drop them unless there's a specific case
> that we need to handle.
> 
> Also you should return X86EMUL_OKAY here, as the address is handled
> here, but the current way to handle it is to drop the access.
> 
> Printing a debug message can be helpful in case unaligned accesses
> need to be implemented in order to support some device.
> 
> > +
> > +    hwaddr = msixtbl_page_handler_get_hwaddr(
> > +            current->domain, address, false);
> > +
> > +    if ( !hwaddr )
> > +        return X86EMUL_UNHANDLEABLE;
> 
> Maybe X86EMUL_RETRY, since the page was there in the accept handler.

How does X86EMUL_RETRY work? Is it trying to find the handler again?

> > +
> > +    switch ( len )
> > +    {
> > +    case 1:
> > +        *pval = readb(hwaddr);
> > +        break;
> > +    case 2:
> > +        *pval = readw(hwaddr);
> > +        break;
> > +    case 4:
> > +        *pval = readl(hwaddr);
> > +        break;
> > +    case 8:
> > +        *pval = readq(hwaddr);
> > +        break;
> 
> Nit: we usually add a newline after every break;

Ok.

> > +    default:
> > +        return X86EMUL_UNHANDLEABLE;
> 
> I would rather use ASSERT_UNREACHABLE() here and fallthrough to the
> return below.  Seeing such size is likely an indication of something
> else gone very wrong, better to just terminate the access at once.
> 
> > +    }
> > +    return X86EMUL_OKAY;
> > +}
> > +
> > +static int cf_check msixtbl_page_write(
> > +        const struct hvm_io_handler *handler,
> > +        uint64_t address, uint32_t len, uint64_t val)
> > +{
> > +    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
> > +            current->domain, address, true);
> > +
> 
> You don't seem to check whether the access is aligned here?
> 
> Otherwise I have mostly the same comments as in msixtbl_page_read().

Ok.

> > +    if ( !hwaddr )
> > +        return X86EMUL_UNHANDLEABLE;
> > +
> > +    switch ( len ) {
> > +        case 1:
> > +            writeb(val, hwaddr);
> > +            break;
> > +        case 2:
> > +            writew(val, hwaddr);
> > +            break;
> > +        case 4:
> > +            writel(val, hwaddr);
> > +            break;
> > +        case 8:
> > +            writeq(val, hwaddr);
> > +            break;
> > +        default:
> > +            return X86EMUL_UNHANDLEABLE;
> > +    }
> > +    return X86EMUL_OKAY;
> > +
> > +}
> > +
> > +static const struct hvm_io_ops msixtbl_mmio_page_ops = {
> > +    .accept = msixtbl_page_accept,
> > +    .read = msixtbl_page_read,
> > +    .write = msixtbl_page_write,
> > +};
> > +
> >  static void add_msixtbl_entry(struct domain *d,
> >                                struct pci_dev *pdev,
> >                                uint64_t gtable,
> > @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
> >          handler->type = IOREQ_TYPE_COPY;
> >          handler->ops = &msixtbl_mmio_ops;
> >      }
> > +
> > +    /* passthrough access to other registers on the same page */
> > +    handler = hvm_next_io_handler(d);
> > +    if ( handler )
> > +    {
> > +        handler->type = IOREQ_TYPE_COPY;
> > +        handler->ops = &msixtbl_mmio_page_ops;
> > +    }
> >  }
> >  
> >  void msixtbl_pt_cleanup(struct domain *d)
> > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> > index a53ade95c9ad..d13cf1c1f873 100644
> > --- a/xen/arch/x86/include/asm/msi.h
> > +++ b/xen/arch/x86/include/asm/msi.h
> > @@ -207,6 +207,10 @@ struct msg_address {
> >                                         PCI_MSIX_ENTRY_SIZE + \
> >                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 
> > 1)))
> >  
> > +/* indexes in adj_access_table_idx[] below */
> > +#define ADJ_IDX_FIRST 0
> > +#define ADJ_IDX_LAST  1
> > +
> >  struct arch_msix {
> >      unsigned int nr_entries, used_entries;
> >      struct {
> > @@ -214,6 +218,7 @@ struct arch_msix {
> >      } table, pba;
> >      int table_refcnt[MAX_MSIX_TABLE_PAGES];
> >      int table_idx[MAX_MSIX_TABLE_PAGES];
> > +    int adj_access_table_idx[2];
> >      spinlock_t table_lock;
> >      bool host_maskall, guest_maskall;
> >      domid_t warned;
> > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> > index d0bf63df1def..680853f84685 100644
> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
> >                  domain_crash(d);
> >              /* XXX How to deal with existing mappings? */
> >          }
> > +
> > +        /*
> > +         * If the MSI-X table doesn't start at the page boundary, map the 
> > first page for
> > +         * passthrough accesses.
> > +         */
> 
> I think you should initialize
> msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?
> 
> > +        if ( table_paddr & (PAGE_SIZE - 1) )
> > +        {
> > +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> > +
> > +            if ( idx >= 0 )
> > +                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
> > +            else
> > +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: 
> > %d\n", idx);
> > +        }
> > +        /*
> > +         * If the MSI-X table doesn't span full page(s), map the last page 
> > for
> > +         * passthrough accesses.
> > +         */
> > +        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & 
> > (PAGE_SIZE - 1) )
> > +        {
> > +            uint64_t entry_paddr = table_paddr + msix->nr_entries * 
> > PCI_MSIX_ENTRY_SIZE;
> > +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> > +
> > +            if ( idx >= 0 )
> > +                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
> > +            else
> > +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: 
> > %d\n", idx);
> > +        }
> >      }
> >      WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
> >      ++msix->used_entries;
> > @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
> >              WARN();
> >          msix->table.first = 0;
> >          msix->table.last = 0;
> > +        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
> > +        {
> > +            msix_put_fixmap(msix, 
> > msix->adj_access_table_idx[ADJ_IDX_FIRST]);
> > +            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
> > +        }
> > +        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
> > +        {
> > +            msix_put_fixmap(msix, 
> > msix->adj_access_table_idx[ADJ_IDX_LAST]);
> > +            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;
> 
> Isn't 0 a valid idx?  You should check for >= 0 and also set
> to -1 once the fixmap entry has been put.

I rely here on msix using specific range of fixmap indexes
(FIX_MSIX_TO_RESERV_BASE - FIX_MSIX_TO_RESERV_END), which isn't starting
at 0. For this reason also, I keep unused entries at 0 (no need explicit
initialization).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.