|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v7 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
On Wed, May 08, 2024 at 06:09:48PM +0200, Roger Pau Monné wrote:
> On Tue, May 07, 2024 at 02:44:02PM +0200, Marek Marczykowski-Górecki wrote:
> > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> > on the same page as MSI-X table. Device model (especially one in
> > stubdomain) cannot really handle those, as direct writes to that page is
> > refused (page is on the mmio_ro_ranges list). Instead, extend
> > msixtbl_mmio_ops to handle such accesses too.
> >
> > Doing this, requires correlating read/write location with guest
> > MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> > for PV would need to be done separately.
> >
> > This will be also used to read Pending Bit Array, if it lives on the same
> > page, making QEMU not needing /dev/mem access at all (especially helpful
> > with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> > map it to the guest directly.
> > If PBA lives on the same page, discard writes and log a message.
> > Technically, writes outside of PBA could be allowed, but at this moment
> > the precise location of PBA isn't saved, and also no known device abuses
> > the spec in this way (at least yet).
> >
> > To access those registers, msixtbl_mmio_ops need the relevant page
> > mapped. MSI handling already has infrastructure for that, using fixmap,
> > so try to map first/last page of the MSI-X table (if necessary) and save
> > their fixmap indexes. Note that msix_get_fixmap() does reference
> > counting and reuses existing mapping, so just call it directly, even if
> > the page was mapped before. Also, it uses a specific range of fixmap
> > indexes which doesn't include 0, so use 0 as default ("not mapped")
> > value - which simplifies code a bit.
> >
> > Based on assumption that all MSI-X page accesses are handled by Xen, do
> > not forward adjacent accesses to other hypothetical ioreq servers, even
> > if the access wasn't handled for some reason (failure to map pages etc).
> > Relevant places log a message about that already.
> >
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>
> Thanks, just a couple of minor comments, I think the only relevant one
> is that you can drop ADJACENT_DONT_HANDLE unless there's something
> I'm missing. The rest are mostly cosmetic, but if you have to respin
> and agree with them might be worth addressing.
>
> Sorry for giving this feedback so late in the process, I should have
> attempted to review earlier versions.
>
> > ---
> > Changes in v7:
> > - simplify logic based on assumption that all access to MSI-X pages are
> > handled by Xen (Roger)
> > - move calling adjacent_handle() into adjacent_{read,write}() (Roger)
> > - move range check into msixtbl_addr_to_desc() (Roger)
> > - fix off-by-one when initializing adj_access_idx[ADJ_IDX_LAST] (Roger)
> > - no longer distinguish between unhandled write due to PBA nearby and
> > other reasons
> > - add missing break after ASSERT_UNREACHABLE (Jan)
> > Changes in v6:
> > - use MSIX_CHECK_WARN macro
> > - extend assert on fixmap_idx
> > - add break in default label, after ASSERT_UNREACHABLE(), and move
> > setting default there
> > - style fixes
> > Changes in v5:
> > - style fixes
> > - include GCC version in the commit message
> > - warn only once (per domain, per device) about failed adjacent access
> > Changes in v4:
> > - drop same_page parameter of msixtbl_find_entry(), distinguish two
> > cases in relevant callers
> > - rename adj_access_table_idx to adj_access_idx
> > - code style fixes
> > - drop alignment check in adjacent_{read,write}() - all callers already
> > have it earlier
> > - delay mapping first/last MSI-X pages until preparing device for a
> > passthrough
> > v3:
> > - merge handling into msixtbl_mmio_ops
> > - extend commit message
> > v2:
> > - adjust commit message
> > - pass struct domain to msixtbl_page_handler_get_hwaddr()
> > - reduce local variables used only once
> > - log a warning if write is forbidden if MSI-X and PBA lives on the same
> > page
> > - do not passthrough unaligned accesses
> > - handle accesses both before and after MSI-X table
> > ---
> > xen/arch/x86/hvm/vmsi.c | 205 ++++++++++++++++++++++++++++++++--
> > xen/arch/x86/include/asm/msi.h | 5 +-
> > xen/arch/x86/msi.c | 42 +++++++-
> > 3 files changed, 242 insertions(+), 10 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > index 999917983789..f7b7b4998b5e 100644
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -180,6 +180,10 @@ static bool msixtbl_initialised(const struct domain *d)
> > return d->arch.hvm.msixtbl_list.next;
> > }
> >
> > +/*
> > + * Lookup an msixtbl_entry on the same page as given addr. It's up to the
> > + * caller to check if address is strictly part of the table - if relevant.
> > + */
> > static struct msixtbl_entry *msixtbl_find_entry(
> > struct vcpu *v, unsigned long addr)
> > {
> > @@ -187,8 +191,8 @@ static struct msixtbl_entry *msixtbl_find_entry(
> > struct domain *d = v->domain;
> >
> > list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> > - if ( addr >= entry->gtable &&
> > - addr < entry->gtable + entry->table_len )
> > + if ( PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
> > + PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len -
> > 1) )
> > return entry;
> >
> > return NULL;
> > @@ -203,6 +207,10 @@ static struct msi_desc *msixtbl_addr_to_desc(
> > if ( !entry || !entry->pdev )
> > return NULL;
> >
> > + if ( addr < entry->gtable ||
> > + addr >= entry->gtable + entry->table_len )
> > + return NULL;
> > +
> > nr_entry = (addr - entry->gtable) / PCI_MSIX_ENTRY_SIZE;
> >
> > list_for_each_entry( desc, &entry->pdev->msi_list, list )
> > @@ -213,6 +221,152 @@ static struct msi_desc *msixtbl_addr_to_desc(
> > return NULL;
> > }
> >
> > +/*
> > + * Returns:
> > + * - ADJACENT_DONT_HANDLE if no handling should be done
> > + * - a fixmap idx to use for handling
> > + */
> > +#define ADJACENT_DONT_HANDLE UINT_MAX
>
> Isn't it fine to just return 0 to signal that the access is not
> handled?
>
> fixmap index 0 is reserved anyway (see FIX_RESERVED), so could be used
> for this purpose and then you don't need to introduce
> ADJACENT_DONT_HANDLE?
It was this way before in v2 and you asked me to not use 0 for this
purpose...
> > +static unsigned int adjacent_handle(
> > + const struct msixtbl_entry *entry, unsigned long addr, bool write)
>
> Now that it has been quite pruned, get_adjacent_idx() or some such
> might be more inline with the function logic.
makes sense
>
> > +{
> > + unsigned int adj_type;
> > + struct arch_msix *msix;
> > +
> > + if ( !entry || !entry->pdev )
> > + {
> > + ASSERT_UNREACHABLE();
> > + return ADJACENT_DONT_HANDLE;
> > + }
> > +
> > + if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable
> > )
> > + adj_type = ADJ_IDX_FIRST;
> > + else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len
> > - 1) &&
> > + addr >= entry->gtable + entry->table_len )
> > + adj_type = ADJ_IDX_LAST;
> > + else
> > + {
> > + /* All callers should already do equivalent range checking. */
> > + ASSERT_UNREACHABLE();
> > + return ADJACENT_DONT_HANDLE;
> > + }
> > +
> > + msix = entry->pdev->msix;
> > + ASSERT(msix);
>
> Since you already do it above, would be best to do:
>
> if ( !msix )
> {
> ASSERT_UNREACHABLE();
> return 0;
> }
Ok.
> > +
> > + if ( !msix->adj_access_idx[adj_type] )
> > + {
> > + if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> > + adjacent_not_initialized) )
> > + gprintk(XENLOG_WARNING,
> > + "Page for adjacent(%d) MSI-X table access not
> > initialized for %pp (addr %#lx, gtable %#lx\n",
>
> Do you really need to log an error here? There's an error already
> printed in msix_capability_init() if the adjacent pages can't be
> mapped.
IMO it's better to keep this message, otherwise it might be pretty hard
to debug not working device - a message buried somewhere on startup
might be hard to correlate with an issue much later.
> > + adj_type, &entry->pdev->sbdf, addr, entry->gtable);
> > + return ADJACENT_DONT_HANDLE;
> > + }
> > +
> > + /* If PBA lives on the same page too, discard writes. */
> > + if ( write &&
> > + ((adj_type == ADJ_IDX_LAST &&
> > + msix->table.last == msix->pba.first) ||
> > + (adj_type == ADJ_IDX_FIRST &&
> > + msix->table.first == msix->pba.last)) )
> > + {
> > + if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> > + adjacent_pba) )
> > + gprintk(XENLOG_WARNING,
> > + "MSI-X table and PBA of %pp live on the same page, "
> > + "writing to other registers there is not
> > implemented\n",
> > + &entry->pdev->sbdf);
>
> I would usually start those errors with the SBDF, as that makes it
> easier to identify error originating from the same device:
>
> "%pp: MSI-X table and PBA share a page, discard write to adjacent memory
> (%#lx)\n",
> &entry->pdev->sbdf, addr
ok
> > + return ADJACENT_DONT_HANDLE;
> > + }
> > +
> > + return msix->adj_access_idx[adj_type];
> > +}
> > +
> > +static int adjacent_read(
> > + const struct msixtbl_entry *entry,
> > + paddr_t address, unsigned int len, uint64_t *pval)
> > +{
> > + const void __iomem *hwaddr;
> > + unsigned int fixmap_idx;
> > +
>
> I would add an ASSERT(IS_ALIGNED(address, len)) (and in
> adjacent_write()) just in case, as otherwise the access could cross a
> page boundary.
ok
> > + fixmap_idx = adjacent_handle(entry, address, false);
> > +
> > + if ( fixmap_idx == ADJACENT_DONT_HANDLE )
> > + {
> > + *pval = ~0UL;
> > + return X86EMUL_OKAY;
> > + }
>
> FWIW, I find it safer to unconditionally init *pval = ~0UL at the
> start of the function, and then the return here and in the default
> switch statement case can avoid setting it. It's less easy to return
> without the variable being set.
It was this way in v5, but Jan asked me to move it to only relevant
branch.
> > + hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> > +
> > + switch ( len )
> > + {
> > + case 1:
> > + *pval = readb(hwaddr);
> > + break;
> > +
> > + case 2:
> > + *pval = readw(hwaddr);
> > + break;
> > +
> > + case 4:
> > + *pval = readl(hwaddr);
> > + break;
> > +
> > + case 8:
> > + *pval = readq(hwaddr);
> > + break;
> > +
> > + default:
> > + ASSERT_UNREACHABLE();
> > + *pval = ~0UL;
> > + break;
>
> We should likely move this to some kind of helpers, as it's now
> exactly the same that's done in adjacent_{read,write}() in
> vpci/msix.c (not that you should do it here, just a note).
>
> > + }
> > +
> > + return X86EMUL_OKAY;
> > +}
> > +
> > +static int adjacent_write(
> > + const struct msixtbl_entry *entry,
> > + paddr_t address, unsigned int len, uint64_t val)
> > +{
> > + void __iomem *hwaddr;
> > + unsigned int fixmap_idx;
> > +
> > + fixmap_idx = adjacent_handle(entry, address, true);
> > +
> > + if ( fixmap_idx == ADJACENT_DONT_HANDLE )
> > + return X86EMUL_OKAY;
> > +
> > + hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> > +
> > + switch ( len )
> > + {
> > + case 1:
> > + writeb(val, hwaddr);
> > + break;
> > +
> > + case 2:
> > + writew(val, hwaddr);
> > + break;
> > +
> > + case 4:
> > + writel(val, hwaddr);
> > + break;
> > +
> > + case 8:
> > + writeq(val, hwaddr);
> > + break;
> > +
> > + default:
> > + ASSERT_UNREACHABLE();
> > + break;
> > + }
> > +
> > + return X86EMUL_OKAY;
> > +}
> > +
> > static int cf_check msixtbl_read(
> > const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
> > uint64_t *pval)
> > @@ -222,7 +376,7 @@ static int cf_check msixtbl_read(
> > unsigned int nr_entry, index;
> > int r = X86EMUL_UNHANDLEABLE;
> >
> > - if ( (len != 4 && len != 8) || (address & (len - 1)) )
> > + if ( !IS_ALIGNED(address, len) )
> > return r;
> >
> > rcu_read_lock(&msixtbl_rcu_lock);
> > @@ -230,6 +384,17 @@ static int cf_check msixtbl_read(
> > entry = msixtbl_find_entry(current, address);
> > if ( !entry )
> > goto out;
> > +
> > + if ( address < entry->gtable ||
> > + address >= entry->gtable + entry->table_len )
> > + {
> > + r = adjacent_read(entry, address, len, pval);
> > + goto out;
> > + }
> > +
> > + if ( len != 4 && len != 8 )
> > + goto out;
> > +
> > offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
> >
> > if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET )
> > @@ -291,6 +456,17 @@ static int msixtbl_write(struct vcpu *v, unsigned long
> > address,
> > entry = msixtbl_find_entry(v, address);
> > if ( !entry )
> > goto out;
> > +
> > + if ( address < entry->gtable ||
> > + address >= entry->gtable + entry->table_len )
> > + {
> > + r = adjacent_write(entry, address, len, val);
> > + goto out;
> > + }
> > +
> > + if ( len != 4 && len != 8 )
> > + goto out;
> > +
> > nr_entry = array_index_nospec(((address - entry->gtable) /
> > PCI_MSIX_ENTRY_SIZE),
> > MAX_MSIX_TABLE_ENTRIES);
> > @@ -356,8 +532,8 @@ static int cf_check _msixtbl_write(
> > const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
> > uint64_t val)
> > {
> > - /* Ignore invalid length or unaligned writes. */
> > - if ( (len != 4 && len != 8) || !IS_ALIGNED(address, len) )
> > + /* Ignore unaligned writes. */
> > + if ( !IS_ALIGNED(address, len) )
> > return X86EMUL_OKAY;
> >
> > /*
> > @@ -374,16 +550,25 @@ static bool cf_check msixtbl_range(
> > {
> > struct vcpu *curr = current;
> > unsigned long addr = r->addr;
> > - const struct msi_desc *desc;
> > + const struct msixtbl_entry *entry;
> > + bool ret = false;
> >
> > ASSERT(r->type == IOREQ_TYPE_COPY);
> >
> > rcu_read_lock(&msixtbl_rcu_lock);
> > - desc = msixtbl_addr_to_desc(msixtbl_find_entry(curr, addr), addr);
> > + entry = msixtbl_find_entry(curr, addr);
> > + if ( entry )
> > + {
> > + if ( addr < entry->gtable || addr >= entry->gtable +
> > entry->table_len )
> > + /* Possibly handle adjacent access. */
> > + ret = true;
> > + else
> > + ret = msixtbl_addr_to_desc(entry, addr) != NULL;
> > + }
>
> You could probably put all this into a single condition:
>
> if ( entry &&
> /* Adjacent access. */
> (addr < entry->gtable || addr >= entry->gtable + entry->table_len ||
> /* Otherwise check if there's a matching msi_desc. */
> msixtbl_addr_to_desc(entry, addr)) )
> ret = true;
>
> That's IMO easier to read by setting ret once only.
Is multi-line "if" mixed with comments really easier to follow?
>
> > rcu_read_unlock(&msixtbl_rcu_lock);
> >
> > - if ( desc )
> > - return 1;
> > + if ( ret )
> > + return ret;
> >
> > if ( r->state == STATE_IOREQ_READY && r->dir == IOREQ_WRITE )
> > {
> > @@ -429,7 +614,7 @@ static bool cf_check msixtbl_range(
> > }
> > }
> >
> > - return 0;
> > + return false;
> > }
> >
> > static const struct hvm_io_ops msixtbl_mmio_ops = {
> > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> > index bcfdfd35345d..923b730d48b8 100644
> > --- a/xen/arch/x86/include/asm/msi.h
> > +++ b/xen/arch/x86/include/asm/msi.h
> > @@ -224,6 +224,9 @@ struct arch_msix {
> > } table, pba;
> > int table_refcnt[MAX_MSIX_TABLE_PAGES];
> > int table_idx[MAX_MSIX_TABLE_PAGES];
> > +#define ADJ_IDX_FIRST 0
> > +#define ADJ_IDX_LAST 1
> > + unsigned int adj_access_idx[2];
> > spinlock_t table_lock;
> > bool host_maskall, guest_maskall;
> > domid_t warned_domid;
> > @@ -231,6 +234,8 @@ struct arch_msix {
> > uint8_t all;
> > struct {
> > bool maskall : 1;
> > + bool adjacent_not_initialized : 1;
>
> Not sure we need that, since we already warn of failure to map at
> initialization time, not sure it's worth also printing another error
> at access time.
See response earlier above.
> > + bool adjacent_pba : 1;
> > };
> > } warned_kind;
> > };
> > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> > index 42c793426da3..87190a88ed5d 100644
> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -913,6 +913,37 @@ static int msix_capability_init(struct pci_dev *dev,
> > list_add_tail(&entry->list, &dev->msi_list);
> > *desc = entry;
> > }
> > + else
> > + {
> > + /*
> > + * If the MSI-X table doesn't start at the page boundary, map the
> > first page for
> > + * passthrough accesses.
> > + */
> > + if ( PAGE_OFFSET(table_paddr) )
> > + {
> > + int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> > +
> > + if ( idx > 0 )
> > + msix->adj_access_idx[ADJ_IDX_FIRST] = idx;
> > + else
> > + gprintk(XENLOG_ERR, "Failed to map first MSI-X table page:
> > %d\n", idx);
> > + }
> > + /*
> > + * If the MSI-X table doesn't end on the page boundary, map the
> > last page
> > + * for passthrough accesses.
> > + */
> > + if ( PAGE_OFFSET(table_paddr + msix->nr_entries *
> > PCI_MSIX_ENTRY_SIZE) )
> > + {
> > + uint64_t entry_paddr = table_paddr +
> > + (msix->nr_entries - 1) * PCI_MSIX_ENTRY_SIZE;
> > + int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> > +
> > + if ( idx > 0 )
> > + msix->adj_access_idx[ADJ_IDX_LAST] = idx;
> > + else
> > + gprintk(XENLOG_ERR, "Failed to map last MSI-X table page:
> > %d\n", idx);
>
> Could you prefix the messages with the SBDF of the device please?
> It's in the seg, bus, slot, func local variables AFAICT.
Ok (%pp + dev->sbdf).
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Attachment:
signature.asc
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |