[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci/msix: handle accesses adjacent to the MSI-X table


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 14 Mar 2023 12:56:33 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CycMpsj2bFwh2XWHtApbCxJag+E25ba/wqRdhNUKzPs=; b=eyT6c8+x9PRQMwa2kCyDaaYBEdjIRVJljyu4w59fBx38yWAFu80tLG5qOfZ068DzUM7im0u7QbO34LcWapRkqyaNzSsZkvu+yIOoxHCK1DlJyOg5DkaNaPyFGC5JFl5OJJiUNwVkSaowjhGbEaumpiXhn+FT244GOO3vH1WI0dQ69bHAbMKDPTWQKLAh2dXeq+wNXcQtS/RXlD2i7kreozL1XjyHRVC2nCTtmiVDIlIfVg/B/RIxOAnGrPUqZLRJSF0FWhZLWJ/57lyNzs5WjwtUtcIocgYQ+XQvVuaC3DYbeoR1QdsigVdwfJCWAC9QcvfvgaTkoLYQpEUqKWeLaQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=JmYy4nmZp0JBcl/uTtbQzy04GRPucTEOfkuDBtVYGXGSOy6vGqTb+gnMUkuAMEoA6XMRnUGAjDNdEUxxjcdkZLO1DCUEZf9ESZcllr3Q1OE65SeC95aNnEbDzQmiK9hgrGb/TVipFhR9DNblK+/w5vlgBeJzBCyc0CainmDr6TV+rscseTR5Qrcb2ADv4dh4a3od2FBv2qqoWQOB3hLzE2eRSbXJjgIiwmx5EgnD4KzBmFhQy/edF6AdjL1gd45QwPJXs9gJX5narh9ZJUGw88EsugU2WAKjsGefDtzLjaUeGBw5oWxA0jsbqfzCXcnunoWEUK3ev4P4Ekd2Pt5P4g==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 14 Mar 2023 11:56:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.03.2023 11:13, Roger Pau Monne wrote:
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -27,6 +27,13 @@
>      ((addr) >= vmsix_table_addr(vpci, nr) &&                              \
>       (addr) < vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr))
>  
> +#define VMSIX_ADDR_ADJACENT(addr, vpci, nr)                               \
> +    ((PFN_DOWN(addr) == PFN_DOWN(vmsix_table_addr(vpci, nr)) &&           \
> +      (addr) < vmsix_table_addr(vpci, nr)) ||                             \
> +     (PFN_DOWN(addr) == PFN_DOWN(vmsix_table_addr(vpci, nr) +             \
> +                                 vmsix_table_size(vpci, nr) - 1) &&       \
> +      (addr) >= vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr)))

While I realize this may impact performance a little, did you consider
using !VMSIX_ADDR_IN_RANGE() instead of open-coding it kind of? Then
again there's only a single use of the macro, and that's in code where
VMSIX_ADDR_IN_RANGE() was already checked (by the earlier invocation
of msix_find()), so the re-checking of the MSI-X table bounds isn't
strictly necessary anyway.

> @@ -438,6 +369,145 @@ static const struct hvm_mmio_ops vpci_msix_table_ops = {
>      .write = msix_write,
>  };
>  
> +const static struct vpci_msix *adjacent_find(const struct domain *d,
> +                                             unsigned long addr)
> +{
> +    const struct vpci_msix *msix;
> +
> +    list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )
> +        /*
> +         * So far vPCI only traps accesses to the MSIX table, but not the PBA
> +         * explicitly, and hence we only need to check for the hole created 
> by
> +         * the MSIX table.
> +         *
> +         * If the PBA table is also trapped, the check here should be 
> expanded
> +         * to take it into account.
> +         */
> +        if ( VMSIX_ADDR_ADJACENT(addr, msix->pdev->vpci, VPCI_MSIX_TABLE) )
> +            return msix;

Is the comment really correct when considering that you don't change
vpci_make_msix_hole()? (I was actually puzzled by struct vpci_msix'es
table[] field remaining a 2-element array, despite the PBA now being
dealt with differently. But I realize you need to keep that for the
VMSIX_ADDR_IN_RANGE() in adjacent_write().)

> +static int cf_check adjacent_read(
> +    struct vcpu *v, unsigned long addr, unsigned int len, unsigned long 
> *data)
> +{
> +    const struct domain *d = v->domain;
> +    const struct vpci_msix *msix = adjacent_find(d, addr);
> +    const void __iomem *mem;
> +    paddr_t msix_tbl;
> +    struct vpci *vpci;
> +
> +    *data = ~0ul;
> +
> +    if ( !msix )
> +        return X86EMUL_RETRY;
> +
> +    vpci = msix->pdev->vpci;
> +    msix_tbl = vmsix_table_addr(vpci, VPCI_MSIX_TABLE);
> +
> +    if ( addr + len > round_pgup(msix_tbl +
> +                                 vmsix_table_size(vpci, VPCI_MSIX_TABLE)) )
> +        return X86EMUL_OKAY;
> +
> +    mem = get_table(vpci,
> +                    PFN_DOWN(addr) == PFN_DOWN(msix_tbl) ? VPCI_MSIX_TBL_HEAD
> +                                                         : 
> VPCI_MSIX_TBL_TAIL);
> +    if ( !mem )
> +        return X86EMUL_OKAY;

The respective PBA logic had a gprintk() on this path.

> +    switch ( len )
> +    {
> +    case 1:
> +        *data = readb(mem + PAGE_OFFSET(addr));
> +        break;
> +
> +    case 2:
> +        *data = readw(mem + PAGE_OFFSET(addr));
> +        break;
> +
> +    case 4:
> +        *data = readl(mem + PAGE_OFFSET(addr));
> +        break;
> +
> +    case 8:
> +        *data = readq(mem + PAGE_OFFSET(addr));
> +        break;

So far we have allowed only aligned 4- and 8-byte accesses to the PBA.
Shouldn't we continue to do so?

I'm also concerned of misaligned accesses: While we can't keep the
guest from doing such on pages we don't intercept, depending on the kind
of anomalies such may cause the effects there may be contained to that
guest. When doing the accesses from the hypervisor, bad effects could
affect the entire system. (FTAOD I don't mean to constrain guests, but I
do think we need to consider splitting misaligned accesses.)

> +    default:
> +        ASSERT_UNREACHABLE();

Is this correct? In msix_{read,write}() these assertions are valid
because of the earlier access_allowed() checks, but here you have
nothing like that. Yes, the emulator currently would only pass sizes
that fit what is being handled, but relying on no "unusual" insns
appearing down the road feels risky. Then again
hvmemul_phys_mmio_access() splits accesses accordingly, so perhaps
all is fine here.

> +static int cf_check adjacent_write(
> +    struct vcpu *v, unsigned long addr, unsigned int len, unsigned long data)
> +{
> +    const struct domain *d = v->domain;
> +    const struct vpci_msix *msix = adjacent_find(d, addr);
> +    void __iomem *mem;
> +    paddr_t msix_tbl;
> +    struct vpci *vpci;
> +
> +    if ( !msix )
> +        return X86EMUL_RETRY;
> +
> +    vpci = msix->pdev->vpci;
> +    msix_tbl = vmsix_table_addr(vpci, VPCI_MSIX_TABLE);
> +
> +    if ( addr + len > round_pgup(msix_tbl +
> +                                 vmsix_table_size(vpci, VPCI_MSIX_TABLE)) )
> +        return X86EMUL_OKAY;
> +
> +    if ( (VMSIX_ADDR_IN_RANGE(addr, vpci, VPCI_MSIX_PBA) ||
> +          VMSIX_ADDR_IN_RANGE(addr + len - 1, vpci, VPCI_MSIX_PBA)) &&
> +         !is_hardware_domain(d) )
> +        /* Ignore writes to PBA for DomUs, it's undefined behavior. */
> +        return X86EMUL_OKAY;

Just as a remark: Checking only start and end is sufficient merely because
the PBA is a multiple of 8 bytes in size, and "len" currently cannot be
larger than 8. This feels somewhat fragile, but is - like above - presumably
okay.

> @@ -530,7 +600,10 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      }
>  
>      if ( list_empty(&d->arch.hvm.msix_tables) )
> +    {
>          register_mmio_handler(d, &vpci_msix_table_ops);
> +        register_mmio_handler(d, &vpci_msix_adj_ops);
> +    }

Did you consider re-using the same ops by widening what their accept()
covers, and by having read/write recognize inside vs outside accesses,
dealing with them differently (much like the PBA was dealt with before)?
Besides my gut feeling of this ending up being less code, there's also
the aspect of NR_IO_HANDLERS being the upper bound to how many handlers
may be registered.

Or else did you consider registering this further handler only when
there actually is a device where the MSI-X table has "slack" at the
front and/or end?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.