
Re: [PATCH v2] vpci/msix: handle accesses adjacent to the MSI-X table


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 21 Mar 2023 16:31:33 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 21 Mar 2023 15:32:02 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 20, 2023 at 01:08:48PM +0100, Jan Beulich wrote:
> On 16.03.2023 13:07, Roger Pau Monne wrote:
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -27,6 +27,11 @@
> >      ((addr) >= vmsix_table_addr(vpci, nr) &&                              \
> >       (addr) < vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr))
> >  
> > +#define VMSIX_ADDR_SAME_PAGE(addr, vpci, nr)                              \
> > +    (PFN_DOWN(addr) >= PFN_DOWN(vmsix_table_addr(vpci, nr)) &&            \
> > +     PFN_DOWN((addr)) < PFN_UP(vmsix_table_addr(vpci, nr) +               \
> > +                               vmsix_table_size(vpci, nr) - 1))
> 
> Looks like this would be better in line with get_slot() (and slightly
> cheaper) if the 2nd comparison was ... <= PFN_DOWN().

I can adjust; I don't have a strong opinion.

> 
> > @@ -149,7 +154,7 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
> >  
> >          for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> >              if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
> > -                 VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
> > +                 VMSIX_ADDR_SAME_PAGE(addr, msix->pdev->vpci, i) )
> >                  return msix;
> >      }
> >  
> > @@ -182,93 +187,201 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
> >      return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
> >  }
> >  
> > -static void __iomem *get_pba(struct vpci *vpci)
> > +static void __iomem *get_table(struct vpci *vpci, unsigned int slot)
> >  {
> >      struct vpci_msix *msix = vpci->msix;
> >      /*
> > -     * PBA will only be unmapped when the device is deassigned, so access it
> > -     * without holding the vpci lock.
> > +     * Regions will only be unmapped when the device is deassigned, so access
> > +     * them without holding the vpci lock.
> >       */
> > -    void __iomem *pba = read_atomic(&msix->pba);
> > +    void __iomem *table = read_atomic(&msix->table[slot]);
> > +    paddr_t addr = 0;
> >  
> > -    if ( likely(pba) )
> > -        return pba;
> > +    if ( likely(table) )
> > +        return table;
> >  
> > -    pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> > -                  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> > -    if ( !pba )
> > -        return read_atomic(&msix->pba);
> > +    switch ( slot )
> > +    {
> > +    case VPCI_MSIX_TBL_TAIL:
> > +        addr = vmsix_table_size(vpci, VPCI_MSIX_TABLE);
> > +        fallthrough;
> > +    case VPCI_MSIX_TBL_HEAD:
> > +        addr += vmsix_table_addr(vpci, VPCI_MSIX_TABLE);
> > +        break;
> > +
> > +    case VPCI_MSIX_PBA_TAIL:
> > +        addr = vmsix_table_size(vpci, VPCI_MSIX_PBA);
> > +        fallthrough;
> > +    case VPCI_MSIX_PBA_HEAD:
> > +        addr += vmsix_table_addr(vpci, VPCI_MSIX_PBA);
> > +        break;
> 
> Hmm, wasn't the plan to stop special-casing the PBA, including its
> special treatment wrt the p2m?

I had a pre-patch to do that, but TBH I'm not sure it's worth handling
dom0 vs domU differently here, the more so since PBA accesses aren't
that common anyway.

> Reading on I realize you need this for
> the (future) DomU case (to enforce r/o-ness, albeit having looked at
> the spec again the other day I'm not really convinced anymore we
> really need to squash writes), but we should be able to avoid the
> extra overhead for Dom0? (Granted it may make sense to leave this to
> a separate patch, if we want to keep the DomU handling despite not
> presently needing it.)

I can indeed avoid the trapping for dom0; it's just a two-line change
to vpci_make_msix_hole() IIRC.  Since I've already written the code,
let's try to get the handling done for both dom0 and domU, and I can
submit the dom0 improvement as a separate patch.

> > +    }
> > +
> > +    table = ioremap(round_pgdown(addr), PAGE_SIZE);
> > +    if ( !table )
> > +        return read_atomic(&msix->table[slot]);
> >  
> >      spin_lock(&vpci->lock);
> > -    if ( !msix->pba )
> > +    if ( !msix->table[slot] )
> >      {
> > -        write_atomic(&msix->pba, pba);
> > +        write_atomic(&msix->table[slot], table);
> >          spin_unlock(&vpci->lock);
> >      }
> >      else
> >      {
> >          spin_unlock(&vpci->lock);
> > -        iounmap(pba);
> > +        iounmap(table);
> >      }
> >  
> > -    return read_atomic(&msix->pba);
> > +    return read_atomic(&msix->table[slot]);
> >  }
> >  
> > -static int cf_check msix_read(
> > -    struct vcpu *v, unsigned long addr, unsigned int len, unsigned long *data)
> > +unsigned int get_slot(const struct vpci *vpci, unsigned long addr)
> >  {
> > -    const struct domain *d = v->domain;
> > -    struct vpci_msix *msix = msix_find(d, addr);
> > -    const struct vpci_msix_entry *entry;
> > -    unsigned int offset;
> > +    unsigned long pfn = PFN_DOWN(addr);
> >  
> > -    *data = ~0ul;
> > +    /*
> > +     * The logic below relies on having the tables identity mapped to the guest
> > +     * address space, or for the `addr` parameter to be translated into its
> > +     * host physical memory address equivalent.
> > +     */
> >  
> > -    if ( !msix )
> > -        return X86EMUL_RETRY;
> > +    if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_TABLE)) )
> > +        return VPCI_MSIX_TBL_HEAD;
> > +    if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_TABLE) +
> > +                         vmsix_table_size(vpci, VPCI_MSIX_TABLE) - 1) )
> > +        return VPCI_MSIX_TBL_TAIL;
> > +    if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_PBA)) )
> > +        return VPCI_MSIX_PBA_HEAD;
> > +    if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_PBA) +
> > +                         vmsix_table_size(vpci, VPCI_MSIX_PBA) - 1) )
> > +        return VPCI_MSIX_PBA_TAIL;
> > +
> > +    ASSERT_UNREACHABLE();
> > +    return -1;
> > +}
> >  
> > -    if ( !access_allowed(msix->pdev, addr, len) )
> > -        return X86EMUL_OKAY;
> > +static bool adjacent_handle(const struct vpci_msix *msix, unsigned long addr)
> > +{
> > +    unsigned int i;
> >  
> >      if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
> > +        return true;
> > +
> > +    if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_TABLE) )
> > +        return false;
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> > +        if ( VMSIX_ADDR_SAME_PAGE(addr, msix->pdev->vpci, i) )
> > +            return true;
> > +
> > +    return false;
> > +}
> > +
> > +static int adjacent_read(const struct domain *d, const struct vpci_msix *msix,
> > +                         unsigned long addr, unsigned int len,
> > +                         unsigned long *data)
> > +{
> > +    const void __iomem *mem;
> > +    struct vpci *vpci = msix->pdev->vpci;
> > +    unsigned int slot;
> > +
> > +    *data = ~0ul;
> > +
> > +    if ( !adjacent_handle(msix, addr + len - 1) )
> > +        return X86EMUL_OKAY;
> > +
> > +    if ( addr & (len - 1) )
> 
> unlikely()?

Done.

> >      {
> > -        struct vpci *vpci = msix->pdev->vpci;
> > -        unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
> > -        const void __iomem *pba = get_pba(vpci);
> > +        unsigned int i;
> > +
> > +        gprintk(XENLOG_DEBUG, "%pp: unaligned read to MSI-X related page\n",
> > +                &msix->pdev->sbdf);
> >  
> >          /*
> > -         * Access to PBA.
> > +         * Split unaligned accesses into byte sized ones. Shouldn't happen in
> > +         * the first place, but devices shouldn't have registers in the same 4K
> > +         * page as the MSIX tables either.
> >           *
> > -         * TODO: note that this relies on having the PBA identity mapped to the
> > -         * guest address space. If this changes the address will need to be
> > -         * translated.
> > +         * It's unclear whether this could cause issues if a guest expects a
> > +         * register to be accessed atomically; it had better use an aligned access
> > +         * if it has such expectations.
> >           */
> > -        if ( !pba )
> > -        {
> > -            gprintk(XENLOG_WARNING,
> > -                    "%pp: unable to map MSI-X PBA, report all pending\n",
> > -                    &msix->pdev->sbdf);
> > -            return X86EMUL_OKAY;
> > -        }
> >  
> > -        switch ( len )
> > +        for ( i = 0; i < len; i++ )
> >          {
> > -        case 4:
> > -            *data = readl(pba + idx);
> > -            break;
> > +            unsigned long partial = ~0ul;
> 
> Pointless initializer (~0ul is written first thing above, i.e. also in
> the recursive invocation). Then again that setting is also redundant
> with msix_read()'s. So I guess the initializer wants to stay but the
> setting at the top of the function can be dropped.

I'm always extra cautious with variables on the stack that contain
data to be returned to the guest.  All your points are valid and
correct, but I like to explicitly initialize such variables so that
further changes to the function don't end up leaking them.  If you
don't mind too much I would rather leave it as-is.

> 
> > +            int rc = adjacent_read(d, msix, addr + i, 1, &partial);
> >  
> > -        case 8:
> > -            *data = readq(pba + idx);
> > -            break;
> > +            if ( rc != X86EMUL_OKAY )
> > +                return rc;
> >  
> > -        default:
> > -            ASSERT_UNREACHABLE();
> > -            break;
> > +            *data &= ~(0xfful << (i * 8));
> > +            *data |= ((uint8_t)partial) << (i * 8);
> 
> This is UB for i >= 4; you'd need to cast back to unsigned long, at which
> point I think the cast-free variant of masking by 0xff is to be preferred.

Indeed.

> >          }
> >  
> >          return X86EMUL_OKAY;
> >      }
> >  
> > +    slot = get_slot(vpci, addr);
> > +    if ( slot >= ARRAY_SIZE(msix->table) )
> > +        return X86EMUL_OKAY;
> > +
> > +    mem = get_table(vpci, slot);
> > +    if ( !mem )
> > +    {
> > +        gprintk(XENLOG_WARNING,
> > +                "%pp: unable to map MSI-X page, returning all bits set\n",
> > +                &msix->pdev->sbdf);
> > +        return X86EMUL_OKAY;
> > +    }
> 
> Could this be moved ahead of the unaligned handling, so there would be
> (commonly) only one such log message even for accesses we split?

Quite likely, let me take a look.

Thanks, Roger.



 

