|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs
On Fri, May 19, 2017 at 09:21:56AM -0600, Jan Beulich wrote:
> >>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
> > +static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
> > +{
> > + struct vpci_header *header = &pdev->vpci->header;
> > + unsigned int i;
> > + int rc = 0;
> > +
> > + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> > + {
> > + paddr_t gaddr = map ? header->bars[i].gaddr
> > + : header->bars[i].mapped_addr;
> > + paddr_t paddr = header->bars[i].paddr;
> > +
> > + if ( header->bars[i].type != VPCI_BAR_MEM &&
> > + header->bars[i].type != VPCI_BAR_MEM64_LO )
> > + continue;
> > +
> > + rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)),
> > + _mfn(PFN_DOWN(paddr)),
> > PFN_UP(header->bars[i].size),
>
> The PFN_UP() indicates a problem: For sub-page BARs you can't
> blindly map/unmap them without taking into consideration other
> devices sharing the same page.
I'm not sure I follow, the start address of BARs is always aligned to
a 4KB boundary, so there's no chance of the same page being used by
two different BARs at the same time.
The size is indeed not aligned to 4KB, but I don't see how this can
cause collisions with other BARs unless the domain is actively trying
to make the BARs overlap, in which case there's not much Xen can do.
> > + map);
> > + if ( rc )
> > + break;
> > +
> > + header->bars[i].mapped_addr = map ? gaddr : 0;
> > + }
> > +
> > + return rc;
> > +}
>
> Shouldn't this function somewhere honor the unset flags?
Right, I've added a check to make sure the BAR is positioned before
trying to map it into the domain p2m.
> > +static int vpci_cmd_read(struct pci_dev *pdev, unsigned int reg,
> > + union vpci_val *val, void *data)
> > +{
> > + struct vpci_header *header = data;
> > +
> > + val->word = header->command;
>
> Rather than reading back and storing the value in the write handler,
> I'd recommending doing an actual read here.
OK.
> > +static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
> > + union vpci_val val, void *data)
> > +{
> > + struct vpci_header *header = data;
> > + uint16_t new_cmd, saved_cmd;
> > + uint8_t seg = pdev->seg, bus = pdev->bus;
> > + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > + int rc;
> > +
> > + new_cmd = val.word;
> > + saved_cmd = header->command;
> > +
> > + if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) )
> > + goto out;
> > +
> > + /* Memory space access change. */
> > + rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY);
> > + if ( rc )
> > + {
> > + dprintk(XENLOG_ERR,
> > + "%04x:%02x:%02x.%u:unable to %smap BARs: %d\n",
> > + seg, bus, slot, func,
> > + new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc);
> > + return rc;
>
> I guess you can guess the question already: What is the bare
> hardware equivalent of this failure return?
Yes, this is already fixed since write handlers simply return void.
The hw equivalent would be to ignore the write AFAICT (ie: memory
decoding will not be enabled).
Are you fine with the dprintk or would you also like me to remove
that? (IMHO it's helpful for debugging).
> > + }
> > +
> > + out:
>
> Please try to avoid goto-s and labels for other than error handling
> (and even then only when code would otherwise end up pretty
> convoluted).
Done.
> > +static int vpci_bar_read(struct pci_dev *pdev, unsigned int reg,
> > + union vpci_val *val, void *data)
> > +{
> > + struct vpci_bar *bar = data;
>
> const
>
> > + bool hi = false;
> > +
> > + ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
> > + bar->type == VPCI_BAR_MEM64_HI);
> > +
> > + if ( bar->type == VPCI_BAR_MEM64_HI )
> > + {
> > + ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);
>
> reg > PCI_BASE_ADDRESS_0
Fixed.
> > + bar--;
> > + hi = true;
> > + }
> > +
> > + if ( bar->sizing )
> > + val->double_word = ~(bar->size - 1) >> (hi ? 32 : 0);
>
> There's also a comment further down - this is producing undefined
> behavior on 32-bits arches.
I've changed size to be a uint64_t.
> > +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
> > + union vpci_val val, void *data)
> > +{
> > + struct vpci_bar *bar = data;
> > + uint32_t wdata = val.double_word;
> > + bool hi = false, unset = false;
> > +
> > + ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
> > + bar->type == VPCI_BAR_MEM64_HI);
> > +
> > + if ( wdata == GENMASK(31, 0) )
>
> I'm afraid this again doesn't match real hardware behavior: As the
> low bits are r/o, writes with them having any value, but all other
> bits being 1 should have the same effect. I notice that while I had
> fixed this for the ROM BAR in Linux'es pciback, I should have also
> fixed this for ordinary ones.
I've changed this to:
switch ( bar->type )
{
case VPCI_BAR_MEM:
size_mask = GENMASK(31, 12);
break;
case VPCI_BAR_MEM64_LO:
size_mask = GENMASK(31, 26);
break;
case VPCI_BAR_MEM64_HI:
size_mask = GENMASK(31, 0);
break;
default:
ASSERT_UNREACHABLE();
break;
}
if ( (wdata & size_mask) == size_mask )
{
...
And removed the ASSERT just above (since it's now handled in the
switch itself).
> > + {
> > + /* Next reads from this register are going to return the BAR size.
> > */
> > + bar->sizing = true;
> > + return 0;
> > + }
> > +
> > + /* End previous sizing cycle if any. */
> > + bar->sizing = false;
> > +
> > + unset = bar->unset;
> > + if ( unset )
> > + bar->unset = false;
> > +
> > + if ( bar->type == VPCI_BAR_MEM64_HI )
> > + {
> > + ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);
> > + bar--;
> > + hi = true;
> > + }
> > +
> > + /* Update the relevant part of the BAR address. */
> > + bar->gaddr &= hi ? ~GENMASK(63, 32) : ~GENMASK(31, 0);
> > + wdata &= hi ? GENMASK(31, 0) : PCI_BASE_ADDRESS_MEM_MASK;
>
> Perhaps easier to grok as
>
> if ( hi )
> wdata &= PCI_BASE_ADDRESS_MEM_MASK;
I've done that (with the condition reversed).
> However, considering the dual use below, I'd prefer if you wrote
> back the value you read to the low 4 bits. They're _supposed_ to
> be r/o, yes, but anyway.
Done.
>
> > + bar->gaddr |= (uint64_t)wdata << (hi ? 32 : 0);
> > +
> > + if ( unset )
> > + {
> > + bar->paddr = bar->gaddr;
>
> So this deals with first time setting of the BAR by Dom0. If Dom0
> later decides to move BARs around, how do you guarantee things
> to continue to work fine if you allow paddr and gaddr to go out of
> sync? Often the reason to do re-assignments is because the OS
> recognized address conflicts. Or it needs to make room for SR-IOV
> BARs.
I've removed the unset check, so that every BAR position change done
by Dom0 is also applied to the hardware, instead of just changing
Dom0's p2m.
> > + pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> > + PCI_FUNC(pdev->devfn), reg, wdata);
>
> pci_conf_write32()
Ups, thanks.
>
> > + }
> > +
> > + ASSERT(IS_ALIGNED(bar->gaddr, PAGE_SIZE));
>
> Urgh.
Removed.
> > +static int vpci_init_bars(struct pci_dev *pdev)
> > +{
> > + uint8_t seg = pdev->seg, bus = pdev->bus;
> > + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > + uint8_t header_type;
> > + unsigned int i, num_bars;
> > + struct vpci_header *header = &pdev->vpci->header;
> > + struct vpci_bar *bars = header->bars;
> > + int rc;
> > +
> > + header_type = pci_conf_read8(seg, bus, slot, func, PCI_HEADER_TYPE) &
> > 0x7f;
> > + if ( header_type == PCI_HEADER_TYPE_NORMAL )
> > + num_bars = 6;
> > + else if ( header_type == PCI_HEADER_TYPE_BRIDGE )
> > + num_bars = 2;
> > + else
> > + return -ENOSYS;
>
> -EOPNOTSUPP
>
> > + /* Setup a handler for the control register. */
> > + header->command = pci_conf_read16(seg, bus, slot, func, PCI_COMMAND);
>
> As the code says, the register is the Command Register, so your
> comment shouldn't say "control".
My mistake.
> > + rc = xen_vpci_add_register(pdev, vpci_cmd_read, vpci_cmd_write,
> > + PCI_COMMAND, 2, header);
> > + if ( rc )
> > + {
> > + dprintk(XENLOG_ERR,
> > + "%04x:%02x:%02x.%u: failed to add handler register %#x:
> > %d\n",
> > + seg, bus, slot, func, PCI_COMMAND, rc);
> > + return rc;
> > + }
> > +
> > + for ( i = 0; i < num_bars; i++ )
> > + {
> > + uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> > + uint32_t val = pci_conf_read32(seg, bus, slot, func, reg);
> > + uint64_t addr, size;
> > + unsigned int index;
> > +
> > + if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
> > + {
> > + bars[i].type = VPCI_BAR_MEM64_HI;
> > + bars[i].unset = bars[i - 1].unset;
> > + continue;
>
> Neither here nor below you install a handler for this upper half.
Ugh, good catch.
> > + }
> > + else if ( (val & PCI_BASE_ADDRESS_SPACE) ==
> > PCI_BASE_ADDRESS_SPACE_IO )
> > + {
> > + bars[i].type = VPCI_BAR_IO;
> > + continue;
> > + }
> > + else if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>
> Pointless "else" (twice).
Removed.
> > + PCI_BASE_ADDRESS_MEM_TYPE_64 )
> > + bars[i].type = VPCI_BAR_MEM64_LO;
> > + else
> > + bars[i].type = VPCI_BAR_MEM;
> > +
> > + /* Size the BAR and map it. */
> > + index = i;
> > + rc = pci_size_bar(seg, bus, slot, func, PCI_BASE_ADDRESS_0,
> > num_bars,
> > + &index, &addr, &size);
> > + if ( rc )
> > + {
> > + dprintk(XENLOG_ERR,
> > + "%04x:%02x:%02x.%u: unable to size BAR#%u: %d\n",
> > + seg, bus, slot, func, i, rc);
> > + return rc;
> > + }
> > +
> > + if ( size == 0 )
> > + {
> > + bars[i].type = VPCI_BAR_EMPTY;
> > + continue;
> > + }
> > +
> > + if ( (bars[i].type == VPCI_BAR_MEM && addr == GENMASK(31, 12)) ||
> > + addr == GENMASK(63, 26) )
>
> Where is this 26 coming from?
>
> Perhaps
>
> if ( addr == GENMASK(bars[i].type == VPCI_BAR_MEM ? 31 : 63, 12) )
I'm checking the memory decode bit here instead in order to figure out
if the BAR is not positioned.
> ? Albeit I'm unconvinced GENMASK() is useful to be used here anyway
> (see also below).
Right, regardless of the specific usage above, what would you
recommend regarding the usage of GENMASK?
Julien suggested introducing GENMASK_ULL. Should I go that route, or
introduce something locally for vPCI?
> > + {
> > + /* BAR is not positioned. */
>
> I can't find anything in the standard saying that all-ones upper
> address bits indicate an unassigned BAR. As long as the memory
> decode bit is off, all BARs are to be considered unassigned afaik.
> Furthermore you can't possibly read e.g. 0xfffff000 from a
> 32-bit BAR covering more than 4k.
OK, so I've now changed this to mark the BAR as unset if the memory
decode bit in the command register is not set.
> > + bars[i].unset = true;
> > + ASSERT(is_hardware_domain(pdev->domain));
> > + ASSERT(!(header->command & PCI_COMMAND_MEMORY));
>
> You're asserting guest controlled state here (even if it's Dom0).
>
> > + }
> > +
> > + ASSERT(IS_ALIGNED(addr, PAGE_SIZE));
>
> Urgh (again).
Removed both of the above.
> > --- a/xen/include/xen/vpci.h
> > +++ b/xen/include/xen/vpci.h
> > @@ -50,6 +50,34 @@ int xen_vpci_write(unsigned int seg, unsigned int bus,
> > unsigned int devfn,
> > struct vpci {
> > /* Root pointer for the tree of vPCI handlers. */
> > struct rb_root handlers;
> > +
> > +#ifdef __XEN__
> > + /* Hide the rest of the vpci struct from the user-space test harness.
> > */
> > + struct vpci_header {
> > + /* Cached value of the command register. */
> > + uint16_t command;
> > + /* Information about the PCI BARs of this device. */
> > + struct vpci_bar {
> > + enum {
> > + VPCI_BAR_EMPTY,
> > + VPCI_BAR_IO,
> > + VPCI_BAR_MEM,
>
> MEM32?
Changed.
> > + VPCI_BAR_MEM64_LO,
> > + VPCI_BAR_MEM64_HI,
> > + } type;
> > + /* Hardware address. */
> > + paddr_t paddr;
> > + /* Guest address where the BAR should be mapped. */
> > + paddr_t gaddr;
> > + /* Current guest address where the BAR is mapped. */
> > + paddr_t mapped_addr;
>
> Why do you need to track both "should be" and "is" addresses? Also
> I think all three would more naturally be frame numbers.
I think I can use a single field to store the address.
> > + size_t size;
>
> Is this enough for e.g. ARM32 (remember this is a common
> header)?
No, I've changed it to uint64_t.
> > + unsigned int attributes:4;
>
> ???
Changed this to "bool prefetchable" instead.
> > + bool sizing;
> > + bool unset;
>
> Isn't this redundant with e.g. gaddr (or as per above gfn) being
> INVALID_PADDR (INVALID_GFN)?
Yes, now removed.
> > + } bars[6];
>
> What about the ROM and SR-IOV ones?
I've implemented support for the expansion ROM BAR (which I still need
to figure out how to test), but I would like to defer SR-IOV for later
because it involves a non-trivial amount of work, and with this series
one can already boot a PVH Dom0 (minus SR-IOV of course).
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |