[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 07/11] vpci/bars: add handlers to map the BARs



On Mon, Feb 26, 2018 at 04:20:17AM -0700, Jan Beulich wrote:
> (re-sending with xen-devel re-added; not sure how it got lost)
> 
> >>> On 23.01.18 at 16:07, <roger.pau@xxxxxxxxxx> wrote:
> > ---
> >  tools/tests/vpci/emul.h   |   1 +
> >  xen/arch/x86/hvm/ioreq.c  |   4 +
> 
> Again the Cc to Paul is missing (no matter that it's just a tiny change).

Sorry, I will run get_maintainer.pl against the whole patchset before
sending a new version.

> > +static int map_range(unsigned long s, unsigned long e, void *data,
> > +                     unsigned long *c)
> > +{
> > +    const struct map_data *map = data;
> > +    int rc;
> > +
> > +    for ( ; ; )
> > +    {
> > +        unsigned long size = e - s + 1;
> 
> Considering the lack of a condition in the for() and the inclusiveness
> of the range (which means you can't express an empty range) I don't
> understand how ...
> 
> > +        /*
> > +         * ARM TODOs:
> > +         * - On ARM whether the memory is prefetchable or not should be 
> > passed
> > +         *   to map_mmio_regions in order to decide which memory attributes
> > +         *   should be used.
> > +         *
> > +         * - {un}map_mmio_regions doesn't support preemption, hence the 
> > bodge
> > +         *   below in order to limit the amount of mappings to 64 pages for
> > +         *   each function call.
> > +         */
> > +
> > +#ifdef CONFIG_ARM
> > +        size = min(64ul, size);
> > +#endif
> > +
> > +        rc = (map->map ? map_mmio_regions : unmap_mmio_regions)
> > +             (map->d, _gfn(s), size, _mfn(s));
> > +        if ( rc == 0 )
> > +        {
> > +            *c += size;
> > +#ifdef CONFIG_ARM
> > +            rc = -ERESTART;
> > +#endif
> > +            break;
> 
> ... this works in the ARM case. If the whole thing doesn't work (and
> doesn't get built) for ARM right now, make this obvious by one or more
> #error directives.

ARM will never iterate, instead it will rely on always returning
-ERESTART and letting the caller of rangeset_consume_ranges deal with
it. What's wrong here is that the call to rangeset_consume_ranges
should be:

while ( rangeset_consume_ranges(...) == -ERESTART );

But that makes the code even more convoluted than what it already is.
I've basically added the ARM part because Julien requested it, but I
think the right way to fix this is to unify the behaviour of
{un}map_mmio_regions between x86 and ARM.

I will drop the ARM chunks.

> > +static void modify_decoding(const struct pci_dev *pdev, bool map, bool rom)
> > +{
> > +    struct vpci_header *header = &pdev->vpci->header;
> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> > +    {
> > +        if ( rom && header->bars[i].type == VPCI_BAR_ROM )
> > +        {
> > +            unsigned int rom_pos = (i == 6) ? PCI_ROM_ADDRESS
> 
> I probably should have mentioned this earlier, but I'm really unhappy
> about such literal "magic" numbers. Please use a suitable expression
> or #define.
> 
> > +bool vpci_process_pending(struct vcpu *v)
> > +{
> > +    while ( v->vpci.mem )
> > +    {
> > +        struct map_data data = {
> > +            .d = v->domain,
> > +            .map = v->vpci.map,
> > +        };
> > +
> > +        switch ( rangeset_consume_ranges(v->vpci.mem, map_range, &data) )
> > +        {
> > +        case -ERESTART:
> > +            return true;
> > +
> > +        default:
> > +            if ( v->vpci.map )
> > +            {
> > +                spin_lock(&v->vpci.pdev->vpci->lock);
> > +                modify_decoding(v->vpci.pdev, v->vpci.map, v->vpci.rom);
> > +                spin_unlock(&v->vpci.pdev->vpci->lock);
> > +            }
> > +            /* fallthrough. */
> > +        case -ENOMEM:
> 
> You carefully handle this error here.

On second thought, I'm not sure handling ENOMEM separately makes
sense. Unless you object I plan to remove this special casing.

> > +static void maybe_defer_map(struct domain *d, const struct pci_dev *pdev,
> > +                            struct rangeset *mem, bool map, bool rom)
> > +{
> > +    struct vcpu *curr = current;
> > +
> > +    if ( is_idle_vcpu(curr) )
> > +    {
> > +        struct map_data data = { .d = d, .map = true };
> > +
> > +        /*
> > +         * Only used for domain construction in order to map the BARs
> > +         * of devices with memory decoding enabled.
> > +         */
> > +        ASSERT(map && !rom);
> > +        rangeset_consume_ranges(mem, map_range, &data);
> 
> What if this produces -ENOMEM? And despite having looked at
> several revisions of this, I can't make the connection to why this
> is in an is_idle_vcpu() conditional (neither the direct caller nor the
> next level up make this obvious to me). There's clearly a need
> for extending the comment.

I thought the comment above that mentions domain construction would be
enough. I can try to expand this, maybe like:

"This function will only be called from the idle vCPU while building
the domain, in which case it's not possible to defer the operation
(like done in the else branch). Call rangeset_consume_ranges in order
to establish the mappings right away."

> > +static void modify_bars(const struct pci_dev *pdev, bool map, bool rom)
> > +{
> > +    struct vpci_header *header = &pdev->vpci->header;
> > +    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
> > +    const struct pci_dev *tmp;
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    if ( !map )
> > +        modify_decoding(pdev, false, rom);
> > +
> > +    if ( !mem )
> > +        return;
> 
> Similarly here - why is it okay (or what effect will it have) to return
> here when we're out of memory, but the caller won't know?

The behaviour here depends on the change to the memory decoding bit:

 - Clearing: memory decoding on device will be disabled, BARs won't be
   unmapped.
 - Setting: no change to device memory decoding bit, BARs won't be
   mapped.

Do you think this is suitable? IMO it's fine to disable the memory
decoding bit on the device and leave the memory regions mapped.

> > +    /*
> > +     * Create a rangeset that represents the current device BARs memory 
> > region
> > +     * and compare it against all the currently active BAR memory regions. 
> > If
> > +     * an overlap is found, subtract it from the region to be
> > +     * mapped/unmapped.
> > +     *
> > +     * NB: the rangeset uses inclusive frame numbers.
> > +     */
> > +
> > +    /*
> > +     * First fill the rangeset with all the BARs of this device or with 
> > the ROM
> > +     * BAR only, depending on whether the guest is toggling the memory 
> > decode
> > +     * bit of the command register, or the enable bit of the ROM BAR 
> > register.
> > +     */
> > +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> > +    {
> > +        const struct vpci_bar *bar = &header->bars[i];
> > +
> > +        if ( !MAPPABLE_BAR(bar) ||
> > +             (rom ? bar->type != VPCI_BAR_ROM
> > +                  : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
> > +            continue;
> 
> Why does rom_enabled matter for the !rom case rather than for
> the rom one? I.e.
> 
>         if ( !MAPPABLE_BAR(bar) ||
>              (rom ? bar->type != VPCI_BAR_ROM || !header->rom_enabled
>                   : bar->type == VPCI_BAR_ROM) )
>             continue;
> 
> ?

No, for the ROM case we only want to map/unmap the ROM, so that's the
only thing added to the rangeset. For the non-ROM case Xen will also
map/unmap the ROM if the enable bit is set.

Your proposed code would always map/unmap the ROM into the p2m when
the memory decoding bit is changed even if it's not enabled.

> > +        rc = rangeset_add_range(mem, PFN_DOWN(bar->addr),
> > +                                PFN_UP(bar->addr + bar->size - 1));
> > +        if ( rc )
> > +        {
> > +            printk(XENLOG_G_WARNING
> > +                   "Failed to add [%" PRI_gfn ", %" PRI_gfn "): %d\n",
> > +                   PFN_DOWN(bar->addr), PFN_UP(bar->addr + bar->size - 1),
> 
> Either you use [a,b) and don't subtract 1, or you use [a,b] with the
> subtraction. Same below.

[a, b) seems better in order to avoid the -1.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.