[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v9 07/11] vpci/bars: add handlers to map the BARs



On Thu, Mar 15, 2018 at 06:41:00AM -0600, Jan Beulich wrote:
> >>> On 15.03.18 at 12:33, <roger.pau@xxxxxxxxxx> wrote:
> > On Wed, Mar 14, 2018 at 10:13:16AM -0600, Jan Beulich wrote:
> >> >>> On 14.03.18 at 15:04, <roger.pau@xxxxxxxxxx> wrote:
> >> > +static int maybe_defer_map(struct domain *d, struct pci_dev *pdev,
> >> > +                           struct rangeset *mem, bool map, bool rom)
> >> > +{
> >> > +    struct vcpu *curr = current;
> >> > +    int rc;
> >> > +
> >> > +    if ( is_idle_vcpu(curr) )
> >> > +    {
> >> > +        struct map_data data = { .d = d, .map = true };
> >> > +
> >> > +        /*
> >> > +         * Dom0 building runs on the idle vCPU, in which case it's not 
> >> > possible
> >> > +         * to defer the operation (like done in the else branch). Call
> >> > +         * rangeset_consume_ranges in order to establish the mappings 
> >> > right
> >> > +         * away.
> >> > +         */
> >> 
> >> For one I think this comment belongs above the if(), as that's what
> >> it explains, not the ASSERT() that follows. And then it clarifies only
> >> half of what needs clarifying: Why can't we make it here on an idle
> >> vCPU outside of Dom0 building (e.g. through a tasklet), or if we can,
> >> why is the given behavior the intended one?
> > 
> > Since this seems to be causing confusion, what about using:
> > 
> > system_state != SYS_STATE_active
> > 
> > Instead of checking if running on the idle vpcu. Do you think that
> > would make it clearer?
> 
> Yes, I think so. That would then make clear that if you moved the
> conditional into the only caller and split the function, the Dom0 one
> could even become __init.

I've splitted the function into defer_map and apply_map, and added the
following at the end of modify_bars:

if ( system_state != SYS_STATE_active )
{
    /*
     * Mappings might be created when building Dom0 if the memory decoding
     * bit of PCI devices is enabled. In that case it's not possible to
     * defer the operation, so call apply_map in order to create the
     * mappings right away. Note that at build time this function will only
     * be called iff the memory decoding bit is enabled, thus the operation
     * will always be to establish mappings and process all the BARs.
     */
    ASSERT(map && !rom_only);
    return apply_map(pdev->domain, dev, mem);
}

defer_map(pdev->domain, dev, mem, map, rom_only);


Would you also like me to change the is_idle_vcpu check in map_range
to use system_state == SYS_STATE_active also?

> >> > +        ASSERT(map && !rom);
> >> 
> >> I can see why you assume it's not an un-mapping request (albeit
> >> I wonder whether you couldn't instead simply set .map above to
> >> the incoming value), but why the !rom part?
> > 
> > This branch will only be used at Dom0 build time, when none of the
> > BARs are mapped into the p2m, so asking for an unmap in this case
> > would be wrong.
> 
> Yes, that's the part I've said I understand, yet was saying that the
> code wouldn't become incorrect if you set .map to map. You don't
> explain the !rom part at all, though.

Let's see if the comment above clarifies a little bit the non deferred
map.

> >> > +     */
> >> > +
> >> > +    /*
> >> > +     * First fill the rangeset with all the BARs of this device or with 
> >> > the ROM
> >> > +     * BAR only, depending on whether the guest is toggling the memory 
> >> > decode
> >> > +     * bit of the command register, or the enable bit of the ROM BAR 
> >> > register.
> >> > +     */
> >> > +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> >> > +    {
> >> > +        const struct vpci_bar *bar = &header->bars[i];
> >> > +
> >> > +        if ( !MAPPABLE_BAR(bar) ||
> >> > +             (rom_only ? bar->type != VPCI_BAR_ROM
> >> > +                       : (bar->type == VPCI_BAR_ROM && 
> >> > !header->rom_enabled)) )
> >> > +            continue;
> >> > +
> >> > +        rc = rangeset_add_range(mem, PFN_DOWN(bar->addr),
> >> > +                                PFN_UP(bar->addr + bar->size - 1));
> >> > +        if ( rc )
> >> > +        {
> >> > +            printk(XENLOG_G_WARNING
> >> > +                   "Failed to add [%" PRI_gfn ", %" PRI_gfn "]: %d\n",
> >> > +                   PFN_DOWN(bar->addr), PFN_UP(bar->addr + bar->size - 
> >> > 1),
> >> 
> >> I thought we had agreed that the parenthesization of tuples
> >> like this one should match meaning they want to convey. I'm
> >> having a hard time to see how PFN_UP() could ever go together
> >> with a closing square bracket.
> > 
> > There's a -1 in the PFN_UP, and it's exactly what we are adding to the
> > rangeset. Ie: rangeset_add_range(..., e, e) is adding the range [e,
> > e], not [e, e).
> 
> Oh, I didn't spot that - it's even worse then afaict, because you
> then also have the insertion wrong. Just consider a BAR covering
> up to a page and starting at a page boundary: You'd insert two
> pages when you mean just one.

Ug, right. This should be PFN_DOWN and then the range added and the ]
notation used is correct.

> >> > +static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> >> > +                      uint32_t cmd, void *data)
> >> > +{
> >> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> >> > +    uint16_t current_cmd = pci_conf_read16(pdev->seg, pdev->bus, slot, 
> >> > func,
> >> > +                                           reg);
> >> > +
> >> > +    /*
> >> > +     * Let Dom0 play with all the bits directly except for the memory
> >> > +     * decoding one.
> >> > +     */
> >> > +    if ( (cmd ^ current_cmd) & PCI_COMMAND_MEMORY )
> >> > +        /*
> >> > +         * Ignore the error. No memory has been added or removed from 
> >> > the p2m,
> >> > +         * and the memory decoding has not been changed, so leave 
> >> > everything
> >> > +         * as-is, hoping the guest will realize and try again.
> >> > +         */
> >> > +        modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
> >> 
> >> So that comment appears to be correct, but I wonder if the reader
> >> could get a little more assistance, as it's not exactly obvious why no
> >> p2m changes would have occurred in case of failure: modify_bars()
> >> produces all its errors before doing any mapping, and
> >> maybe_defer_map() takes the "else" branch which doesn't cause
> >> any (direct) errors. Same for the similar comment in rom_write().
> > 
> > Let me expand that a little bit then to give some more context:
> > 
> > /*
> >  * Ignore the error. No memory has been added or removed from the p2m
> >  * (because the actual p2m changes are deferred in maybe_defer_map)
> >  * and the memory decoding bit has not been changed, so leave
> >  * everything as-is, hoping the guest will realize and try again.
> >  */
> 
> That's better, but with the suggested split of maybe_defer_map()
> things would end up even less confusing (because the new text
> you suggest still has an apparent [but not actual] conflict between
> the "maybe" in the function name and what you say is happening).

Let's see if the proposed change/comment makes this easier to
understand.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.