
Re: [Xen-devel] [PATCH v3 for-next 2/3] x86/pt: enable binding of GSIs to a PVH Dom0



On Tue, May 30, 2017 at 04:01:00AM -0600, Jan Beulich wrote:
> >>> On 17.05.17 at 17:15, <roger.pau@xxxxxxxxxx> wrote:
> > --- a/xen/arch/x86/hvm/irq.c
> > +++ b/xen/arch/x86/hvm/irq.c
> > @@ -126,6 +126,49 @@ void hvm_pci_intx_deassert(
> >      spin_unlock(&d->arch.hvm_domain.irq_lock);
> >  }
> >  
> > +void hvm_gsi_assert(struct domain *d, unsigned int gsi)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> > +
> > +    if ( gsi >= hvm_irq->nr_gsis )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * __hvm_pci_intx_{de}assert uses an array to track the status of each
> > +     * interrupt line, and Xen does the routing and GSI assertion based on
> > +     * that. This prevents the same line from triggering multiple times, which
> > +     * is not available here, and thus Xen needs to rely on gsi_assert_count in
> > +     * order to know if the GSI is pending or not.
> > +     */
> 
> The "which is not available here" part is at least confusing. I'm not
> even sure whether the "which" is supposed to refer to the array or
> something else, because you use the exact same array here.

I agree, it's confusing and badly worded. How about:

__hvm_pci_intx_{de}assert uses a bitfield in pci_intx.i to track the
status of each interrupt line, and Xen does the routing and GSI
assertion based on that. The value of the pci_intx.i bitmap prevents
the same line from triggering multiple times. No such mechanism is
available here, so Xen needs to rely on gsi_assert_count in order to
know whether the GSI is pending or not.

> > +void hvm_gsi_deassert(struct domain *d, unsigned int gsi)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> > +
> > +    if ( gsi >= hvm_irq->nr_gsis )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    spin_lock(&d->arch.hvm_domain.irq_lock);
> > +    if ( hvm_irq->gsi_assert_count[gsi] )
> > +        hvm_irq->gsi_assert_count[gsi] = 0;
> > +    ASSERT(!hvm_irq->gsi_assert_count[gsi]);
> 
> I don't think the if() and ASSERT() are of any use here anymore.

Fixed.

> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -164,6 +164,20 @@ static void pt_irq_time_out(void *data)
> >  
> >      spin_lock(&irq_map->dom->event_lock);
> >  
> > +    if ( irq_map->flags & HVM_IRQ_DPCI_IDENTITY_GSI )
> > +    {
> > +        struct pirq *pirq = dpci_pirq(irq_map);
> > +
> > +        ASSERT(is_hardware_domain(irq_map->dom));
> > +        /*
> > +         * Identity mapped, no need to iterate over the guest GSI list to find
> > +         * other pirqs sharing the same guest GSI.
> > +         */
> > +        irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH;
> > +        hvm_gsi_deassert(irq_map->dom, pirq->pirq);
> > +        goto out;
> > +    }
> > +
> >      dpci = domain_get_irq_dpci(irq_map->dom);
> >      if ( unlikely(!dpci) )
> >      {
> > @@ -185,6 +199,7 @@ static void pt_irq_time_out(void *data)
> >          hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
> >      }
> >  
> > + out:
> >      pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
> 
> With the 1:1 mapping, do you really need to go through
> pt_pirq_iterate() here? I.e. can't you invoke pt_irq_guest_eoi()
> just once and be done? Or really it probably can be even more
> straight, as then there also is no point in setting the
> HVM_IRQ_DPCI_EOI_LATCH flag, but you could rather do
> directly what pt_irq_guest_eoi() does in its if() body. Otoh I may
> be missing something here, as I can't see why the code is using
> pt_pirq_iterate() even before your change.

I have to admit this is not obviously clear to me either (or I'm also
missing something); there are too many translation layers and
indirections in this code, together with a complete lack of comments.

Prior to my change, pt_irq_time_out iterates over the list of guest
devices (digl) bound to a pirq, deasserts the interrupt lines, and
marks all the pirqs bound to the same guest GSI with the EOI_LATCH
flag. Finally, it iterates over the list of guest pirqs and clears
(EOIs) all the ones marked EOI_LATCH.

I don't really understand the usefulness of the EOI_LATCH flag; can't
pt_irq_time_out just do the EOI directly and avoid the second iteration?
Something like:

list_for_each_entry ( digl, &irq_map->digl_list, list )
{
    unsigned int guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
    const struct hvm_girq_dpci_mapping *girq;

    list_for_each_entry ( girq, &dpci->girq[guest_gsi], list )
    {
        struct pirq *pirq = pirq_info(irq_map->dom, girq->machine_gsi);

        pirq_dpci(pirq)->masked = 0;
        pirq_dpci(pirq)->pending = 0;
        pirq_guest_eoi(pirq);
    }
    hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
}

I can of course also do something similar for the identity mapping
case.

> > @@ -472,7 +510,27 @@ int pt_irq_create_bind(
> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> >                                     HVM_IRQ_DPCI_MACH_PCI |
> >                                     HVM_IRQ_DPCI_GUEST_PCI;
> > -                share = BIND_PIRQ__WILL_SHARE;
> > +                if ( !is_hardware_domain(d) )
> > +                    share = BIND_PIRQ__WILL_SHARE;
> > +                else
> > +                {
> > +                    unsigned int pin;
> > +                    struct hvm_vioapic *vioapic = gsi_vioapic(d, guest_gsi,
> > +                                                              &pin);
> > +
> > +                    if ( !vioapic )
> > +                    {
> > +                        ASSERT_UNREACHABLE();
> > +                        return -EINVAL;
> > +                    }
> > +                    pirq_dpci->flags |= HVM_IRQ_DPCI_IDENTITY_GSI;
> > +                    /*
> > +                     * Check if the corresponding vIO APIC pin is configured
> > +                     * level or edge trigger, level triggered interrupts will
> > +                     * be marked as shareable.
> > +                     */
> > +                    share = vioapic->redirtbl[pin].fields.trig_mode;
> 
> As pointed out during prior review (perhaps of another patch of
> yours) the trigger mode bit is meaningless for masked RTEs. At
> the very least an ASSERT() needs to be here for that reason,
> of course provided masked entries can never be seen here.

pt_irq_create_bind for GSIs will only be called when the hardware
domain unmasks the RTE, so an ASSERT is the right choice IMHO.

> > @@ -489,9 +547,15 @@ int pt_irq_create_bind(
> >                   * IRQ_GUEST is not set. As such we can reset 'dom' directly.
> >                   */
> >                  pirq_dpci->dom = NULL;
> > -                list_del(&girq->list);
> > -                list_del(&digl->list);
> > -                hvm_irq_dpci->link_cnt[link]--;
> > +                if ( !is_hardware_domain(d) )
> 
> To be honest I'd prefer if you checked digl and/or girq against NULL
> here, to avoid someone updating the condition above without
> updating this one in lock step.

I've changed the condition to "girq && digl".

> > @@ -573,27 +644,30 @@ int pt_irq_destroy_bind(
> >          struct hvm_girq_dpci_mapping *girq;
> >          struct dev_intx_gsi_link *digl, *tmp;
> >  
> > -        list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
> > +        if ( hvm_irq_dpci )
> >          {
> > -            if ( girq->bus         == bus &&
> > -                 girq->device      == device &&
> > -                 girq->intx        == intx &&
> > -                 girq->machine_gsi == machine_gsi )
> > +            list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
> >              {
> > -                list_del(&girq->list);
> > -                xfree(girq);
> > -                girq = NULL;
> > -                break;
> > +                if ( girq->bus         == bus &&
> > +                     girq->device      == device &&
> > +                     girq->intx        == intx &&
> > +                     girq->machine_gsi == machine_gsi )
> > +                {
> > +                    list_del(&girq->list);
> > +                    xfree(girq);
> > +                    girq = NULL;
> > +                    break;
> > +                }
> >              }
> > -        }
> >  
> > -        if ( girq )
> > -        {
> > -            spin_unlock(&d->event_lock);
> > -            return -EINVAL;
> > -        }
> > +            if ( girq )
> > +            {
> > +                spin_unlock(&d->event_lock);
> > +                return -EINVAL;
> > +            }
> >  
> > -        hvm_irq_dpci->link_cnt[link]--;
> > +            hvm_irq_dpci->link_cnt[link]--;
> > +        }
> >  
> >          /* clear the mirq info */
> >          if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> 
> What you leave untouched here is code freeing something you
> never allocate for Dom0. While likely this is correct (as the list will
> always be empty), it is confusing at the same time. Furthermore
> if you also skip that part, I think you can avoid having to re-indent
> all the code further up.

I see what you mean. The hvm_irq_dpci condition can be moved to the
upper if, since none of the code inside it is relevant for the
hardware domain case; that will also avoid the re-indentation.

> > @@ -638,11 +712,15 @@ int pt_irq_destroy_bind(
> >      if ( what && iommu_verbose )
> >      {
> >          unsigned int device = pt_irq_bind->u.pci.device;
> > +        char buf[24] = "";
> >  
> > -        printk(XENLOG_G_INFO
> > -               "d%d %s unmap: m_irq=%u dev=%02x:%02x.%u intx=%u\n",
> > -               d->domain_id, what, machine_gsi, pt_irq_bind->u.pci.bus,
> > -               PCI_SLOT(device), PCI_FUNC(device), pt_irq_bind->u.pci.intx);
> > +        if ( !is_hardware_domain(d) )
> > +            snprintf(buf, ARRAY_SIZE(buf), " dev=%02x.%02x.%u intx=%u",
> > +                     pt_irq_bind->u.pci.bus, PCI_SLOT(device),
> > +                     PCI_FUNC(device), pt_irq_bind->u.pci.intx);
> 
> Now that this supports Dom0, you also need to log the segment.

I'm not sure I follow you here. For the Dom0 case all the fields in
u.pci.* are unused, since Xen does an identity mapping of the GSI but
doesn't know which device it belongs to. That's different for the MSI
case, but then these fields are not used anyway.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

