[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated



On Thu, Sep 26, 2019 at 01:33:42PM -0700, Joe Jin wrote:
> On 9/24/19 8:42 AM, Roger Pau Monné wrote:
> > On Fri, Sep 13, 2019 at 09:50:34AM -0700, Joe Jin wrote:
> >> On 9/13/19 3:33 AM, Roger Pau Monné wrote:
> >>> On Thu, Sep 12, 2019 at 11:03:14AM -0700, Joe Jin wrote:
> >>>> With below testcase, guest kernel reported "No irq handler for vector":
> >>>>   1). Passthrough mlx ib VF to 2 pvhvm guests.
> >>>>   2). Start rds-stress between 2 guests.
> >>>>   3). Scale down 2 guests vcpu from 32 to 6 at the same time.
> >>>>
> >>>> Repeat above test several iteration, guest kernel reported "No irq 
> >>>> handler
> >>>> for vector", and IB traffic downed to zero which caused by interrupt 
> >>>> lost.
> >>>>
> >>>> When vcpu offline, kernel disabled local IRQ, migrate IRQ to other cpu,
> >>>> update MSI-X table, enable IRQ. If any new interrupt arrived after
> >>>> local IRQ disabled also before MSI-X table been updated, interrupt still 
> >>>> used old vector and dest cpu info, and when local IRQ enabled again, 
> >>>> interrupt been sent to wrong cpu and vector.
> >>>
> >>> Yes, but that's something Linux shoulkd be able to handle, according
> >>> to your description there's a window where interrupts can be delivered
> >>> to the old CPU, but that's something expected.
> >>
> >> Actually, kernel will check APIC IRR when do migration, if any pending
> >> IRQ, will retrigger IRQ to new destination, but Xen does not set the
> >> bit.
> > 
> > Right, because the interrupt is pending in the PIRR posted descriptor
> > field, it has not yet reached the vlapic IRR.
> > 
> >>>
> >>>>
> >>>> Looks sync PIR to IRR after MSI-X been updated is help for this issue.
> >>>
> >>> AFAICT the sync that you do is still using the old vcpu id, as
> >>> pirq_dpci->gmsi.dest_vcpu_id gets updated a little bit below. I'm
> >>> unsure about why does this help, I would expect the sync between pir
> >>> and irr to happen anyway, and hence I'm not sure why is this helping.
> >>
> >> As my above update, IRQ retriggered by old cpu, so Xen need to set IRR
> >> for old cpu but not of new.
> > 
> > AFAICT you are draining any pending data from the posted interrupt
> > PIRR field into the IRR vlapic field, so that no stale interrupts are
> > left behind after the MSI fields have been updated by the guest. I
> > think this is correct, I wonder however whether you also need to
> > trigger a vcpu re-scheduling (pause/unpause the vpcu), so that pending
> > interrupts in IRR are injected by vmx_intr_assist.
> > 
> > Also, I think you should do this syncing after the pi_update_irte
> > call, or else there's still a window (albeit small) where you can
> > still get posted interrupts delivered to the old vcpu.
> 
> I agree with you that we need to take care of this issue as well.
> 
> I created an additional patch but not tested yet for the test env was
> broken, post here for your comment firstly, I'll update later of test
> result when my test env back:
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 3f43b9d5a9..4596110a17 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -23,6 +23,7 @@
>  #include <xen/irq.h>
>  #include <asm/hvm/irq.h>
>  #include <asm/hvm/support.h>
> +#include <asm/hvm/vmx/vmx.h>

Why do you need this include?

>  #include <asm/io_apic.h>
>  
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
> @@ -443,9 +444,21 @@ int pt_irq_create_bind(
>              hvm_migrate_pirq(pirq_dpci, vcpu);
>  
>          /* Use interrupt posting if it is supported. */
> -        if ( iommu_intpost )
> -            pi_update_irte(vcpu ? &vcpu->arch.hvm.vmx.pi_desc : NULL,
> -                           info, pirq_dpci->gmsi.gvec);
> +        if ( iommu_intpost ) {
> +            unsigned int ndst = APIC_INVALID_DEST;
> +            struct irq_desc *desc;
> +
> +            desc = pirq_spin_lock_irq_desc(info, NULL);
> +            if ( irq_desc ) {
> +                ndst = irq_desc->msi_desc->pi_desc->ndst;

I think this is not correct. NDST is the APIC ID of the physical CPU
used as the destination for notifications, it's not the ID of the
vcpu where the events are to be delivered.

Also, I think I'm still confused by this, I've just realized that the
PI descriptor seems to be moved from one vCPU to another without
clearing PIRR, and hence I'm not sure why you are losing interrupts in
that case. I need to look deeper in order to figure out what's going
on there.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.