
Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated



On Thu, Sep 12, 2019 at 11:03:14AM -0700, Joe Jin wrote:
> With below testcase, guest kernel reported "No irq handler for vector":
>   1). Passthrough mlx ib VF to 2 pvhvm guests.
>   2). Start rds-stress between 2 guests.
>   3). Scale down 2 guests vcpu from 32 to 6 at the same time.
> 
> Repeat above test several iteration, guest kernel reported "No irq handler
> for vector", and IB traffic downed to zero which caused by interrupt lost.
> 
> When vcpu offline, kernel disabled local IRQ, migrate IRQ to other cpu,
> update MSI-X table, enable IRQ. If any new interrupt arrived after
> local IRQ disabled also before MSI-X table been updated, interrupt still 
> used old vector and dest cpu info, and when local IRQ enabled again, 
> interrupt been sent to wrong cpu and vector.

Yes, but that's something Linux should be able to handle. According
to your description there's a window where interrupts can be delivered
to the old CPU, but that's expected.

> 
> Looks sync PIR to IRR after MSI-X been updated is help for this issue.

AFAICT the sync that you do is still using the old vcpu id, as
pirq_dpci->gmsi.dest_vcpu_id gets updated a little bit below. I would
expect the sync between PIR and IRR to happen anyway, so I'm not sure
why this helps.

Maybe you need to force such syncing so that no stale pir vector gets
injected later on when the guest assumes the new MSI address has been
successfully written, and no more interrupts should appear on the
previous vCPU?

PIR to IRR sync happens on vmentry, so if the old vCPU doesn't take a
vmentry and drain any pending PIR vectors at more or less the same
time as the new MSI address gets written, it's possible that such PIR
vectors get injected well after the MSI fields have been updated.
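
To make that window concrete, here is a toy Python model. It is not
Xen code: VCpu, MsiRoute and run are made-up names, the vector numbers
are arbitrary, and the model assumes PIR is drained only on vmentry or
by an explicit sync. It only illustrates the ordering I have in mind,
it doesn't prove that this is why the patch helps.

```python
class VCpu:
    """Toy vCPU: PIR holds posted vectors, IRR is what the guest sees."""

    def __init__(self, vcpu_id):
        self.vcpu_id = vcpu_id
        self.pir = set()  # posted, not yet visible to the guest
        self.irr = set()  # visible to the guest's virtual local APIC

    def sync_pir_to_irr(self):
        # Move every posted vector into the IRR and clear the PIR.
        self.irr |= self.pir
        self.pir.clear()

    def vmentry(self):
        # Assumption of this model: the sync happens on every vmentry.
        self.sync_pir_to_irr()


class MsiRoute:
    """Hypothetical stand-in for the guest MSI destination and vector."""

    def __init__(self, dest, vector):
        self.dest = dest
        self.vector = vector

    def deliver(self):
        # The device fires: the vector is posted to the current destination.
        self.dest.pir.add(self.vector)

    def update(self, new_dest, new_vector, force_sync=False):
        old_dest = self.dest
        self.dest, self.vector = new_dest, new_vector
        if force_sync:
            # Drain the *old* destination as part of the update.
            old_dest.sync_pir_to_irr()


def run(force_sync):
    old, new = VCpu(0), VCpu(1)
    route = MsiRoute(old, 0x41)
    route.deliver()                  # interrupt arrives before the update
    route.update(new, 0x51, force_sync=force_sync)
    irr_at_update = set(old.irr)     # old vCPU's IRR once the write is done
    old.vmentry()                    # old vCPU enters the guest some time later
    late = old.irr - irr_at_update   # vectors that surfaced after the update
    return irr_at_update, late
```

Without the forced sync, run(False) leaves the old vector in PIR across
the update and it only shows up on the old vCPU at the later vmentry.
With run(True) the old vector is already in the old vCPU's IRR by the
time the update completes, and nothing new appears afterwards.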

> BTW, I could not reproduced this issue if I disabled apicv.

IIRC Linux won't route interrupts from non-pv devices over event
channels when apicv is in use.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

