
Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated



On 13.09.2019 18:38, Joe Jin wrote:
> Hi Jan,
> 
> Thanks for your reply, see my reply in line please.
> 
> On 9/13/19 12:14 AM, Jan Beulich wrote:
>> On 12.09.2019 20:03, Joe Jin wrote:
>>> With below testcase, guest kernel reported "No irq handler for vector":
>>>   1). Passthrough mlx ib VF to 2 pvhvm guests.
>>>   2). Start rds-stress between 2 guests.
>>>   3). Scale down both guests' vcpus from 32 to 6 at the same time.
>>>
>>> After repeating the above test for several iterations, the guest kernel
>>> reported "No irq handler for vector", and IB traffic dropped to zero due
>>> to lost interrupts.
>>>
>>> When a vcpu goes offline, the kernel disables local IRQs, migrates the IRQ
>>> to another cpu, updates the MSI-X table, and re-enables IRQs. If a new
>>> interrupt arrives after local IRQs are disabled but before the MSI-X table
>>> has been updated, the interrupt still uses the old vector and destination
>>> cpu info, so when local IRQs are enabled again it is sent to the wrong cpu
>>> and vector.
>>>
>>> Syncing PIR to IRR after the MSI-X table has been updated appears to help
>>> with this issue.
>>
>> I'm having trouble making the connection, which quite possibly simply
>> means the description needs to be further extended: Sync-ing PIR to
>> IRR has nothing to do with a vector change. It would help if nothing
>> else caused this bitmap propagation, and an interrupt was lost (rather
>> than delivered through the wrong vector, or to the wrong CPU).
>> Furthermore with vector and destination being coupled, after a CPU has
>> been offlined this would generally mean
>> - if there was just a single destination permitted, lack of delivery
>>   altogether,
>> - if there were multiple destinations permitted, delivery to one of
>>   the other CPUs, at which point the vector would still be valid.
> 
> When a cpu goes offline, the guest kernel only migrates IRQs whose affinity
> is set to that cpu alone; for IRQs with multiple permitted destinations, the
> kernel does not do the migration (which would include updating the msi-x
> table with the new destination and vector).
> 
> After the IRQ migration, the kernel checks the IRR for every vector; if the
> APIC IRR bit is set, it retriggers the IRQ towards the new destination (see
> the sketch below). This is intended to avoid losing any interrupt.
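> 
> Roughly, the relevant logic in fixup_irqs() (arch/x86/kernel/irq.c; this is
> a simplified sketch, the exact code differs between kernel versions) looks
> like this:
> 
>     /*
>      * Walk this CPU's vector table after the migration and retrigger
>      * anything still pending in the local APIC IRR, so it gets re-sent
>      * to the new destination instead of being lost.
>      */
>     for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>         struct irq_desc *desc;
>         struct irq_data *data;
>         struct irq_chip *chip;
>         unsigned int irr;
> 
>         if (IS_ERR_OR_NULL(__this_cpu_read(vector_irq[vector])))
>             continue;
> 
>         irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
>         if (!(irr & (1 << (vector % 32))))
>             continue;
> 
>         desc = __this_cpu_read(vector_irq[vector]);
>         data = irq_desc_get_irq_data(desc);
>         chip = irq_data_get_irq_chip(data);
>         if (chip->irq_retrigger)
>             chip->irq_retrigger(data);
>     }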
> 
> But on Xen, after the msi-x table has been updated, nothing checks for a
> pending IRQ or notifies the guest kernel about one.
> 
>>
>> An interesting aspect would be on which CPU the log message was
>> observed, and how this correlates with the destination sets of the
>> CPUs that have got offlined. From there it would then further be
>> interesting to understand why the interrupt made it to that CPU,
>> since - as said - destination and vector get changed together, and
>> hence with things going wrong it would be of interest to know whether
>> the CPU receiving the IRQ is within the new destination set, or some
>> (random?) other one.
> 
> irq_retrigger() is called after the kernel has updated the vector; it sends
> any pending IRQ(s) to the new destination.
> 
> Here is the kernel log from when the issue happened; the guest kernel is
> 4.1. On a 4.14 guest it is almost the same, just without the "(irq -1)" due
> to kernel changes; the IRQ migration workflow is the same (fixup_irqs()):
> 
> Sep 12 20:26:46 localhost kernel: smpboot: CPU 17 is now offline
> Sep 12 20:26:46 localhost kernel: smpboot: CPU 18 is now offline
> Sep 12 20:26:46 localhost kernel: smpboot: CPU 19 is now offline
> Sep 12 20:26:47 localhost kernel: Broke affinity for irq 251
> Sep 12 20:26:47 localhost kernel: do_IRQ: 20.178 No irq handler for vector (irq -1)
> Sep 12 20:26:47 localhost kernel: smpboot: CPU 20 is now offline
> Sep 12 20:26:47 localhost kernel: smpboot: CPU 21 is now offline
> 
> From the above you can see the IRQ was sent to cpu 20, which was in the
> process of going offline.
> 
> The IRQ arrived at that cpu immediately when IRQs were re-enabled; only
> after the CPU has been fully offlined is "smpboot: CPU 20 is now offline"
> printed.
> 
> Call path in kernel as below:
> cpu_down()
>   |-> cpu_down_maps_locked()
>   |     _cpu_down
>   |       |-> __stop_machine
>   |             |-> stop_cpus()
>   |                   |->__stop_cpus()
>   |                        |- queue_stop_cpus_work() ---+
>   |-> __cpu_die()                                       |
>          |-> pr_info("CPU %u is now offline\n", cpu);   |
>      +--------------------------------------------------+
>      |
>      +
> multi_cpu_stop()
>   |-> local_save_flags
>   |-> take_cpu_down()
>   |      |-> __cpu_disable
>   |            |-> smp_ops.cpu_disable = xen_cpu_disable
>   |                  |-> cpu_disable_common
>   |                        |-> fixup_irqs <== IRQ migration.
>   |-> local_irq_restore()

Ah yes, this makes sense. You want to extend the description to
reflect some of the further explanation above.

>>> --- a/xen/drivers/passthrough/io.c
>>> +++ b/xen/drivers/passthrough/io.c
>>> @@ -412,6 +412,9 @@ int pt_irq_create_bind(
>>>                  pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
>>>                  pirq_dpci->gmsi.gflags = gflags;
>>>              }
>>> +
>>> +            if ( hvm_funcs.sync_pir_to_irr )
>>> +                hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>>
>> If the need for this change can be properly explained, then it
>> still wants converting to alternative_vcall() - like the other
>> caller of this hook. Or perhaps even better, move vlapic.c's
>> wrapper (suitably renamed) into hvm.h, and use it here.
> 
> Yes, I agree. I'm not 100% sure about the change, so I sent it as an RFC.

And btw, please also attach a brief comment here, to clarify
why the syncing is needed precisely at this point.
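For example, a comment along these lines (wording purely illustrative):

    /*
     * The guest may have changed the vector/destination while it had
     * interrupts masked.  Sync PIR to IRR now so that an interrupt which
     * arrived with the old routing becomes visible in IRR, where the
     * guest's post-migration IRR check can notice and retrigger it,
     * instead of the interrupt being lost.
     */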

>> Additionally, the code setting pirq_dpci->gmsi.dest_vcpu_id
>> (right after your code insertion) allows for the field to be
>> invalid, which I think you need to guard against.
> 
> I think you mean multiple destinations, in which case it's -1?

The reasons why it might be -1 are irrelevant here, I think.
You need to handle the case both to avoid an out-of-bounds
array access and to make sure an IRR bit wouldn't still get
propagated too late in some special case.
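
As a rough illustration of both points (using alternative_vcall() and
guarding against an invalid dest_vcpu_id) - purely a sketch, not
necessarily the final form:

    /*
     * Hypothetical guard: avoid an out-of-bounds d->vcpu[] access when
     * dest_vcpu_id is invalid (-1).  Note that simply skipping the sync
     * in that case may still leave IRR propagation deferred, as said
     * above, so the invalid case needs further thought.
     */
    if ( hvm_funcs.sync_pir_to_irr &&
         pirq_dpci->gmsi.dest_vcpu_id >= 0 &&
         pirq_dpci->gmsi.dest_vcpu_id < d->max_vcpus )
        alternative_vcall(hvm_funcs.sync_pir_to_irr,
                          d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);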

Also - what about the respective other path in the function,
dealing with PT_IRQ_TYPE_PCI and PT_IRQ_TYPE_MSI_TRANSLATE? It
seems to me that there's the same chance of deferring IRR
propagation for too long?

Jan


 

