Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated
On 13.09.2019 18:38, Joe Jin wrote:
> Hi Jan,
>
> Thanks for your reply, see my reply in line please.
>
> On 9/13/19 12:14 AM, Jan Beulich wrote:
>> On 12.09.2019 20:03, Joe Jin wrote:
>>> With the below test case, the guest kernel reported "No irq handler for vector":
>>> 1). Pass through a Mellanox IB VF to 2 pvhvm guests.
>>> 2). Start rds-stress between the 2 guests.
>>> 3). Scale down both guests' vcpus from 32 to 6 at the same time.
>>>
>>> After repeating the above test for several iterations, the guest kernel
>>> reported "No irq handler for vector", and IB traffic dropped to zero,
>>> caused by a lost interrupt.
>>>
>>> When a vcpu is offlined, the guest kernel disables local IRQs, migrates
>>> IRQs to other cpus, updates the MSI-X table, and re-enables IRQs. Any
>>> interrupt that arrives after local IRQs are disabled but before the
>>> MSI-X table has been updated still uses the old vector and destination
>>> cpu, so once local IRQs are enabled again it is delivered to the wrong
>>> cpu with the wrong vector.
>>>
>>> Syncing PIR to IRR after the MSI-X table has been updated appears to
>>> help with this issue.
>>
>> I'm having trouble making the connection, which quite possibly simply
>> means the description needs to be further extended: Sync-ing PIR to
>> IRR has nothing to do with a vector change. It would help if nothing
>> else caused this bitmap propagation, and an interrupt was lost (rather
>> than delivered through the wrong vector, or to the wrong CPU).
>> Furthermore with vector and destination being coupled, after a CPU has
>> been offlined this would generally mean
>> - if there was just a single destination permitted, lack of delivery
>> altogether,
>> - if there were multiple destinations permitted, delivery to one of
>> the other CPUs, at which point the vector would still be valid.
>
> When a cpu is offlined, the guest kernel only migrates IRQs whose affinity
> is set to that cpu alone; if there are multiple destinations, the kernel
> does not migrate the IRQ, i.e. it does not update the MSI-X table with a
> new destination and vector.
>
> After migrating an IRQ, the kernel checks the IRR for every vector, and if
> an APIC IRR bit is set it retriggers the IRQ towards the new destination.
> This is intended to avoid losing any interrupt.
>
> But on Xen, after the MSI-X table has been updated, nothing checks for a
> pending interrupt and notifies the guest kernel about it.
>
>>
>> An interesting aspect would be on which CPU the log message was
>> observed, and how this correlates with the destination sets of the
>> CPUs that have got offlined. From there it would then further be
>> interesting to understand why the interrupt made it to that CPU,
>> since - as said - destination and vector get changed together, and
>> hence with things going wrong it would be of interest to know whether
>> the CPU receiving the IRQ is within the new destination set, or some
>> (random?) other one.
>
> irq_retrigger() is called after the kernel has updated the vector, and it
> sends any pending IRQ(s) to the new destination.
>
> Here is the kernel log from when the issue happened; the guest kernel is
> 4.1. On a 4.14 guest it is almost the same, just without the "(irq -1)"
> part due to kernel changes; the IRQ migration workflow (fixup_irqs()) is
> the same:
>
> Sep 12 20:26:46 localhost kernel: smpboot: CPU 17 is now offline
> Sep 12 20:26:46 localhost kernel: smpboot: CPU 18 is now offline
> Sep 12 20:26:46 localhost kernel: smpboot: CPU 19 is now offline
> Sep 12 20:26:47 localhost kernel: Broke affinity for irq 251
> Sep 12 20:26:47 localhost kernel: do_IRQ: 20.178 No irq handler for vector (irq -1)
> Sep 12 20:26:47 localhost kernel: smpboot: CPU 20 is now offline
> Sep 12 20:26:47 localhost kernel: smpboot: CPU 21 is now offline
>
> From the above you can see the IRQ was sent to cpu 20, which was in the
> process of going offline.
>
> The IRQ arrived on that cpu right when IRQs were re-enabled; only after
> the cpu had been offlined is the "smpboot: CPU 20 is now offline" message
> printed.
>
> The call path in the guest kernel is as follows:
> cpu_down()
> |-> cpu_down_maps_locked()
> | _cpu_down
> | |-> __stop_machine
> | |-> stop_cpus()
> | |->__stop_cpus()
> | |- queue_stop_cpus_work() ---+
> |-> __cpu_die() |
> |-> pr_info("CPU %u is now offline\n", cpu); |
> +--------------------------------------------------+
> |
> +
> multi_cpu_stop()
> |-> local_save_flags
> |-> take_cpu_down()
> | |-> __cpu_disable
> | |-> smp_ops.cpu_disable = xen_cpu_disable
> | |-> cpu_disable_common
> | |-> fixup_irqs <== IRQ migration.
> |-> local_irq_restore()
Ah yes, this makes sense. You want to extend the description to
reflect some of the further explanation above.
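For reference, the kernel-side check described above is roughly the
following (paraphrased from memory of the tail of fixup_irqs() in
arch/x86/kernel/irq.c, so take the details with a grain of salt): after
moving the IRQs away, the dying cpu scans its own vector space, and any
vector whose APIC IRR bit is still set gets retriggered towards the new
destination:

    /* Rough sketch only, not the exact kernel code. */
    for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
        unsigned int irr;
        struct irq_desc *desc = __this_cpu_read(vector_irq[vector]);

        if (IS_ERR_OR_NULL(desc))
            continue;

        /* Each APIC_IRR register covers 32 vectors, 0x10 bytes apart. */
        irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
        if (irr & (1U << (vector % 32))) {
            struct irq_data *data = irq_desc_get_irq_data(desc);
            struct irq_chip *chip = irq_data_get_irq_chip(data);

            /* Re-inject the pending interrupt at its new destination. */
            if (chip->irq_retrigger)
                chip->irq_retrigger(data);
        }
    }

So the guest relies on seeing the pending bit in IRR at this point, which is
exactly what doesn't happen if the interrupt is still sitting in PIR.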
>>> --- a/xen/drivers/passthrough/io.c
>>> +++ b/xen/drivers/passthrough/io.c
>>> @@ -412,6 +412,9 @@ int pt_irq_create_bind(
>>> pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
>>> pirq_dpci->gmsi.gflags = gflags;
>>> }
>>> +
>>> +    if ( hvm_funcs.sync_pir_to_irr )
>>> +        hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>>
>> If the need for this change can be properly explained, then it
>> still wants converting to alternative_vcall() - like the other
>> caller of this hook. Or perhaps even better, move vlapic.c's
>> wrapper (suitably renamed) into hvm.h, and use it here.
>
> Yes, I agree. I'm not 100% sure about this, which is why I sent it as an RFC.
And btw, please also attach a brief comment here, to clarify
why the syncing is needed precisely at this point.
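Something along these lines perhaps (an untested sketch only; the name and
the exact placement in hvm.h are merely suggestions):

    /* e.g. in xen/include/asm-x86/hvm/hvm.h */
    static inline void hvm_sync_pir_to_irr(struct vcpu *v)
    {
        if ( hvm_funcs.sync_pir_to_irr )
            alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
    }

with vlapic.c's local wrapper replaced by this, and the call site in
pt_irq_create_bind() then simply using hvm_sync_pir_to_irr().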
>> Additionally, the code setting pirq_dpci->gmsi.dest_vcpu_id
>> (right after your code insertion) allows for the field to be
>> invalid, which I think you need to guard against.
>
> I think you mean the multiple-destination case, where it's set to -1?
The reasons why it might be -1 are irrelevant here, I think.
You need to handle that case, both to avoid an out-of-bounds
array access and to make sure an IRR bit wouldn't still get
propagated too late in some special case.
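I.e. something like this at the insertion point (again only a sketch, and
assuming the wrapper suggested above):

    /*
     * The guest may have re-programmed the MSI-X entry while an interrupt
     * using the old vector / destination was already pending in PIR.  Sync
     * PIR into IRR now, so the guest's retrigger logic can see it.  Skip
     * the sync when no single destination vCPU is known.
     */
    if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
        hvm_sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);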
Also - what about the respective other path in the function,
dealing with PT_IRQ_TYPE_PCI and PT_IRQ_TYPE_MSI_TRANSLATE? It
seems to me that there's the same chance of deferring IRR
propagation for too long?
Jan