|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 6/7] x86/irq: handle moving interrupts in _assign_irq_vector()
On Thu, Jun 13, 2024 at 01:36:55PM +0200, Jan Beulich wrote:
> On 13.06.2024 13:31, Roger Pau Monné wrote:
> > On Thu, Jun 13, 2024 at 10:38:35AM +0200, Jan Beulich wrote:
> >> On 12.06.2024 17:36, Roger Pau Monné wrote:
> >>> On Wed, Jun 12, 2024 at 03:42:58PM +0200, Jan Beulich wrote:
> >>>> On 12.06.2024 12:39, Roger Pau Monné wrote:
> >>>>> On Tue, Jun 11, 2024 at 03:18:32PM +0200, Jan Beulich wrote:
> >>>>>> On 10.06.2024 16:20, Roger Pau Monne wrote:
> >>>>>>> Currently there's logic in fixup_irqs() that attempts to prevent
> >>>>>>> _assign_irq_vector() from failing, as fixup_irqs() is required to
> >>>>>>> evacuate all
> >>>>>>> interrupts from the CPUs not present in the input mask. The current
> >>>>>>> logic in
> >>>>>>> fixup_irqs() is incomplete, as it doesn't deal with interrupts that
> >>>>>>> have
> >>>>>>> move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
> >>>>>>>
> >>>>>>> Instead of attempting to fixup the interrupt descriptor in
> >>>>>>> fixup_irqs() so that
> >>>>>>> _assign_irq_vector() cannot fail, introduce logic in
> >>>>>>> _assign_irq_vector()
> >>>>>>> to deal with interrupts that have either
> >>>>>>> move_{in_progress,cleanup_count} set
> >>>>>>> and no remaining online CPUs in ->arch.cpu_mask.
> >>>>>>>
> >>>>>>> If _assign_irq_vector() is requested to move an interrupt in the state
> >>>>>>> described above, first attempt to see if ->arch.old_cpu_mask contains
> >>>>>>> any valid
> >>>>>>> CPUs that could be used as fallback, and if that's the case do move
> >>>>>>> the
> >>>>>>> interrupt back to the previous destination. Note this is easier
> >>>>>>> because the
> >>>>>>> vector hasn't been released yet, so there's no need to allocate and
> >>>>>>> setup a new
> >>>>>>> vector on the destination.
> >>>>>>>
> >>>>>>> Due to the logic in fixup_irqs() that clears offline CPUs from
> >>>>>>> ->arch.old_cpu_mask (and releases the old vector if the mask becomes
> >>>>>>> empty) it
> >>>>>>> shouldn't be possible to get into _assign_irq_vector() with
> >>>>>>> ->arch.move_{in_progress,cleanup_count} set but no online CPUs in
> >>>>>>> ->arch.old_cpu_mask.
> >>>>>>>
> >>>>>>> However if ->arch.move_{in_progress,cleanup_count} is set and the
> >>>>>>> interrupt has
> >>>>>>> also changed affinity, it's possible the members of
> >>>>>>> ->arch.old_cpu_mask are no
> >>>>>>> longer part of the affinity set,
> >>>>>>
> >>>>>> I'm having trouble relating this (->arch.old_cpu_mask related) to ...
> >>>>>>
> >>>>>>> move the interrupt to a different CPU part of
> >>>>>>> the provided mask
> >>>>>>
> >>>>>> ... this (->arch.cpu_mask related).
> >>>>>
> >>>>> No, the "provided mask" here is the "mask" parameter, not
> >>>>> ->arch.cpu_mask.
> >>>>
> >>>> Oh, so this describes the case of "hitting" the comment at the very
> >>>> bottom of
> >>>> the first hunk then? (I probably was misreading this because I was
> >>>> expecting
> >>>> it to describe a code change, rather than the case where original
> >>>> behavior
> >>>> needs retaining. IOW - all fine here then.)
> >>>>
> >>>>>>> and keep the current ->arch.old_{cpu_mask,vector} for the
> >>>>>>> pending interrupt movement to be completed.
> >>>>>>
> >>>>>> Right, that's to clean up state from before the initial move. What
> >>>>>> isn't
> >>>>>> clear to me is what's to happen with the state of the intermediate
> >>>>>> placement. Description and code changes leave me with the impression
> >>>>>> that
> >>>>>> it's okay to simply abandon, without any cleanup, yet I can't quite
> >>>>>> figure
> >>>>>> why that would be an okay thing to do.
> >>>>>
> >>>>> There isn't much we can do with the intermediate placement, as the CPU
> >>>>> is going offline. However we can drain any pending interrupts from
> >>>>> IRR after the new destination has been set, since setting the
> >>>>> destination is done from the CPU that's the current target of the
> >>>>> interrupts. So we can ensure the draining is done strictly after the
> >>>>> target has been switched, hence ensuring no further interrupts from
> >>>>> this source will be delivered to the current CPU.
> >>>>
> >>>> Hmm, I'm afraid I still don't follow: I'm specifically in trouble with
> >>>> the ...
> >>>>
> >>>>>>> --- a/xen/arch/x86/irq.c
> >>>>>>> +++ b/xen/arch/x86/irq.c
> >>>>>>> @@ -544,7 +544,53 @@ static int _assign_irq_vector(struct irq_desc
> >>>>>>> *desc, const cpumask_t *mask)
> >>>>>>> }
> >>>>>>>
> >>>>>>> if ( desc->arch.move_in_progress ||
> >>>>>>> desc->arch.move_cleanup_count )
> >>>>>>> - return -EAGAIN;
> >>>>>>> + {
> >>>>>>> + /*
> >>>>>>> + * If the current destination is online refuse to shuffle.
> >>>>>>> Retry after
> >>>>>>> + * the in-progress movement has finished.
> >>>>>>> + */
> >>>>>>> + if ( cpumask_intersects(desc->arch.cpu_mask,
> >>>>>>> &cpu_online_map) )
> >>>>>>> + return -EAGAIN;
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> + * Due to the logic in fixup_irqs() that clears offlined
> >>>>>>> CPUs from
> >>>>>>> + * ->arch.old_cpu_mask it shouldn't be possible to get here
> >>>>>>> with
> >>>>>>> + * ->arch.move_{in_progress,cleanup_count} set and no online
> >>>>>>> CPUs in
> >>>>>>> + * ->arch.old_cpu_mask.
> >>>>>>> + */
> >>>>>>> + ASSERT(valid_irq_vector(desc->arch.old_vector));
> >>>>>>> + ASSERT(cpumask_intersects(desc->arch.old_cpu_mask,
> >>>>>>> &cpu_online_map));
> >>>>>>> +
> >>>>>>> + if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
> >>>>>>> + {
> >>>>>>> + /*
> >>>>>>> + * Fallback to the old destination if moving is in
> >>>>>>> progress and the
> >>>>>>> + * current destination is to be offlined. This is only
> >>>>>>> possible if
> >>>>>>> + * the CPUs in old_cpu_mask intersect with the affinity
> >>>>>>> mask passed
> >>>>>>> + * in the 'mask' parameter.
> >>>>>>> + */
> >>>>>>> + desc->arch.vector = desc->arch.old_vector;
> >>>>>>> + cpumask_and(desc->arch.cpu_mask,
> >>>>>>> desc->arch.old_cpu_mask, mask);
> >>>>
> >>>> ... replacing of vector (and associated mask), without any further
> >>>> accounting.
> >>>
> >>> It's quite likely I'm missing something here, but what further
> >>> accounting you would like to do?
> >>>
> >>> The current target of the interrupt (->arch.cpu_mask previous to
> >>> cpumask_and()) is all going offline, so any attempt to set it in
> >>> ->arch.old_cpu_mask would just result in a stale (offline) CPU getting
> >>> set in ->arch.old_cpu_mask, which previous patches attempted to
> >>> solve.
> >>>
> >>> Maybe by "further accounting" you meant something else not related to
> >>> ->arch.old_{cpu_mask,vector}?
> >>
> >> Indeed. What I'm thinking of is what normally release_old_vec() would
> >> do (of which only desc->arch.used_vectors updating would appear to be
> >> relevant, seeing the CPU's going offline). The other one I was thinking
> >> of, updating vector_irq[], likely is also unnecessary, again because
> >> that's per-CPU data of a CPU going down.
> >
> > I think updating vector_irq[] should be explicitly avoided, as doing
> > so would prevent us from correctly draining any pending interrupts
> > because the vector -> irq mapping would be broken when the interrupt
> > enable window at the bottom of fixup_irqs() is reached.
> >
> > For used_vectors: we might clean it, I'm a bit worried however that at
> > some point we insert a check in do_IRQ() path that ensures the
> > vector_irq[] is inline with desc->arch.used_vectors, which would fail
> > for interrupts drained at the bottom of fixup_irqs(). Let me attempt
> > to clean the currently used vector from ->arch.used_vectors.
>
> Just to clarify: It may well be that for draining the bit can't be cleared
> right here. But it then still needs clearing _somewhere_, or else we
> chance ending up with inconsistent state (triggering e.g. an assertion
> later on) or the leaking of vectors. My problem here was that I also
> couldn't locate any such "somewhere", and commentary also didn't point me
> anywhere.
You are correct, there's no such place where the cleanup would happen.
I'm afraid the only option I see to correctly deal with this is to do
the cleanup of the old destination in _assign_irq_vector(), and then
do the pending interrupt draining from IRR like I had proposed in
patch 7/7, thus removing the interrupt enable window at the bottom of
fixup_irqs().
Let me know if that seems sensible.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |