Re: [RFC] Xen crashes on ASSERT on suspend/resume, suggested fix



On Thu, 25 May 2023, Jan Beulich wrote:
> On 25.05.2023 01:51, Stefano Stabellini wrote:
> > xen/irq: fix races between send_cleanup_vector and _clear_irq_vector
> 
> This title is, I'm afraid, already misleading. No such race can occur
> afaict, as both callers of _clear_irq_vector() acquire the IRQ
> descriptor lock first, and irq_complete_move() (the sole caller of
> send_cleanup_vector()) is only ever invoked as or by an ->ack()
> hook, which in turn is only invoked with, again, the descriptor lock
> held.

Yes, I see that you are right about the locking, and thank you for taking
the time to look into it.
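
For my own notes, the caller-side locking you describe looks roughly like
the sketch below (paraphrased from memory rather than copied from irq.c,
so details may differ): every path into _clear_irq_vector() takes the
descriptor lock first.

    void clear_irq_vector(int irq)
    {
        struct irq_desc *desc = irq_to_desc(irq);
        unsigned long flags;

        /* Sketch: _clear_irq_vector() only ever runs under desc->lock. */
        spin_lock_irqsave(&desc->lock, flags);
        _clear_irq_vector(desc);
        spin_unlock_irqrestore(&desc->lock, flags);
    }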

One last question: could it be that a second interrupt arrives while
->ack() is being handled? Is do_IRQ() running with interrupts disabled?
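The path I have in mind is roughly the following (again a sketch, not the
verbatim do_IRQ() source):

    /* Sketch: do_IRQ() takes the descriptor lock before invoking the
     * ->ack() hook, which is where irq_complete_move() and hence
     * send_cleanup_vector() come into play. */
    spin_lock(&desc->lock);
    desc->handler->ack(desc);
    /* ... dispatch the action handlers ... */
    spin_unlock(&desc->lock);

so what I am trying to understand is whether a nested interrupt could get
in between while that lock is held.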


> > It is possible that send_cleanup_vector and _clear_irq_vector are
> > running at the same time on different CPUs. In that case we have a race
> > as both _clear_irq_vector and irq_move_cleanup_interrupt are trying to
> > clear old_vector.
> > 
> > This patch fixes 3 races:
> > 
> > 1) As irq_move_cleanup_interrupt runs on multiple CPUs at the same
> > time while _clear_irq_vector is also running, it is possible that
> > only some of the per_cpu(vector_irq, cpu)[old_vector] entries are
> > still valid. So, turn the ASSERT in _clear_irq_vector into an if.
> 
> Note again the locking which is in effect.
> 
> > 2) It is possible that _clear_irq_vector runs at the same time as
> > release_old_vec, called from irq_move_cleanup_interrupt. At the
> > moment, _clear_irq_vector can read a valid old_cpu_mask but an
> > invalid old_vector (because the latter is being set to invalid by
> > release_old_vec). To avoid this problem, in release_old_vec clear
> > old_cpu_mask before setting old_vector to invalid. This way, we know
> > in _clear_irq_vector that if old_vector is invalid, old_cpu_mask is
> > also zero, and we don't enter the loop.
> 
> All invocations of release_old_vec() are similarly inside suitably
> locked regions.
> 
> > 3) It is possible that release_old_vec runs twice at the same time
> > for the same old_vector. Change the code in release_old_vec to make
> > it OK to call it twice, and remove both ASSERTs. With those gone, it
> > should now be possible to call release_old_vec twice in a row for
> > the same old_vector.
> 
> Same here.
> 
> Any such issues would surface more frequently and without any suspend /
> resume involved. What is still missing is that connection, and only then
> it'll (or really: may) become clear what needs adjusting. If you've seen
> the issue exactly once, then I'm afraid there's not much we can do unless
> someone can come up with a plausible explanation of something being
> broken on any of the involved code paths. More information will need to
> be gathered out of the next occurrence of this, whenever that's going to
> be. One of the things we will want to know, as mentioned before, is the
> value that per_cpu(vector_irq, cpu)[old_vector] has when the assertion
> triggers. Iirc Roger did suggest another piece of data you'd want to log.

Understood, thanks for the explanation.
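
For the next occurrence I can carry a debugging patch along these lines
(just a sketch of the idea, not a tested change), so that the value of
per_cpu(vector_irq, cpu)[old_vector] gets logged before the assertion
fires:

    /* Sketch of a debugging aid: record the stale mapping before the
     * ASSERT so the next occurrence tells us what vector_irq held. */
    if ( per_cpu(vector_irq, cpu)[old_vector] != irq )
        printk(XENLOG_ERR "IRQ%d: CPU%u vector_irq[%u] = %d\n",
               irq, cpu, old_vector,
               per_cpu(vector_irq, cpu)[old_vector]);
    ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);

That, together with the other piece of data Roger suggested, should help
narrow things down.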


> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -112,16 +112,11 @@ static void release_old_vec(struct irq_desc *desc)
> >  {
> >      unsigned int vector = desc->arch.old_vector;
> >  
> > -    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> >      cpumask_clear(desc->arch.old_cpu_mask);
> > +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> >  
> > -    if ( !valid_irq_vector(vector) )
> > -        ASSERT_UNREACHABLE();
> > -    else if ( desc->arch.used_vectors )
> > -    {
> > -        ASSERT(test_bit(vector, desc->arch.used_vectors));
> > +    if ( desc->arch.used_vectors )
> >          clear_bit(vector, desc->arch.used_vectors);
> > -    }
> >  }
> >  
> >  static void _trace_irq_mask(uint32_t event, int irq, int vector,
> > @@ -230,9 +225,11 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >  
> >          for_each_cpu(cpu, tmp_mask)
> >          {
> > -            ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
> > -            TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
> > -            per_cpu(vector_irq, cpu)[old_vector] = ~irq;
> > +            if ( per_cpu(vector_irq, cpu)[old_vector] == irq )
> > +            {
> > +                TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
> > +                per_cpu(vector_irq, cpu)[old_vector] = ~irq;
> > +            }
> >          }
> 
> As said before - replacing ASSERT() by a respective if() cannot really
> be done without discussing the "else" in the description. Except of
> course in trivial/obvious cases, but I think we agree here we don't
> have such a case.



 

