Re: [RFC] Xen crashes on ASSERT on suspend/resume, suggested fix



On Thu, 25 May 2023, Jan Beulich wrote:
> On 25.05.2023 01:51, Stefano Stabellini wrote:
> > xen/irq: fix races between send_cleanup_vector and _clear_irq_vector
> 
> This title is, I'm afraid, already misleading. No such race can occur
> afaict, as both callers of _clear_irq_vector() acquire the IRQ
> descriptor lock first, and irq_complete_move() (the sole caller of
> send_cleanup_vector()) is only ever invoked as or by an ->ack()
> hook, which in turn is only invoked with, again, the descriptor lock
> held.

Yes, I see that you are right about the locking, and thank you for taking
the time to look into it.
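
For my own notes, the caller-side locking you describe looks roughly like
the sketch below (paraphrased from memory rather than copied from irq.c,
so details may differ): every path into _clear_irq_vector() takes the
descriptor lock first.

    void clear_irq_vector(int irq)
    {
        struct irq_desc *desc = irq_to_desc(irq);
        unsigned long flags;

        /* Sketch: _clear_irq_vector() only ever runs under desc->lock. */
        spin_lock_irqsave(&desc->lock, flags);
        _clear_irq_vector(desc);
        spin_unlock_irqrestore(&desc->lock, flags);
    }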

One last question: could it be that a second interrupt arrives while
->ack() is being handled? Is do_IRQ() running with interrupts disabled?
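The path I have in mind is roughly the following (again a sketch, not the
verbatim do_IRQ() source):

    /* Sketch: do_IRQ() takes the descriptor lock before invoking the
     * ->ack() hook, which is where irq_complete_move() and hence
     * send_cleanup_vector() come into play. */
    spin_lock(&desc->lock);
    desc->handler->ack(desc);
    /* ... dispatch the action handlers ... */
    spin_unlock(&desc->lock);

so what I am trying to understand is whether a nested interrupt could get
in between while that lock is held.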


> > It is possible that send_cleanup_vector and _clear_irq_vector are
> > running at the same time on different CPUs. In that case we have a race
> > as both _clear_irq_vector and irq_move_cleanup_interrupt are trying to
> > clear old_vector.
> > 
> > This patch fixes 3 races:
> > 
> > 1) As irq_move_cleanup_interrupt runs on multiple CPUs at the same
> > time while _clear_irq_vector is also running, it is possible that
> > only some of the per_cpu(vector_irq, cpu)[old_vector] entries are
> > still valid. So, turn the ASSERT in _clear_irq_vector into an if.
> 
> Note again the locking which is in effect.
> 
> > 2) It is possible that _clear_irq_vector runs at the same time as
> > release_old_vec, called from irq_move_cleanup_interrupt. At the
> > moment, _clear_irq_vector can read a valid old_cpu_mask but an
> > invalid old_vector (because the latter is being set to invalid by
> > release_old_vec). To avoid this problem, in release_old_vec clear
> > old_cpu_mask before setting old_vector to invalid. This way, we know
> > in _clear_irq_vector that if old_vector is invalid, old_cpu_mask is
> > also zero, and we don't enter the loop.
> 
> All invocations of release_old_vec() are similarly inside suitably
> locked regions.
> 
> > 3) It is possible that release_old_vec runs twice at the same time
> > for the same old_vector. Change the code in release_old_vec to make
> > it OK to call it twice, and remove both ASSERTs. With those gone, it
> > should now be possible to call release_old_vec twice in a row for
> > the same old_vector.
> 
> Same here.
> 
> Any such issues would surface more frequently and without any suspend /
> resume involved. What is still missing is that connection, and only then
> it'll (or really: may) become clear what needs adjusting. If you've seen
> the issue exactly once, then I'm afraid there's not much we can do unless
> someone can come up with a plausible explanation of something being
> broken on any of the involved code paths. More information will need to
> be gathered out of the next occurrence of this, whenever that's going to
> be. One of the things we will want to know, as mentioned before, is the
> value that per_cpu(vector_irq, cpu)[old_vector] has when the assertion
> triggers. Iirc Roger did suggest another piece of data you'd want to log.

Understood, thanks for the explanation.
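
For the next occurrence I can carry a debugging patch along these lines
(just a sketch of the idea, not a tested change), so that the value of
per_cpu(vector_irq, cpu)[old_vector] gets logged before the assertion
fires:

    /* Sketch of a debugging aid: record the stale mapping before the
     * ASSERT so the next occurrence tells us what vector_irq held. */
    if ( per_cpu(vector_irq, cpu)[old_vector] != irq )
        printk(XENLOG_ERR "IRQ%d: CPU%u vector_irq[%u] = %d\n",
               irq, cpu, old_vector,
               per_cpu(vector_irq, cpu)[old_vector]);
    ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);

That, together with the other piece of data Roger suggested, should help
narrow things down.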


> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -112,16 +112,11 @@ static void release_old_vec(struct irq_desc *desc)
> >  {
> >      unsigned int vector = desc->arch.old_vector;
> >  
> > -    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> >      cpumask_clear(desc->arch.old_cpu_mask);
> > +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> >  
> > -    if ( !valid_irq_vector(vector) )
> > -        ASSERT_UNREACHABLE();
> > -    else if ( desc->arch.used_vectors )
> > -    {
> > -        ASSERT(test_bit(vector, desc->arch.used_vectors));
> > +    if ( desc->arch.used_vectors )
> >          clear_bit(vector, desc->arch.used_vectors);
> > -    }
> >  }
> >  
> >  static void _trace_irq_mask(uint32_t event, int irq, int vector,
> > @@ -230,9 +225,11 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >  
> >          for_each_cpu(cpu, tmp_mask)
> >          {
> > -            ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
> > -            TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
> > -            per_cpu(vector_irq, cpu)[old_vector] = ~irq;
> > +            if ( per_cpu(vector_irq, cpu)[old_vector] == irq )
> > +            {
> > +                TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
> > +                per_cpu(vector_irq, cpu)[old_vector] = ~irq;
> > +            }
> >          }
> 
> As said before - replacing ASSERT() by a respective if() cannot really
> be done without discussing the "else" in the description. Except of
> course in trivial/obvious cases, but I think we agree here we don't
> have such a case.



 

