
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels



On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> >>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
> >>> Note that this patch is specifically for upstream Xen, which doesn't
> >>> have any pirq support in it at present.
> >> I understand that, but saw that you had paralleling changes to the
> >> pirq handling in your Dom0 tree.
> >>
> >>> However,  I did consider using fasteoi, but I couldn't see how to make
> >>> it work.  The problem is that it only does a single call into the
> >>> irq_chip for EOI after calling the interrupt handler, but there is no
> >>> call beforehand to ack the interrupt (which means clear the event flag
> >>> in our case).  This leads to a race where an event can be lost after the
> >>> interrupt handler has returned, but before the event flag has been
> >>> cleared (because Xen won't set pending or call the upcall function if
> >>> the event is already set).  I guess I could pre-clear the event in the
> >>> upcall function, but I'm not sure that's any better.
> >> That's precisely what we're doing.
> > 
> > You mean pre-clearing the event?  OK.
> > 
> > But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> > 
> > With handle_fasteoi_irq:
> > 
> > cpu A                       cpu B
> > get event
> 
> mask and clear event

Argh. Right, I guess that's my fault, I was the one who came up with the
PENDING theory, but indeed I failed to see the event masking bits.

However, please read on.

> > set INPROGRESS
> > call action
> >    :
> >    :
> > <migrate event channel to B>
> >    :                        get event
> 
> Cannot happen, event is masked (i.e. all that would happen is
> that the event occurrence would be logged in evtchn_pending).
> 
> >    :                        INPROGRESS set? -> EOI, return
> >    :
> > action returns
> > clear INPROGRESS
> > EOI
> 
> unmask event, checking for whether the event got re-bound (and
> doing the unmask through a hypercall if necessary), thus re-raising
> the event in any case

Yes. I agree. So let's come up with a new theory. Right now I'm still
looking at xen/next. Correct me if I'm mistaken:

mask_ack_pirq will:
 1. chip->mask
 2. chip->ack

Where chip->ack will:
 1. move_native_irq
 2. clear_evtchn.

Now if you look into move_native_irq, it will:
 1. chip->mask (gratuitous)
 2. move
 3. chip->unmask (aiiiiiie).

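That explains why handle_edge_irq still fixed the problem: the unmask
inside move_native_irq undoes the mask from step 1 of mask_ack_pirq, so
the event is live again while the handler runs.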
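
To make the call order above concrete, here is a toy Python model. It is
not kernel code: the EventChannel class, the with_move flag and the
masked_while_handler_runs helper are made up for illustration, and the
move itself is left out. Only the order of the mask, unmask and clear
calls is taken from the steps listed above.

```python
class EventChannel:
    """Toy stand-in for one event channel's mask and pending bits."""

    def __init__(self):
        self.masked = False
        self.pending = False


def chip_mask(ch):
    ch.masked = True


def chip_unmask(ch):
    ch.masked = False


def clear_evtchn(ch):
    ch.pending = False


def move_native_irq(ch):
    chip_mask(ch)    # 1. gratuitous, the caller has already masked
    # 2. the actual move is omitted in this model
    chip_unmask(ch)  # 3. leaves the event unmasked


def chip_ack(ch, with_move):
    # with_move is a hypothetical switch so both cases can be compared
    if with_move:
        move_native_irq(ch)
    clear_evtchn(ch)


def mask_ack_pirq(ch, with_move):
    chip_mask(ch)
    chip_ack(ch, with_move)


def masked_while_handler_runs(with_move):
    """Return the mask state at the point where the handler would start."""
    ch = EventChannel()
    ch.pending = True  # the event that triggered the interrupt
    mask_ack_pirq(ch, with_move)
    return ch.masked


print(masked_while_handler_runs(with_move=False))
print(masked_while_handler_runs(with_move=True))
```

Without the move step the event stays masked for the handler, as Jan
describes. With it, the event ends up unmasked before the handler runs.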

The real question is whether that's the kind of fix we wanted, then.

Cheers,
Daniel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

