
Re: [Xen-devel] domU and dom0 hung with Xen console interrupt binding showing in-flight=1, (---M)


  • To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • From: Bruce Edge <bruce.edge@xxxxxxxxx>
  • Date: Thu, 19 Aug 2010 06:42:36 -0700
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
  • Delivery-date: Thu, 19 Aug 2010 06:43:16 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>




On Wed, Aug 18, 2010 at 2:40 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
On 18/08/2010 09:47, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:

> Yes, that was what I was trying to hint at, but I wasn't sure whether
> calling ->end() here has any unintended side effects and/or requires
> any extra care (like preventing a subsequent guest initiated EOI to
> call ->end() again).

Oh you can't naively call ->end() from the time-out handler. You would need
to do something like this in irq_guest_eoi_timer_fn:
 spin_lock(&desc->lock);
 if ( (desc->status & IRQ_GUEST) &&
      (action->ack_type == ACKTYPE_EOI) )
 {
     /* Copy the map while the lock is still held. */
     cpu_eoi_map = action->cpu_eoi_map;
     /* on_selected_cpus() cannot be called with desc->lock held. */
     spin_unlock(&desc->lock);
     on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);
     spin_lock(&desc->lock);
 }
 _irq_guest_eoi(desc);
 spin_unlock(&desc->lock);

I don't think the IRQ_GUEST_EOI_PENDING flag or any of that stuff is needed
for the ACKTYPE_EOI case. I'd make the handling of that, calling of
->disable/->enable and so on, dependent on ACKTYPE_NONE.

> While looking at this I came across another thing I don't understand:
> __pirq_guest_eoi(), for the ACKTYPE_EOI case, calls __set_eoi_ready()
> in a cpu_test_and_clear() conditional, but __set_eoi_ready() bails
> out if it finds !cpu_test_and_clear() on the same bitmap - what's the
> point of calling __set_eoi_ready() here then (or what am I missing)?

__pirq_guest_eoi() acts on a private on-stack copy of cpu_eoi_map. This is
because on_selected_cpus() cannot be called with desc->lock held. But as
soon as desc->lock is released, the desc->action structure can be freed by
another CPU, so it would be invalid to reference action->cpu_eoi_map
directly after desc->lock is released.

 -- Keir
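
[Archive note] The copy-then-unlock pattern Keir describes can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not Xen code: struct guest_action, struct irq_desc_model, visit_selected_cpus() and eoi_with_snapshot() are hypothetical names, a pthread mutex stands in for desc->lock, and a 64-bit word stands in for the CPU mask.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest action attached to an IRQ descriptor. */
struct guest_action {
    uint64_t cpu_eoi_map;            /* bit n set: CPU n still owes an EOI */
};

/* Hypothetical stand-in for the IRQ descriptor. */
struct irq_desc_model {
    pthread_mutex_t lock;
    struct guest_action *action;     /* may be freed by another thread once
                                        the lock is dropped */
};

/* Models a cross-CPU call that must not run with the lock held.
 * Returns how many CPUs in *map were visited. */
static int visit_selected_cpus(const uint64_t *map)
{
    int visited = 0;

    for (int cpu = 0; cpu < 64; cpu++)
        if (*map & (UINT64_C(1) << cpu))
            visited++;
    return visited;
}

/* Snapshot the map under the lock, drop the lock, then work on the copy. */
static int eoi_with_snapshot(struct irq_desc_model *desc)
{
    uint64_t snapshot;               /* private on-stack copy */

    assert(desc != NULL);

    pthread_mutex_lock(&desc->lock);
    if (desc->action == NULL) {
        pthread_mutex_unlock(&desc->lock);
        return 0;
    }
    snapshot = desc->action->cpu_eoi_map;
    pthread_mutex_unlock(&desc->lock);

    /* desc->action must not be dereferenced from here on. */
    return visit_selected_cpus(&snapshot);
}
```

The point of the sketch is that nothing after the unlock touches desc->action; only the on-stack snapshot is used, so a concurrent free of the action cannot be observed by the cross-CPU step.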


Is there any further information I can provide that would help in diagnosing the direct cause and finding the appropriate fix?
For example, I could add instrumentation or trace code to detect the trigger conditions.
The hang is very repeatable on our target systems after a few hours of load.

Thanks

-Bruce
