[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] x86 guest EOI timer issues / questions



All,

Andrew's observation mentioned in "[PATCH v2] x86/smt: Support for
enabling/disabling SMT at runtime" and some discussions we've already
had with him made me look into how exactly the EOI timer works. I
think there are a number of quirks. I'll enumerate all questions I've run
into below. At the end of the mail is a patch carrying out some of the
adjustments implied by a subset of these questions, plus the addition
of some debugging code. It is perhaps worthwhile to note that the
printk()-s irq_guest_eoi_timer_fn() both were observed to trigger
prior to making the non-debugging code changes. With the patch in
its current shape, I've seen trigger (every once in a while) only the
printk() added to __do_IRQ_guest().

- Why does the timer not get stopped in desc_guest_eoi() when
  ->in_flight is zero?

- Why does irq_guest_eoi_timer_fn() not re-arm the timer when
  ->in_flight is still non-zero after all the decrements? Can this case
  happen at all, considering that the only increment in
  __do_IRQ_guest() happens under the same desc lock? (If it can
  happen, I think it would be against the purpose of 37eb6d05fe,
  as this would mean the IRQ can then remain un-acked for an
  indefinite period of time, unless something else re-armed the
  timer.)

- Why does irq_guest_eoi_timer_fn() issue ->end() / set_eoi_ready()
  even when no decrement of any ->in_flight was done at all? Neither
  for ACKTYPE_UNMASK nor for ACKTYPE_EOI this looks to be fully
  race free.

- Wouldn't __do_IRQ_guest() better stop the timer early on?

- Wouldn't __do_IRQ_guest() better avoid re-programming the timer
  if no ->in_flight increment was done (which I would have thought
  shouldn't happen anyway, but which I've observed in practice)? Of
  course this would mean the timer may only be stopped when the
 first increment occurs.

- What about the timer triggering while __do_IRQ_guest() is active
  (holding the desc lock)? This looks to result in immediate expiry of
  the EOI deferral for the current interrupt instance (and I've
  observed this in practice, luckily with no other visible bad effects).

- Is it okay for fixup_eoi() to fiddle with action->cpu_eoi_map
  without holding the desc lock? (I guess being in stop-machine /
  SMP-boot context makes this acceptable.)

Commits of interest:

f3922f4084 x86: Support CPU hotplug offline
37eb6d05fe x86: Automatically EOI guest-bound interrupts if guest takes too long

Other notes / observations:
- irq_guest_eoi_timer_fn() should not need to re-acquire the lock
  after on_selected_cpus()
- irq_guest_eoi_timer_fn() should not need to use irqsave/irqrestore
  spin lock/unlock (plain irq ones should suffice in a timer handler)
- irq_guest_eoi_timer_fn() could ASSERT(action->ack_type !=
  ACKTYPE_NONE) instead of having such a conditional
- fixup_eoi() may better avoid setting all peoi[].ready and instead
  pass a "force" boolean to flush_ready_eoi()

Thoughts / opinions appreciated,
Jan

--- unstable.orig/xen/arch/x86/irq.c
+++ unstable/xen/arch/x86/irq.c
@@ -1103,7 +1103,7 @@ static void set_eoi_ready(void *data);
 static void irq_guest_eoi_timer_fn(void *data)
 {
     struct irq_desc *desc = data;
-    unsigned int irq = desc - irq_desc;
+    unsigned int irq = desc - irq_desc, done = 0;
     irq_guest_action_t *action;
     cpumask_t cpu_eoi_map;
     unsigned long flags;
@@ -1115,6 +1115,16 @@ static void irq_guest_eoi_timer_fn(void
 
     action = (irq_guest_action_t *)desc->action;
 
+    if ( action->eoi_timer.status >= TIMER_STATUS_in_heap )
+{//temp
+ static unsigned long cnt, thr;
+ if(++cnt > thr) {
+  thr |= cnt;
+  printk("IRQ%u: i=%u a=%d n=%u\n", irq, action->in_flight, action->ack_type, 
action->nr_guests);
+ }
+        goto out;
+}
+
     if ( action->ack_type != ACKTYPE_NONE )
     {
         unsigned int i;
@@ -1122,11 +1132,29 @@ static void irq_guest_eoi_timer_fn(void
         {
             struct domain *d = action->guest[i];
             unsigned int pirq = domain_irq_to_pirq(d, irq);
+
             if ( test_and_clear_bool(pirq_info(d, pirq)->masked) )
-                action->in_flight--;
+                ++done;
         }
+        action->in_flight -= done;
     }
 
+{//temp
+ static unsigned long done_cnt, done_thr;
+ static unsigned long infl_cnt, infl_thr;
+ bool log = false;
+ if(action->in_flight && ++infl_cnt > infl_thr) {
+  infl_thr |= infl_cnt;
+  log = true;
+ } else if(!done && ++done_cnt > done_thr) {
+  done_thr |= done_cnt;
+  log = true;
+ }
+ if(log)
+  printk("IRQ%u: d=%u i=%u a=%d n=%u (%06lx,%06lx)\n",  irq, done, 
action->in_flight,
+         action->ack_type, action->nr_guests, done_cnt, infl_cnt);
+}
+//todo Instead of the below, assert that ->in_flight is zero and bail if done 
is zero?
     if ( action->in_flight != 0 )
         goto out;
 
@@ -1156,6 +1184,7 @@ static void __do_IRQ_guest(int irq)
     int                 i, sp;
     struct pending_eoi *peoi = this_cpu(pending_eoi);
     unsigned int        vector = (u8)get_irq_regs()->entry_vector;
+unsigned done = 0;//temp
 
     if ( unlikely(action->nr_guests == 0) )
     {
@@ -1167,6 +1196,9 @@ static void __do_IRQ_guest(int irq)
         return;
     }
 
+    if ( action->ack_type != ACKTYPE_NONE )
+        stop_timer(&action->eoi_timer);
+
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
@@ -1187,6 +1219,7 @@ static void __do_IRQ_guest(int irq)
         pirq = pirq_info(d, domain_irq_to_pirq(d, irq));
         if ( (action->ack_type != ACKTYPE_NONE) &&
              !test_and_set_bool(pirq->masked) )
+++done,//temp
             action->in_flight++;
         if ( !is_hvm_domain(d) || !hvm_do_IRQ_dpci(d, pirq) )
             send_guest_pirq(d, pirq);
@@ -1194,10 +1227,16 @@ static void __do_IRQ_guest(int irq)
 
     if ( action->ack_type != ACKTYPE_NONE )
     {
-        stop_timer(&action->eoi_timer);
         migrate_timer(&action->eoi_timer, smp_processor_id());
         set_timer(&action->eoi_timer, NOW() + MILLISECS(1));
     }
+if(!done) {//temp
+ static unsigned long cnt, thr;
+ if(++cnt > thr) {
+  thr |= cnt;
+  printk("IRQ%d: d=0 n=%u i=%u a=%d\n", irq, action->nr_guests, 
action->in_flight, action->ack_type);
+ }
+}
 }
 
 /*
@@ -1457,6 +1496,8 @@ void desc_guest_eoi(struct irq_desc *des
         return;
     }
 
+    stop_timer(&action->eoi_timer);
+
     if ( action->ack_type == ACKTYPE_UNMASK )
     {
         ASSERT(cpumask_empty(action->cpu_eoi_map));


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.