[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] PV-on-HVM platform interrupt not properly cleared

If you use PV-on-HVM drivers with a guest with multiple vcpus, and the
guest uses the APIC to route interrupts to other than vcpu 0, the
interrupt asserted count may be cleared tardily after the interrupt is

The problem occurs as follows:
  * The PV guest configures the emulated IO-APIC to deliver the Xen
    platform interrupt to a CPU other than 0.  (PV-on-HVM drivers do
    not use EVTCHNOP_bind_vcpu to bind the event channel to a
    different vcpu; the event channel vcpu is left at 0.)
  * An incoming event sets evtch_upcall_pending.  At next VM entry
    hvm_set_callback_irq_level sees this, and calls
    __hvm_pci_intx_assert, which increments the relevant assert_count
    and calls vioapic_deliver.  This makes notes in IRR and TMR so
    that VM entry will inject the actual interrupt.
  * The guest's ISR runs, clears evtch_upcall_pending, and eventually
    finishes.  It writes EOI, resulting in VM exit and a call to
    vlapic_EOI_set.  That clears the ISR and TMR flags, and then calls
  * vioapic_update_EOI spots that the interrupt is still asserted in
    hvm_irq->gsi_assert_count and reasserts IRR.
  * We once more prepare for VM entry.  If the vcpu is #0 then we
    rerun hvm_set_callback_irq_level, which sees that upcall_pending
    has been cleared and deasserts gsi_assert_count.  However this is
    too late to prevent interrupt injection (IRR is still set) and we
    spuriously but essentially harmlessly enter the ISR in the guest
    despite upcall_pending being clear.  However, if the vcpu to be
    run next is not #0 then the pre-locking test in
    hvm_set_callback_irq_level triggers, and we don't then recheck

The upshot is that interrupt will not be cleared until vcpu #0 is
scheduled for some reason.  In my tests flood-pinging an
otherwise-idle Linux 2.6.18 guest, after a second or two the guest
reassigns the IRQ to a different vcpu, eventually returning to vcpu #0
and then the stuck interrupt gets cleared.

The preconditions are:
  * Guest is using PV-on-HVM drivers
  * Guest uses the (emulated) io-apic to route the Xen platform
    interrupt to non-0 cpu(s) (ie, non-0 vcpus).

The symptoms are likely to be much more severe in setups where the
guest has more vcpus than there are physical cpus, or where there is a
shortage of cpu time.  In that case there can obviously be a longer
wait for vcpu 0 to run.

I was reluctant to remove the v->vcpu_id != 0 test as that would
result in (in most setups) every VM entry taking out
hvm_domain.irq_lock.  If that's acceptable then removing that test
would be a much simpler change than my patch below.

Instead in the attached patch I split hvm_set_callback_irq_level up
slightly.  This allows me to create an entrypoint which only ever
deasserts interrupts rather than asserting them.  This is (I think)
suitable for calling from vlapic_EOI_set.

There is still a slight possible problem I think in some setups: even
with the attached change an interrupt bound in the (virtual) io-apic
to a non-zero vcpu will still not be recognised until the next VM
entry on vcpu 0, and will after than then still not run until the vcpu
chosen by the vioapic despatch logic runs.

I've done a simple ad-hoc test of this patch and it makes the symptoms
go away in my configuration.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>


diff -r 2491691e3e69 xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c    Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/arch/x86/hvm/irq.c    Mon Jan 07 18:02:51 2008 +0000
@@ -125,24 +125,46 @@ void hvm_isa_irq_deassert(
+static int lock_get_upcall_asserted(struct domain *d)
+    /* HVM guests' PV drivers leave the platform interrupt bound to
+     * CPU 0 in Xen, and use the APIC to manage its delivery. */
+    struct vcpu *v0 = d->vcpu[0];
+    spin_lock(&d->arch.hvm_domain.irq_lock);
+    /* NB. Do not check the evtchn_upcall_mask. It is not used in HVM mode. */
+    return !!vcpu_info(v0, evtchn_upcall_pending);
+static void __hvm_set_callback_irq_level
+    (struct domain *d, struct hvm_irq *hvm_irq, int asserted);
 void hvm_set_callback_irq_level(void)
     struct vcpu *v = current;
     struct domain *d = v->domain;
     struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
-    unsigned int gsi, pdev, pintx, asserted;
+    unsigned int asserted;
     /* Fast lock-free tests. */
     if ( (v->vcpu_id != 0) ||
          (hvm_irq->callback_via_type == HVMIRQ_callback_none) )
-    spin_lock(&d->arch.hvm_domain.irq_lock);
-    /* NB. Do not check the evtchn_upcall_mask. It is not used in HVM mode. */
-    asserted = !!vcpu_info(v, evtchn_upcall_pending);
-    if ( hvm_irq->callback_via_asserted == asserted )
-        goto out;
+    asserted = lock_get_upcall_asserted(d);
+    if ( hvm_irq->callback_via_asserted != asserted )
+       __hvm_set_callback_irq_level(d, hvm_irq, asserted);
+    spin_unlock(&d->arch.hvm_domain.irq_lock);    
+static void __hvm_set_callback_irq_level
+    (struct domain *d, struct hvm_irq *hvm_irq, int asserted)
+    unsigned int gsi, pdev, pintx;
     hvm_irq->callback_via_asserted = asserted;
     /* Callback status has changed. Update the callback via. */
@@ -172,8 +194,19 @@ void hvm_set_callback_irq_level(void)
- out:
+void hvm_perhaps_callback_irq_deasserted(void)
+    struct domain *d = current->domain;
+    struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+    unsigned int asserted;
+    asserted = lock_get_upcall_asserted(d);
+    if ( hvm_irq->callback_via_asserted > asserted )
+       __hvm_set_callback_irq_level(d, hvm_irq, asserted);
diff -r 2491691e3e69 xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/arch/x86/hvm/vlapic.c Mon Jan 07 17:56:44 2008 +0000
@@ -355,7 +355,14 @@ struct vlapic *apic_round_robin(
 void vlapic_EOI_set(struct vlapic *vlapic)
-    int vector = vlapic_find_highest_isr(vlapic);
+    int vector;
+    hvm_perhaps_callback_irq_deasserted();
+    /* if interrupts are being serviced other than on vcpu 0, we want
+     * to deassert the GSI or PCI interrupts if the guest has cleared
+     * evtchn_upcall_pending for CPU 0. */
+    vector = vlapic_find_highest_isr(vlapic);
     /* Some EOI writes may not have a matching to an in-service interrupt. */
     if ( vector == -1 )
diff -r 2491691e3e69 xen/include/asm-x86/hvm/irq.h
--- a/xen/include/asm-x86/hvm/irq.h     Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/include/asm-x86/hvm/irq.h     Mon Jan 07 17:28:50 2008 +0000
@@ -154,6 +154,7 @@ void hvm_set_pci_link_route(struct domai
 void hvm_set_callback_irq_level(void);
 void hvm_set_callback_via(struct domain *d, uint64_t via);
+void hvm_perhaps_callback_irq_deasserted(void);
 /* Check/Acknowledge next pending interrupt. */
 struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v);
Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.