Xen project Mailing List

[Xen-devel] PV-on-HVM platform interrupt not properly cleared

From: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>

Date: Tue, 8 Jan 2008 10:38:29 +0000

Delivery-date: Tue, 08 Jan 2008 02:40:40 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

If you use PV-on-HVM drivers with a guest with multiple vcpus, and the guest uses the APIC to route interrupts to other than vcpu 0, the interrupt asserted count may be cleared tardily after the interrupt is serviced. The problem occurs as follows: * The PV guest configures the emulated IO-APIC to deliver the Xen platform interrupt to a CPU other than 0. (PV-on-HVM drivers do not use EVTCHNOP_bind_vcpu to bind the event channel to a different vcpu; the event channel vcpu is left at 0.) * An incoming event sets evtch_upcall_pending. At next VM entry hvm_set_callback_irq_level sees this, and calls __hvm_pci_intx_assert, which increments the relevant assert_count and calls vioapic_deliver. This makes notes in IRR and TMR so that VM entry will inject the actual interrupt. * The guest's ISR runs, clears evtch_upcall_pending, and eventually finishes. It writes EOI, resulting in VM exit and a call to vlapic_EOI_set. That clears the ISR and TMR flags, and then calls vioapic_update_EOI. * vioapic_update_EOI spots that the interrupt is still asserted in hvm_irq->gsi_assert_count and reasserts IRR. * We once more prepare for VM entry. If the vcpu is #0 then we rerun hvm_set_callback_irq_level, which sees that upcall_pending has been cleared and deasserts gsi_assert_count. However this is too late to prevent interrupt injection (IRR is still set) and we spuriously but essentially harmlessly enter the ISR in the guest despite upcall_pending being clear. However, if the vcpu to be run next is not #0 then the pre-locking test in hvm_set_callback_irq_level triggers, and we don't then recheck upcall_pending. The upshot is that interrupt will not be cleared until vcpu #0 is scheduled for some reason. In my tests flood-pinging an otherwise-idle Linux 2.6.18 guest, after a second or two the guest reassigns the IRQ to a different vcpu, eventually returning to vcpu #0 and then the stuck interrupt gets cleared. The preconditions are: * Guest is using PV-on-HVM drivers * Guest uses the (emulated) io-apic to route the Xen platform interrupt to non-0 cpu(s) (ie, non-0 vcpus). The symptoms are likely to be much more severe in setups where the guest has more vcpus than there are physical cpus, or where there is a shortage of cpu time. In that case there can obviously be a longer wait for vcpu 0 to run. I was reluctant to remove the v->vcpu_id != 0 test as that would result in (in most setups) every VM entry taking out hvm_domain.irq_lock. If that's acceptable then removing that test would be a much simpler change than my patch below. Instead in the attached patch I split hvm_set_callback_irq_level up slightly. This allows me to create an entrypoint which only ever deasserts interrupts rather than asserting them. This is (I think) suitable for calling from vlapic_EOI_set. There is still a slight possible problem I think in some setups: even with the attached change an interrupt bound in the (virtual) io-apic to a non-zero vcpu will still not be recognised until the next VM entry on vcpu 0, and will after than then still not run until the vcpu chosen by the vioapic despatch logic runs. I've done a simple ad-hoc test of this patch and it makes the symptoms go away in my configuration. Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> Ian.

diff -r 2491691e3e69 xen/arch/x86/hvm/irq.c --- a/xen/arch/x86/hvm/irq.c Sat Dec 29 17:57:47 2007 +0000 +++ b/xen/arch/x86/hvm/irq.c Mon Jan 07 18:02:51 2008 +0000 @@ -125,24 +125,46 @@ void hvm_isa_irq_deassert( spin_unlock(&d->arch.hvm_domain.irq_lock); } +static int lock_get_upcall_asserted(struct domain *d) +{ + /* HVM guests' PV drivers leave the platform interrupt bound to + * CPU 0 in Xen, and use the APIC to manage its delivery. */ + struct vcpu *v0 = d->vcpu[0]; + + spin_lock(&d->arch.hvm_domain.irq_lock); + + /* NB. Do not check the evtchn_upcall_mask. It is not used in HVM mode. */ + return !!vcpu_info(v0, evtchn_upcall_pending); +} + +static void __hvm_set_callback_irq_level + (struct domain *d, struct hvm_irq *hvm_irq, int asserted); + void hvm_set_callback_irq_level(void) { struct vcpu *v = current; struct domain *d = v->domain; struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq; - unsigned int gsi, pdev, pintx, asserted; + unsigned int asserted; /* Fast lock-free tests. */ if ( (v->vcpu_id != 0) || (hvm_irq->callback_via_type == HVMIRQ_callback_none) ) return; - spin_lock(&d->arch.hvm_domain.irq_lock); - - /* NB. Do not check the evtchn_upcall_mask. It is not used in HVM mode. */ - asserted = !!vcpu_info(v, evtchn_upcall_pending); - if ( hvm_irq->callback_via_asserted == asserted ) - goto out; + asserted = lock_get_upcall_asserted(d); + + if ( hvm_irq->callback_via_asserted != asserted ) + __hvm_set_callback_irq_level(d, hvm_irq, asserted); + + spin_unlock(&d->arch.hvm_domain.irq_lock); +} + +static void __hvm_set_callback_irq_level + (struct domain *d, struct hvm_irq *hvm_irq, int asserted) +{ + unsigned int gsi, pdev, pintx; + hvm_irq->callback_via_asserted = asserted; /* Callback status has changed. Update the callback via. */ @@ -172,8 +194,19 @@ void hvm_set_callback_irq_level(void) default: break; } - - out: +} + +void hvm_perhaps_callback_irq_deasserted(void) +{ + struct domain *d = current->domain; + struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq; + unsigned int asserted; + + asserted = lock_get_upcall_asserted(d); + + if ( hvm_irq->callback_via_asserted > asserted ) + __hvm_set_callback_irq_level(d, hvm_irq, asserted); + spin_unlock(&d->arch.hvm_domain.irq_lock); } diff -r 2491691e3e69 xen/arch/x86/hvm/vlapic.c --- a/xen/arch/x86/hvm/vlapic.c Sat Dec 29 17:57:47 2007 +0000 +++ b/xen/arch/x86/hvm/vlapic.c Mon Jan 07 17:56:44 2008 +0000 @@ -355,7 +355,14 @@ struct vlapic *apic_round_robin( void vlapic_EOI_set(struct vlapic *vlapic) { - int vector = vlapic_find_highest_isr(vlapic); + int vector; + + hvm_perhaps_callback_irq_deasserted(); + /* if interrupts are being serviced other than on vcpu 0, we want + * to deassert the GSI or PCI interrupts if the guest has cleared + * evtchn_upcall_pending for CPU 0. */ + + vector = vlapic_find_highest_isr(vlapic); /* Some EOI writes may not have a matching to an in-service interrupt. */ if ( vector == -1 ) diff -r 2491691e3e69 xen/include/asm-x86/hvm/irq.h --- a/xen/include/asm-x86/hvm/irq.h Sat Dec 29 17:57:47 2007 +0000 +++ b/xen/include/asm-x86/hvm/irq.h Mon Jan 07 17:28:50 2008 +0000 @@ -154,6 +154,7 @@ void hvm_set_pci_link_route(struct domai void hvm_set_callback_irq_level(void); void hvm_set_callback_via(struct domain *d, uint64_t via); +void hvm_perhaps_callback_irq_deasserted(void); /* Check/Acknowledge next pending interrupt. */ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v);

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.