[Xen-devel] [PATCH v3] x86/hvm: re-work viridian APIC assist code



It appears there is a case where Windows enables the APIC assist
enlightenment[1] but does not use it. This scenario is perfectly valid
according to the documentation, but causes the state machine in Xen to
become confused, leading to a domain_crash() such as the following:

(XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0
      build: 1db0
(XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
(XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
(XEN) domain_crash called from viridian.c:452
(XEN) Domain 4 (vcpu#0) crashed on cpu#1:

The following sequence of events is an example of how this can happen (a
simplified standalone model of the failure follows the list):

 - On return to guest vlapic_has_pending_irq() finds a bit set in the IRR.
 - vlapic_ack_pending_irq() calls viridian_start_apic_assist(), which latches
   the vector, sets the bit in the ISR and clears it from the IRR.
 - The guest then processes the interrupt but EOIs it normally, therefore
   clearing the bit in the ISR.
 - On next return to guest vlapic_has_pending_irq() calls
   viridian_complete_apic_assist(), which discovers the assist bit still set
   in the shared page and therefore leaves the latched vector in place, but
   also finds another bit set in the IRR.
 - vlapic_ack_pending_irq() is then called but, because the ISR was
   cleared by the EOI, another call is made to viridian_start_apic_assist()
   and this then calls domain_crash() because it finds the latched vector
   has not been cleared.
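
For illustration, the old logic can be modelled by the standalone program
below (a simplified sketch, not Xen code: 'assist_bit' stands for bit 0 of
the shared VP assist page and 'latched' for the old vp_assist.vector field).
It aborts at exactly the point where the old code calls domain_crash():

    #include <assert.h>
    #include <stdbool.h>

    static bool assist_bit; /* bit 0 of the shared VP assist page */
    static int latched;     /* the old vp_assist.vector latch */

    static void start_apic_assist(int vector)
    {
        assert(latched == 0); /* the old code calls domain_crash() here */
        latched = vector;
        assist_bit = true;
    }

    static int complete_apic_assist(void)
    {
        int vector;

        if ( assist_bit )
            return 0; /* bit still set: leave the latch in place */

        vector = latched;
        latched = 0;
        return vector;
    }

    int main(void)
    {
        start_apic_assist(0x30);      /* 1st interrupt: vector latched */
        /* Guest EOIs normally: ISR cleared, but assist_bit stays set. */
        (void)complete_apic_assist(); /* returns 0, latch remains */
        start_apic_assist(0x40);      /* 2nd interrupt: assertion fires */
        return 0;
    }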

Having re-visited the code I also conclude that Xen's implementation of the
enlightenment is currently wrong and we are not properly following the
specification.

The specification says:

"The hypervisor sets the “No EOI required” bit when it injects a virtual
 interrupt if the following conditions are satisfied:

 - The virtual interrupt is edge-triggered, and
 - There are no lower priority interrupts pending.

 If, at a later time, a lower priority interrupt is requested, the
 hypervisor clears the “No EOI required” such that a subsequent EOI causes
 an intercept.
 In case of nested interrupts, the EOI intercept is avoided only for the
 highest priority interrupt. This is necessary since no count is maintained
 for the number of EOIs performed by the OS. Therefore only the first EOI
 can be avoided and since the first EOI clears the “No EOI Required” bit,
 the next EOI generates an intercept."

Thus it is quite legitimate to set the "No EOI required" bit and then
subsequently take a higher priority interrupt without clearing the bit.
The avoided EOI will then relate to that subsequent interrupt rather than
to the interrupt that was highest priority when the bit was set. Hence
latching the vector when setting the bit is not particularly useful, and
is somewhat misleading.
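
From the guest's side the protocol therefore amounts to roughly the
following (an illustrative sketch only: the names are invented, the layout
of the VP assist page and the APIC access method are simplified, and the
guest is, as the crash above shows, free to skip the check entirely):

    #include <stdint.h>

    /* Bit 0 of the first 32-bit word of the VP assist page is the
     * "No EOI required" flag. */
    static void guest_eoi(volatile uint32_t *vp_assist,
                          volatile uint32_t *lapic_eoi)
    {
        if ( *vp_assist & 1u )
        {
            /* Assist set: consume it instead of writing EOI, avoiding
             * a hypervisor intercept. */
            *vp_assist &= ~1u;
            return;
        }

        /* No assist: normal EOI write, which the hypervisor intercepts. */
        *lapic_eoi = 0;
    }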

This patch re-works the APIC assist code to simply track when the "No EOI
required" bit is set and test if it has been cleared by the guest (i.e.
'completing' the APIC assist), thus indicating a 'missed EOI'. Missed EOIs
need to be dealt with in two places:

 - In vlapic_has_pending_irq(), to avoid comparing the IRR against a stale
   ISR, and
 - In vlapic_EOI_set() because a missed EOI for a higher priority vector
   should be dealt with before the actual EOI for the lower priority
   vector.

Furthermore, because the guest is at liberty to ignore the "No EOI required"
bit (which is what led to the crash detailed above), vlapic_EOI_set() must
also make sure the bit is cleared to avoid confusing the state machine.

Lastly, the previous code did not properly emulate an EOI if a missed EOI
was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
the ISR. The new code instead calls vlapic_EOI_set().
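
Note also that whether a newly pending vector forces the assist to be
cleared (and hence a real EOI) is decided purely on priority class, i.e.
the upper four bits of the vector; this is what the 0xf0 masks in the
vlapic_has_pending_irq() hunk below implement. In outline (an illustrative
helper, not part of the patch):

    #include <stdbool.h>
    #include <stdint.h>

    /* True if the newly pending vector is of equal or lower priority
     * class than the vector currently in service, in which case the
     * "No EOI required" bit must be cleared so that the guest takes a
     * real EOI intercept. */
    static bool assist_must_be_cleared(uint8_t irr_vec, uint8_t isr_vec)
    {
        return (irr_vec & 0xf0) <= (isr_vec & 0xf0);
    }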

[1] See section 10.3.5 of Microsoft's "Hypervisor Top Level Functional
    Specification v5.0b".

NOTE: The changes to the save/restore code are safe because the layout
      of struct hvm_viridian_vcpu_context is unchanged, and re-interpreting
      the field previously named vp_assist_vector
      as the boolean pending flag maintains the correct semantics.
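
      For example, a compile-time check along these lines (illustrative
      only, not part of the patch) confirms that the old and new record
      layouts are identical, so a record written by the old code loads
      cleanly under the new interpretation:

          #include <stddef.h>
          #include <stdint.h>

          struct old_ctxt {
              uint64_t vp_assist_msr;
              uint8_t  vp_assist_vector;
              uint8_t  _pad[7];
          };

          struct new_ctxt {
              uint64_t vp_assist_msr;
              uint8_t  vp_assist_pending;
              uint8_t  _pad[7];
          };

          _Static_assert(sizeof(struct old_ctxt) == sizeof(struct new_ctxt),
                         "record size unchanged");
          _Static_assert(offsetof(struct old_ctxt, vp_assist_vector) ==
                         offsetof(struct new_ctxt, vp_assist_pending),
                         "field offset unchanged");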

Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
---
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

v3:
 - Cosmetic fixes requested by Jan

v2:
 - Extend patch to completely re-work APIC assist code
---
 xen/arch/x86/hvm/viridian.c            | 36 ++++++++--------
 xen/arch/x86/hvm/vlapic.c              | 75 ++++++++++++++++++++++++----------
 xen/include/asm-x86/hvm/viridian.h     |  8 ++--
 xen/include/public/arch-x86/hvm/save.h |  2 +-
 4 files changed, 76 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
index f0fa59d7d5..70aab520bc 100644
--- a/xen/arch/x86/hvm/viridian.c
+++ b/xen/arch/x86/hvm/viridian.c
@@ -433,46 +433,44 @@ static void teardown_vp_assist(struct vcpu *v)
     put_page_and_type(page);
 }
 
-void viridian_start_apic_assist(struct vcpu *v, int vector)
+void viridian_apic_assist_set(struct vcpu *v)
 {
     uint32_t *va = v->arch.hvm_vcpu.viridian.vp_assist.va;
 
     if ( !va )
         return;
 
-    if ( vector < 0x10 )
-        return;
-
     /*
      * If there is already an assist pending then something has gone
      * wrong and the VM will most likely hang so force a crash now
      * to make the problem clear.
      */
-    if ( v->arch.hvm_vcpu.viridian.vp_assist.vector )
+    if ( v->arch.hvm_vcpu.viridian.vp_assist.pending )
         domain_crash(v->domain);
 
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = vector;
+    v->arch.hvm_vcpu.viridian.vp_assist.pending = true;
     *va |= 1u;
 }
 
-int viridian_complete_apic_assist(struct vcpu *v)
+bool viridian_apic_assist_completed(struct vcpu *v)
 {
     uint32_t *va = v->arch.hvm_vcpu.viridian.vp_assist.va;
-    int vector;
 
     if ( !va )
-        return 0;
+        return false;
 
-    if ( *va & 1u )
-        return 0; /* Interrupt not yet processed by the guest. */
-
-    vector = v->arch.hvm_vcpu.viridian.vp_assist.vector;
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = 0;
+    if ( v->arch.hvm_vcpu.viridian.vp_assist.pending &&
+         !(*va & 1u) )
+    {
+        /* An EOI has been avoided */
+        v->arch.hvm_vcpu.viridian.vp_assist.pending = false;
+        return true;
+    }
 
-    return vector;
+    return false;
 }
 
-void viridian_abort_apic_assist(struct vcpu *v)
+void viridian_apic_assist_clear(struct vcpu *v)
 {
     uint32_t *va = v->arch.hvm_vcpu.viridian.vp_assist.va;
 
@@ -480,7 +478,7 @@ void viridian_abort_apic_assist(struct vcpu *v)
         return;
 
     *va &= ~1u;
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = 0;
+    v->arch.hvm_vcpu.viridian.vp_assist.pending = false;
 }
 
 static void update_reference_tsc(struct domain *d, bool_t initialize)
@@ -1040,7 +1038,7 @@ static int viridian_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     for_each_vcpu( d, v ) {
         struct hvm_viridian_vcpu_context ctxt = {
             .vp_assist_msr = v->arch.hvm_vcpu.viridian.vp_assist.msr.raw,
-            .vp_assist_vector = v->arch.hvm_vcpu.viridian.vp_assist.vector,
+            .vp_assist_pending = v->arch.hvm_vcpu.viridian.vp_assist.pending,
         };
 
         if ( hvm_save_entry(VIRIDIAN_VCPU, v->vcpu_id, h, &ctxt) != 0 )
@@ -1075,7 +1073,7 @@ static int viridian_load_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
          !v->arch.hvm_vcpu.viridian.vp_assist.va )
         initialize_vp_assist(v);
 
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = ctxt.vp_assist_vector;
+    v->arch.hvm_vcpu.viridian.vp_assist.pending = !!ctxt.vp_assist_pending;
 
     return 0;
 }
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 50f53bdaef..7387f91fe0 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -416,18 +416,45 @@ struct vlapic *vlapic_lowest_prio(
 
 void vlapic_EOI_set(struct vlapic *vlapic)
 {
-    int vector = vlapic_find_highest_isr(vlapic);
+    struct vcpu *v = vlapic_vcpu(vlapic);
+    /*
+     * If APIC assist was set then an EOI may have been avoided and
+     * hence this EOI actually relates to a lower priority vector.
+     * Thus it is necessary to first emulate the EOI for the higher
+     * priority vector and then recurse to handle the lower priority
+     * vector.
+     */
+    bool missed_eoi = viridian_apic_assist_completed(v);
+    int vector;
+
+ again:
+    vector = vlapic_find_highest_isr(vlapic);
 
     /* Some EOI writes may not have a matching to an in-service interrupt. */
     if ( vector == -1 )
         return;
 
+    /*
+     * If APIC assist was set but the guest chose to EOI anyway then
+     * we need to clean up state.
+     * NOTE: It is harmless to call viridian_apic_assist_clear() on a
+     *       recursion, even though it is not necessary.
+     */
+    if ( !missed_eoi )
+        viridian_apic_assist_clear(v);
+
     vlapic_clear_vector(vector, &vlapic->regs->data[APIC_ISR]);
 
     if ( hvm_funcs.handle_eoi )
         hvm_funcs.handle_eoi(vector);
 
     vlapic_handle_EOI(vlapic, vector);
+
+    if ( missed_eoi )
+    {
+        missed_eoi = false;
+        goto again;
+    }
 }
 
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
@@ -1239,7 +1266,7 @@ int vlapic_virtual_intr_delivery_enabled(void)
 int vlapic_has_pending_irq(struct vcpu *v)
 {
     struct vlapic *vlapic = vcpu_vlapic(v);
-    int irr, vector, isr;
+    int irr, isr;
 
     if ( !vlapic_enabled(vlapic) )
         return -1;
@@ -1252,27 +1279,32 @@ int vlapic_has_pending_irq(struct vcpu *v)
          !nestedhvm_vcpu_in_guestmode(v) )
         return irr;
 
+    isr = vlapic_find_highest_isr(vlapic);
+
     /*
-     * If APIC assist was used then there may have been no EOI so
-     * we need to clear the requisite bit from the ISR here, before
-     * comparing with the IRR.
+     * If APIC assist was set then an EOI may have been avoided.
+     * If so, we need to emulate the EOI here before comparing ISR
+     * with IRR.
      */
-    vector = viridian_complete_apic_assist(v);
-    if ( vector )
-        vlapic_clear_vector(vector, &vlapic->regs->data[APIC_ISR]);
-
-    isr = vlapic_find_highest_isr(vlapic);
-    if ( isr == -1 )
-        return irr;
+    if ( viridian_apic_assist_completed(v) )
+    {
+        vlapic_EOI_set(vlapic);
+        isr = vlapic_find_highest_isr(vlapic);
+    }
 
     /*
-     * A vector is pending in the ISR so, regardless of whether the new
-     * vector in the IRR is lower or higher in priority, any pending
-     * APIC assist must be aborted to ensure an EOI.
+     * The specification says that if APIC assist is set and a
+     * subsequent interrupt of lower priority occurs then APIC assist
+     * needs to be cleared.
      */
-    viridian_abort_apic_assist(v);
+    if ( isr >= 0 &&
+         (irr & 0xf0) <= (isr & 0xf0) )
+    {
+        viridian_apic_assist_clear(v);
+        return -1;
+    }
 
-    return ((isr & 0xf0) < (irr & 0xf0)) ? irr : -1;
+    return irr;
 }
 
 int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack)
@@ -1290,13 +1322,14 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack)
         goto done;
 
     isr = vlapic_find_highest_isr(vlapic);
-    if ( isr == -1 )
+    if ( isr == -1 && vector > 0x10 )
     {
         /*
-         * This vector is edge triggered and no other vectors are pending
-         * in the ISR so we can use APIC assist to avoid exiting for EOI.
+         * This vector is edge triggered, not in the legacy range, and no
+         * lower priority vectors are pending in the ISR. Thus we can set
+         * APIC assist to avoid exiting for EOI.
          */
-        viridian_start_apic_assist(v, vector);
+        viridian_apic_assist_set(v);
     }
 
  done:
diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
index 30259e91b0..4cbd133720 100644
--- a/xen/include/asm-x86/hvm/viridian.h
+++ b/xen/include/asm-x86/hvm/viridian.h
@@ -24,7 +24,7 @@ struct viridian_vcpu
     struct {
         union viridian_vp_assist msr;
         void *va;
-        int vector;
+        bool pending;
     } vp_assist;
     uint64_t crash_param[5];
 };
@@ -120,9 +120,9 @@ void viridian_time_ref_count_thaw(struct domain *d);
 void viridian_vcpu_deinit(struct vcpu *v);
 void viridian_domain_deinit(struct domain *d);
 
-void viridian_start_apic_assist(struct vcpu *v, int vector);
-int viridian_complete_apic_assist(struct vcpu *v);
-void viridian_abort_apic_assist(struct vcpu *v);
+void viridian_apic_assist_set(struct vcpu *v);
+bool viridian_apic_assist_completed(struct vcpu *v);
+void viridian_apic_assist_clear(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index fd7bf3fb38..4691d4d4aa 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -600,7 +600,7 @@ DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct hvm_viridian_domain_context);
 
 struct hvm_viridian_vcpu_context {
     uint64_t vp_assist_msr;
-    uint8_t  vp_assist_vector;
+    uint8_t  vp_assist_pending;
     uint8_t  _pad[7];
 };
 
-- 
2.11.0

