[Xen-devel] [PATCH v3] x86/hvm: re-work viridian APIC assist code



It appears there is a case where Windows enables the APIC assist
enlightenment[1] but does not use it. This scenario is perfectly valid
according to the documentation, but causes the state machine in Xen to
become confused, leading to a domain_crash() such as the following:

(XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0
      build: 1db0
(XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
(XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
(XEN) domain_crash called from viridian.c:452
(XEN) Domain 4 (vcpu#0) crashed on cpu#1:

The following sequence of events is an example of how this can happen (a
simplified standalone model of the failure follows the list):

 - On return to guest vlapic_has_pending_irq() finds a bit set in the IRR.
 - vlapic_ack_pending_irq() calls viridian_start_apic_assist(), which latches
   the vector, sets the bit in the ISR and clears it from the IRR.
 - The guest then processes the interrupt but EOIs it normally, therefore
   clearing the bit in the ISR.
 - On next return to guest vlapic_has_pending_irq() calls
   viridian_complete_apic_assist(), which discovers the assist bit still set
   in the shared page and therefore leaves the latched vector in place, but
   also finds another bit set in the IRR.
 - vlapic_ack_pending_irq() is then called but, because the ISR was
   cleared by the EOI, another call is made to viridian_start_apic_assist()
   and this then calls domain_crash() because it finds the latched vector
   has not been cleared.
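
For illustration, the old logic can be modelled by the standalone program
below (a simplified sketch, not Xen code: 'assist_bit' stands for bit 0 of
the shared VP assist page and 'latched' for the old vp_assist.vector field).
It aborts at exactly the point where the old code calls domain_crash():

    #include <assert.h>
    #include <stdbool.h>

    static bool assist_bit; /* bit 0 of the shared VP assist page */
    static int latched;     /* the old vp_assist.vector latch */

    static void start_apic_assist(int vector)
    {
        assert(latched == 0); /* the old code calls domain_crash() here */
        latched = vector;
        assist_bit = true;
    }

    static int complete_apic_assist(void)
    {
        int vector;

        if ( assist_bit )
            return 0; /* bit still set: leave the latch in place */

        vector = latched;
        latched = 0;
        return vector;
    }

    int main(void)
    {
        start_apic_assist(0x30);      /* 1st interrupt: vector latched */
        /* Guest EOIs normally: ISR cleared, but assist_bit stays set. */
        (void)complete_apic_assist(); /* returns 0, latch remains */
        start_apic_assist(0x40);      /* 2nd interrupt: assertion fires */
        return 0;
    }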

Having re-visited the code I also conclude that Xen's implementation of the
enlightenment is currently wrong and we are not properly following the
specification.

The specification says:

"The hypervisor sets the “No EOI required” bit when it injects a virtual
 interrupt if the following conditions are satisfied:

 - The virtual interrupt is edge-triggered, and
 - There are no lower priority interrupts pending.

 If, at a later time, a lower priority interrupt is requested, the
 hypervisor clears the “No EOI required” such that a subsequent EOI causes
 an intercept.
 In case of nested interrupts, the EOI intercept is avoided only for the
 highest priority interrupt. This is necessary since no count is maintained
 for the number of EOIs performed by the OS. Therefore only the first EOI
 can be avoided and since the first EOI clears the “No EOI Required” bit,
 the next EOI generates an intercept."

Thus it is quite legitimate to set the "No EOI required" bit and then
subsequently take a higher priority interrupt without clearing the bit.
The avoided EOI will then relate to that subsequent interrupt rather than
to the interrupt that was highest priority when the bit was set. Hence
latching the vector when setting the bit is not particularly useful, and
is somewhat misleading.
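
From the guest's side the protocol therefore amounts to roughly the
following (an illustrative sketch only: the names are invented, the layout
of the VP assist page and the APIC access method are simplified, and the
guest is, as the crash above shows, free to skip the check entirely):

    #include <stdint.h>

    /* Bit 0 of the first 32-bit word of the VP assist page is the
     * "No EOI required" flag. */
    static void guest_eoi(volatile uint32_t *vp_assist,
                          volatile uint32_t *lapic_eoi)
    {
        if ( *vp_assist & 1u )
        {
            /* Assist set: consume it instead of writing EOI, avoiding
             * a hypervisor intercept. */
            *vp_assist &= ~1u;
            return;
        }

        /* No assist: normal EOI write, which the hypervisor intercepts. */
        *lapic_eoi = 0;
    }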

This patch re-works the APIC assist code to simply track when the "No EOI
required" bit is set and test if it has been cleared by the guest (i.e.
'completing' the APIC assist), thus indicating a 'missed EOI'. Missed EOIs
need to be dealt with in two places:

 - In vlapic_has_pending_irq(), to avoid comparing the IRR against a stale
   ISR, and
 - In vlapic_EOI_set() because a missed EOI for a higher priority vector
   should be dealt with before the actual EOI for the lower priority
   vector.

Furthermore, because the guest is at liberty to ignore the "No EOI required"
bit (which is what led to the crash detailed above), vlapic_EOI_set() must
also make sure the bit is cleared to avoid confusing the state machine.

Lastly, the previous code did not properly emulate an EOI if a missed EOI
was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
the ISR. The new code instead calls vlapic_EOI_set().
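
Note also that whether a newly pending vector forces the assist to be
cleared (and hence a real EOI) is decided purely on priority class, i.e.
the upper four bits of the vector; this is what the 0xf0 masks in the
vlapic_has_pending_irq() hunk below implement. In outline (an illustrative
helper, not part of the patch):

    #include <stdbool.h>
    #include <stdint.h>

    /* True if the newly pending vector is of equal or lower priority
     * class than the vector currently in service, in which case the
     * "No EOI required" bit must be cleared so that the guest takes a
     * real EOI intercept. */
    static bool assist_must_be_cleared(uint8_t irr_vec, uint8_t isr_vec)
    {
        return (irr_vec & 0xf0) <= (isr_vec & 0xf0);
    }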

[1] See section 10.3.5 of Microsoft's "Hypervisor Top Level Functional
    Specification v5.0b".

NOTE: The changes to the save/restore code are safe because the layout
      of struct hvm_viridian_vcpu_context is unchanged, and re-interpreting
      the field previously named vp_assist_vector
      as the boolean pending flag maintains the correct semantics.
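
      For example, a compile-time check along these lines (illustrative
      only, not part of the patch) confirms that the old and new record
      layouts are identical, so a record written by the old code loads
      cleanly under the new interpretation:

          #include <stddef.h>
          #include <stdint.h>

          struct old_ctxt {
              uint64_t vp_assist_msr;
              uint8_t  vp_assist_vector;
              uint8_t  _pad[7];
          };

          struct new_ctxt {
              uint64_t vp_assist_msr;
              uint8_t  vp_assist_pending;
              uint8_t  _pad[7];
          };

          _Static_assert(sizeof(struct old_ctxt) == sizeof(struct new_ctxt),
                         "record size unchanged");
          _Static_assert(offsetof(struct old_ctxt, vp_assist_vector) ==
                         offsetof(struct new_ctxt, vp_assist_pending),
                         "field offset unchanged");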

Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
---
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

v3:
 - Cosmetic fixes requested by Jan

v2:
 - Extend patch to completely re-work APIC assist code
---
 xen/arch/x86/hvm/viridian.c            | 36 ++++++++--------
 xen/arch/x86/hvm/vlapic.c              | 75 ++++++++++++++++++++++++----------
 xen/include/asm-x86/hvm/viridian.h     |  8 ++--
 xen/include/public/arch-x86/hvm/save.h |  2 +-
 4 files changed, 76 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
index f0fa59d7d5..70aab520bc 100644
--- a/xen/arch/x86/hvm/viridian.c
+++ b/xen/arch/x86/hvm/viridian.c
@@ -433,46 +433,44 @@ static void teardown_vp_assist(struct vcpu *v)
     put_page_and_type(page);
 }
 
-void viridian_start_apic_assist(struct vcpu *v, int vector)
+void viridian_apic_assist_set(struct vcpu *v)
 {
     uint32_t *va = v->arch.hvm_vcpu.viridian.vp_assist.va;
 
     if ( !va )
         return;
 
-    if ( vector < 0x10 )
-        return;
-
     /*
      * If there is already an assist pending then something has gone
      * wrong and the VM will most likely hang so force a crash now
      * to make the problem clear.
      */
-    if ( v->arch.hvm_vcpu.viridian.vp_assist.vector )
+    if ( v->arch.hvm_vcpu.viridian.vp_assist.pending )
         domain_crash(v->domain);
 
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = vector;
+    v->arch.hvm_vcpu.viridian.vp_assist.pending = true;
     *va |= 1u;
 }
 
-int viridian_complete_apic_assist(struct vcpu *v)
+bool viridian_apic_assist_completed(struct vcpu *v)
 {
     uint32_t *va = v->arch.hvm_vcpu.viridian.vp_assist.va;
-    int vector;
 
     if ( !va )
-        return 0;
+        return false;
 
-    if ( *va & 1u )
-        return 0; /* Interrupt not yet processed by the guest. */
-
-    vector = v->arch.hvm_vcpu.viridian.vp_assist.vector;
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = 0;
+    if ( v->arch.hvm_vcpu.viridian.vp_assist.pending &&
+         !(*va & 1u) )
+    {
+        /* An EOI has been avoided */
+        v->arch.hvm_vcpu.viridian.vp_assist.pending = false;
+        return true;
+    }
 
-    return vector;
+    return false;
 }
 
-void viridian_abort_apic_assist(struct vcpu *v)
+void viridian_apic_assist_clear(struct vcpu *v)
 {
     uint32_t *va = v->arch.hvm_vcpu.viridian.vp_assist.va;
 
@@ -480,7 +478,7 @@ void viridian_abort_apic_assist(struct vcpu *v)
         return;
 
     *va &= ~1u;
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = 0;
+    v->arch.hvm_vcpu.viridian.vp_assist.pending = false;
 }
 
 static void update_reference_tsc(struct domain *d, bool_t initialize)
@@ -1040,7 +1038,7 @@ static int viridian_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     for_each_vcpu( d, v ) {
         struct hvm_viridian_vcpu_context ctxt = {
             .vp_assist_msr = v->arch.hvm_vcpu.viridian.vp_assist.msr.raw,
-            .vp_assist_vector = v->arch.hvm_vcpu.viridian.vp_assist.vector,
+            .vp_assist_pending = v->arch.hvm_vcpu.viridian.vp_assist.pending,
         };
 
         if ( hvm_save_entry(VIRIDIAN_VCPU, v->vcpu_id, h, &ctxt) != 0 )
@@ -1075,7 +1073,7 @@ static int viridian_load_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
          !v->arch.hvm_vcpu.viridian.vp_assist.va )
         initialize_vp_assist(v);
 
-    v->arch.hvm_vcpu.viridian.vp_assist.vector = ctxt.vp_assist_vector;
+    v->arch.hvm_vcpu.viridian.vp_assist.pending = !!ctxt.vp_assist_pending;
 
     return 0;
 }
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 50f53bdaef..7387f91fe0 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -416,18 +416,45 @@ struct vlapic *vlapic_lowest_prio(
 
 void vlapic_EOI_set(struct vlapic *vlapic)
 {
-    int vector = vlapic_find_highest_isr(vlapic);
+    struct vcpu *v = vlapic_vcpu(vlapic);
+    /*
+     * If APIC assist was set then an EOI may have been avoided and
+     * hence this EOI actually relates to a lower priority vector.
+     * Thus it is necessary to first emulate the EOI for the higher
+     * priority vector and then recurse to handle the lower priority
+     * vector.
+     */
+    bool missed_eoi = viridian_apic_assist_completed(v);
+    int vector;
+
+ again:
+    vector = vlapic_find_highest_isr(vlapic);
 
     /* Some EOI writes may not have a matching to an in-service interrupt. */
     if ( vector == -1 )
         return;
 
+    /*
+     * If APIC assist was set but the guest chose to EOI anyway then
+     * we need to clean up state.
+     * NOTE: It is harmless to call viridian_apic_assist_clear() on a
+     *       recursion, even though it is not necessary.
+     */
+    if ( !missed_eoi )
+        viridian_apic_assist_clear(v);
+
     vlapic_clear_vector(vector, &vlapic->regs->data[APIC_ISR]);
 
     if ( hvm_funcs.handle_eoi )
         hvm_funcs.handle_eoi(vector);
 
     vlapic_handle_EOI(vlapic, vector);
+
+    if ( missed_eoi )
+    {
+        missed_eoi = false;
+        goto again;
+    }
 }
 
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
@@ -1239,7 +1266,7 @@ int vlapic_virtual_intr_delivery_enabled(void)
 int vlapic_has_pending_irq(struct vcpu *v)
 {
     struct vlapic *vlapic = vcpu_vlapic(v);
-    int irr, vector, isr;
+    int irr, isr;
 
     if ( !vlapic_enabled(vlapic) )
         return -1;
@@ -1252,27 +1279,32 @@ int vlapic_has_pending_irq(struct vcpu *v)
          !nestedhvm_vcpu_in_guestmode(v) )
         return irr;
 
+    isr = vlapic_find_highest_isr(vlapic);
+
     /*
-     * If APIC assist was used then there may have been no EOI so
-     * we need to clear the requisite bit from the ISR here, before
-     * comparing with the IRR.
+     * If APIC assist was set then an EOI may have been avoided.
+     * If so, we need to emulate the EOI here before comparing ISR
+     * with IRR.
      */
-    vector = viridian_complete_apic_assist(v);
-    if ( vector )
-        vlapic_clear_vector(vector, &vlapic->regs->data[APIC_ISR]);
-
-    isr = vlapic_find_highest_isr(vlapic);
-    if ( isr == -1 )
-        return irr;
+    if ( viridian_apic_assist_completed(v) )
+    {
+        vlapic_EOI_set(vlapic);
+        isr = vlapic_find_highest_isr(vlapic);
+    }
 
     /*
-     * A vector is pending in the ISR so, regardless of whether the new
-     * vector in the IRR is lower or higher in priority, any pending
-     * APIC assist must be aborted to ensure an EOI.
+     * The specification says that if APIC assist is set and a
+     * subsequent interrupt of lower priority occurs then APIC assist
+     * needs to be cleared.
      */
-    viridian_abort_apic_assist(v);
+    if ( isr >= 0 &&
+         (irr & 0xf0) <= (isr & 0xf0) )
+    {
+        viridian_apic_assist_clear(v);
+        return -1;
+    }
 
-    return ((isr & 0xf0) < (irr & 0xf0)) ? irr : -1;
+    return irr;
 }
 
 int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack)
@@ -1290,13 +1322,14 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack)
         goto done;
 
     isr = vlapic_find_highest_isr(vlapic);
-    if ( isr == -1 )
+    if ( isr == -1 && vector > 0x10 )
     {
         /*
-         * This vector is edge triggered and no other vectors are pending
-         * in the ISR so we can use APIC assist to avoid exiting for EOI.
+         * This vector is edge triggered, not in the legacy range, and no
+         * lower priority vectors are pending in the ISR. Thus we can set
+         * APIC assist to avoid exiting for EOI.
          */
-        viridian_start_apic_assist(v, vector);
+        viridian_apic_assist_set(v);
     }
 
  done:
diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
index 30259e91b0..4cbd133720 100644
--- a/xen/include/asm-x86/hvm/viridian.h
+++ b/xen/include/asm-x86/hvm/viridian.h
@@ -24,7 +24,7 @@ struct viridian_vcpu
     struct {
         union viridian_vp_assist msr;
         void *va;
-        int vector;
+        bool pending;
     } vp_assist;
     uint64_t crash_param[5];
 };
@@ -120,9 +120,9 @@ void viridian_time_ref_count_thaw(struct domain *d);
 void viridian_vcpu_deinit(struct vcpu *v);
 void viridian_domain_deinit(struct domain *d);
 
-void viridian_start_apic_assist(struct vcpu *v, int vector);
-int viridian_complete_apic_assist(struct vcpu *v);
-void viridian_abort_apic_assist(struct vcpu *v);
+void viridian_apic_assist_set(struct vcpu *v);
+bool viridian_apic_assist_completed(struct vcpu *v);
+void viridian_apic_assist_clear(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index fd7bf3fb38..4691d4d4aa 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -600,7 +600,7 @@ DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct hvm_viridian_domain_context);
 
 struct hvm_viridian_vcpu_context {
     uint64_t vp_assist_msr;
-    uint8_t  vp_assist_vector;
+    uint8_t  vp_assist_pending;
     uint8_t  _pad[7];
 };
 
-- 
2.11.0

