[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-changelog] [xen master] x86/emulate: send vm_event from emulate



commit b8443cb39b04a6449e0d093b62f10c5a89775a78
Author:     Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>
AuthorDate: Tue Sep 24 10:49:36 2019 +0200
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Tue Sep 24 10:49:36 2019 +0200

    x86/emulate: send vm_event from emulate
    
    A/D bit writes (on page walks) can be considered benign by an introspection
    agent, so receiving vm_events for them is a pessimization. We try here to
    optimize by filtering these events out.
    Currently, we are fully emulating the instruction at RIP when the hardware 
sees
    an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
    incorrect, because the instruction at RIP might legitimately cause an
    EPT fault of its own while accessing a _different_ page from the original 
one,
    where A/D were set.
    The solution is to perform the whole emulation, while ignoring EPT 
restrictions
    for the walk part, and taking them into account for the "actual" emulating 
of
    the instruction at RIP. When we send out a vm_event, we don't want the 
emulation
    to complete, since in that case we won't be able to veto whatever it is 
doing.
    That would mean that we can't actually prevent any malicious activity, 
instead
    we'd only be able to report on it.
    When we see a "send-vm_event" case while emulating, we need to first send 
the
    event out and then suspend the emulation (return X86EMUL_RETRY).
    After the emulation stops we'll call hvm_vm_event_do_resume() again after 
the
    introspection agent treats the event and resumes the guest. There, the
    instruction at RIP will be fully emulated (with the EPT ignored) if the
    introspection application allows it, and the guest will continue to run past
    the instruction.
    
    A common example is if the hardware exits because of an EPT fault caused by 
a
    page walk, p2m_mem_access_check() decides if it is going to send a vm_event.
    If the vm_event was sent and it would be treated so it runs the instruction
    at RIP, that instruction might also hit a protected page and provoke a 
vm_event.
    
    Now if npfec.kind == npfec_kind_in_gpt and 
d->arch.monitor.inguest_pagefault_disabled
    is true then we are in the page walk case and we can do this emulation 
optimization
    and emulate the page walk while ignoring the EPT, but don't ignore the EPT 
for the
    emulation of the actual instruction.
    
    In the first case we would have 2 EPT events, in the second case we would 
have
    1 EPT event if the instruction at the RIP triggers an EPT event.
    
    We use hvmemul_map_linear_addr() to intercept write access and
    __hvm_copy() to intercept exec, read and write access.
    
    A new return type was added, HVMTRANS_need_retry, in order to have all
    the places that consume HVMTRANS* return X86EMUL_RETRY.
    
    hvm_emulate_send_vm_event() can return false if there was no violation,
    if there was an error from monitor_traps() or p2m_get_mem_access().
    -ESRCH from p2m_get_mem_access() is treated as restricted access.
    
    NOTE: hvm_emulate_send_vm_event() assumes the caller will enable/disable
    arch.vm_event->send_event
    
    Signed-off-by: Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>
    Acked-by: Paul Durrant <paul@xxxxxxx>
    Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
    Acked-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
    Reviewed-by: Petre Pircalabu <ppircalabu@xxxxxxxxxxxxxxx>
---
 xen/arch/x86/hvm/emulate.c        | 55 +++++++++++++++++++++------
 xen/arch/x86/hvm/hvm.c            |  9 +++++
 xen/arch/x86/hvm/intercept.c      |  2 +
 xen/arch/x86/hvm/monitor.c        | 78 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/mem_access.c      |  9 ++++-
 xen/arch/x86/mm/shadow/hvm.c      |  1 +
 xen/include/asm-x86/hvm/monitor.h |  3 ++
 xen/include/asm-x86/hvm/support.h |  1 +
 xen/include/asm-x86/vm_event.h    |  2 +
 9 files changed, 148 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 36bcb526d3..637034b6a1 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -548,6 +548,7 @@ static void *hvmemul_map_linear_addr(
     unsigned int nr_frames = ((linear + bytes - !!bytes) >> PAGE_SHIFT) -
         (linear >> PAGE_SHIFT) + 1;
     unsigned int i;
+    gfn_t gfn;
 
     /*
      * mfn points to the next free slot.  All used slots have a page reference
@@ -582,7 +583,7 @@ static void *hvmemul_map_linear_addr(
         ASSERT(mfn_x(*mfn) == 0);
 
         res = hvm_translate_get_page(curr, addr, true, pfec,
-                                     &pfinfo, &page, NULL, &p2mt);
+                                     &pfinfo, &page, &gfn, &p2mt);
 
         switch ( res )
         {
@@ -599,6 +600,13 @@ static void *hvmemul_map_linear_addr(
             err = NULL;
             goto out;
 
+        case HVMTRANS_need_retry:
+            /*
+             * hvm_translate_get_page() does not currently return
+             * HVMTRANS_need_retry.
+             */
+            ASSERT_UNREACHABLE();
+            /* fall through */
         case HVMTRANS_gfn_paged_out:
         case HVMTRANS_gfn_shared:
             err = ERR_PTR(~X86EMUL_RETRY);
@@ -626,6 +634,14 @@ static void *hvmemul_map_linear_addr(
 
             ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
         }
+
+        if ( unlikely(curr->arch.vm_event) &&
+             curr->arch.vm_event->send_event &&
+             hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
+        {
+            err = ERR_PTR(~X86EMUL_RETRY);
+            goto out;
+        }
     }
 
     /* Entire access within a single frame? */
@@ -1141,6 +1157,7 @@ static int linear_read(unsigned long addr, unsigned int 
bytes, void *p_data,
 
     case HVMTRANS_gfn_paged_out:
     case HVMTRANS_gfn_shared:
+    case HVMTRANS_need_retry:
         return X86EMUL_RETRY;
     }
 
@@ -1192,6 +1209,7 @@ static int linear_write(unsigned long addr, unsigned int 
bytes, void *p_data,
 
     case HVMTRANS_gfn_paged_out:
     case HVMTRANS_gfn_shared:
+    case HVMTRANS_need_retry:
         return X86EMUL_RETRY;
     }
 
@@ -1852,19 +1870,27 @@ static int hvmemul_rep_movs(
 
     xfree(buf);
 
-    if ( rc == HVMTRANS_gfn_paged_out )
-        return X86EMUL_RETRY;
-    if ( rc == HVMTRANS_gfn_shared )
-        return X86EMUL_RETRY;
-    if ( rc != HVMTRANS_okay )
+    switch ( rc )
     {
-        gdprintk(XENLOG_WARNING, "Failed memory-to-memory REP MOVS: sgpa=%"
-                 PRIpaddr" dgpa=%"PRIpaddr" reps=%lu bytes_per_rep=%u\n",
-                 sgpa, dgpa, *reps, bytes_per_rep);
-        return X86EMUL_UNHANDLEABLE;
+    case HVMTRANS_need_retry:
+        /*
+         * hvm_copy_{from,to}_guest_phys() do not currently return
+         * HVMTRANS_need_retry.
+         */
+        ASSERT_UNREACHABLE();
+        /* fall through */
+    case HVMTRANS_gfn_paged_out:
+    case HVMTRANS_gfn_shared:
+        return X86EMUL_RETRY;
+    case HVMTRANS_okay:
+        return X86EMUL_OKAY;
     }
 
-    return X86EMUL_OKAY;
+    gdprintk(XENLOG_WARNING, "Failed memory-to-memory REP MOVS: sgpa=%"
+             PRIpaddr" dgpa=%"PRIpaddr" reps=%lu bytes_per_rep=%u\n",
+             sgpa, dgpa, *reps, bytes_per_rep);
+
+    return X86EMUL_UNHANDLEABLE;
 }
 
 static int hvmemul_rep_stos(
@@ -1966,6 +1992,13 @@ static int hvmemul_rep_stos(
 
         switch ( rc )
         {
+        case HVMTRANS_need_retry:
+            /*
+             * hvm_copy_to_guest_phys() does not currently return
+             * HVMTRANS_need_retry.
+             */
+            ASSERT_UNREACHABLE();
+            /* fall through */
         case HVMTRANS_gfn_paged_out:
         case HVMTRANS_gfn_shared:
             return X86EMUL_RETRY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index fdb1e17f59..c82e7b2cd3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3236,6 +3236,15 @@ static enum hvm_translation_result __hvm_copy(
             return HVMTRANS_bad_gfn_to_mfn;
         }
 
+        if ( unlikely(v->arch.vm_event) &&
+             (flags & HVMCOPY_linear) &&
+             v->arch.vm_event->send_event &&
+             hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
+        {
+            put_page(page);
+            return HVMTRANS_need_retry;
+        }
+
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
         if ( flags & HVMCOPY_to_guest )
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index aac22c595d..90202bdcec 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -145,6 +145,7 @@ int hvm_process_io_intercept(const struct hvm_io_handler 
*handler,
                 case HVMTRANS_bad_linear_to_gfn:
                 case HVMTRANS_gfn_paged_out:
                 case HVMTRANS_gfn_shared:
+                case HVMTRANS_need_retry:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
@@ -174,6 +175,7 @@ int hvm_process_io_intercept(const struct hvm_io_handler 
*handler,
                 case HVMTRANS_bad_linear_to_gfn:
                 case HVMTRANS_gfn_paged_out:
                 case HVMTRANS_gfn_shared:
+                case HVMTRANS_need_retry:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index 2a41ccc930..7fb1e2c04e 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -23,8 +23,10 @@
  */
 
 #include <xen/vm_event.h>
+#include <xen/mem_access.h>
 #include <xen/monitor.h>
 #include <asm/hvm/monitor.h>
+#include <asm/altp2m.h>
 #include <asm/monitor.h>
 #include <asm/paging.h>
 #include <asm/vm_event.h>
@@ -216,6 +218,82 @@ void hvm_monitor_interrupt(unsigned int vector, unsigned 
int type,
 }
 
 /*
+ * Send memory access vm_events based on pfec. Returns true if the event was
+ * sent and false for p2m_get_mem_access() error, no violation and event send
+ * error. Assumes the caller will enable/disable arch.vm_event->send_event.
+ */
+bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
+                           uint16_t kind)
+{
+    xenmem_access_t access;
+    struct vcpu *curr = current;
+    vm_event_request_t req = {};
+    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
+    int rc;
+
+    ASSERT(curr->arch.vm_event->send_event);
+
+    /*
+     * p2m_get_mem_access() can fail from a invalid MFN and return -ESRCH
+     * in which case access must be restricted.
+     */
+    rc = p2m_get_mem_access(curr->domain, gfn, &access, altp2m_vcpu_idx(curr));
+
+    if ( rc == -ESRCH )
+        access = XENMEM_access_n;
+    else if ( rc )
+        return false;
+
+    switch ( access )
+    {
+    case XENMEM_access_x:
+    case XENMEM_access_rx:
+        if ( pfec & PFEC_write_access )
+            req.u.mem_access.flags = MEM_ACCESS_R | MEM_ACCESS_W;
+        break;
+
+    case XENMEM_access_w:
+    case XENMEM_access_rw:
+        if ( pfec & PFEC_insn_fetch )
+            req.u.mem_access.flags = MEM_ACCESS_X;
+        break;
+
+    case XENMEM_access_r:
+    case XENMEM_access_n:
+        if ( pfec & PFEC_write_access )
+            req.u.mem_access.flags |= MEM_ACCESS_R | MEM_ACCESS_W;
+        if ( pfec & PFEC_insn_fetch )
+            req.u.mem_access.flags |= MEM_ACCESS_X;
+        break;
+
+    case XENMEM_access_wx:
+    case XENMEM_access_rwx:
+    case XENMEM_access_rx2rw:
+    case XENMEM_access_n2rwx:
+    case XENMEM_access_default:
+        break;
+    }
+
+    if ( !req.u.mem_access.flags )
+        return false; /* no violation */
+
+    if ( kind == npfec_kind_with_gla )
+        req.u.mem_access.flags |= MEM_ACCESS_FAULT_WITH_GLA |
+                                  MEM_ACCESS_GLA_VALID;
+    else if ( kind == npfec_kind_in_gpt )
+        req.u.mem_access.flags |= MEM_ACCESS_FAULT_IN_GPT |
+                                  MEM_ACCESS_GLA_VALID;
+
+
+    req.reason = VM_EVENT_REASON_MEM_ACCESS;
+    req.u.mem_access.gfn = gfn_x(gfn);
+    req.u.mem_access.gla = gla;
+    req.u.mem_access.offset = gpa & ~PAGE_MASK;
+
+    return monitor_traps(curr, true, &req) >= 0;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 0144f92b98..320b9fe621 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -210,11 +210,18 @@ bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
             return true;
         }
     }
+
+    /*
+     * Try to avoid sending a mem event. Suppress events caused by page-walks
+     * by emulating but still checking mem_access violations.
+     */
     if ( vm_event_check_ring(d->vm_event_monitor) &&
          d->arch.monitor.inguest_pagefault_disabled &&
-         npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
+         npfec.kind == npfec_kind_in_gpt )
     {
+        v->arch.vm_event->send_event = true;
         hvm_emulate_one_vm_event(EMUL_KIND_NORMAL, TRAP_invalid_op, 
X86_EVENT_NO_EC);
+        v->arch.vm_event->send_event = false;
 
         return true;
     }
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index 0aa560b7f5..48dfad4557 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -139,6 +139,7 @@ hvm_read(enum x86_segment seg,
         return X86EMUL_UNHANDLEABLE;
     case HVMTRANS_gfn_paged_out:
     case HVMTRANS_gfn_shared:
+    case HVMTRANS_need_retry:
         return X86EMUL_RETRY;
     }
 
diff --git a/xen/include/asm-x86/hvm/monitor.h 
b/xen/include/asm-x86/hvm/monitor.h
index f1af4f812a..325b44674d 100644
--- a/xen/include/asm-x86/hvm/monitor.h
+++ b/xen/include/asm-x86/hvm/monitor.h
@@ -49,6 +49,9 @@ void hvm_monitor_interrupt(unsigned int vector, unsigned int 
type,
                            unsigned int err, uint64_t cr2);
 bool hvm_monitor_emul_unimplemented(void);
 
+bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
+                           uint16_t kind);
+
 #endif /* __ASM_X86_HVM_MONITOR_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/support.h 
b/xen/include/asm-x86/hvm/support.h
index e989aa7349..1500e6c94b 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -61,6 +61,7 @@ enum hvm_translation_result {
     HVMTRANS_unhandleable,
     HVMTRANS_gfn_paged_out,
     HVMTRANS_gfn_shared,
+    HVMTRANS_need_retry,
 };
 
 /*
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index 23e655710b..66db9e1e25 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -36,6 +36,8 @@ struct arch_vm_event {
     bool set_gprs;
     /* A sync vm_event has been sent and we're not done handling it. */
     bool sync_event;
+    /* Send mem access events from emulator */
+    bool send_event;
 };
 
 int vm_event_init_domain(struct domain *d);
--
generated by git-patchbot for /home/xen/git/xen.git#master

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/xen-changelog

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.