
Re: [Xen-devel] [PATCH v13] x86/emulate: Send vm_event from emulate



> -----Original Message-----
> From: Alexandru Stefan ISAILA <aisaila@xxxxxxxxxxxxxxx>
> Sent: 23 September 2019 13:06
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; jbeulich@xxxxxxxx;
> Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; wl@xxxxxxx;
> Roger Pau Monne <roger.pau@xxxxxxxxxx>; Razvan COJOCARU <rcojocaru@xxxxxxxxxxxxxxx>;
> tamas@xxxxxxxxxxxxx; Alexandru Stefan ISAILA <aisaila@xxxxxxxxxxxxxxx>;
> Petre Ovidiu PIRCALABU <ppircalabu@xxxxxxxxxxxxxxx>; George Dunlap <George.Dunlap@xxxxxxxxxx>
> Subject: [PATCH v13] x86/emulate: Send vm_event from emulate
> 
> A/D bit writes (on page walks) can be considered benign by an introspection
> agent, so receiving vm_events for them is a pessimization. We try here to
> optimize by filtering these events out.
> Currently, we are fully emulating the instruction at RIP when the hardware
> sees an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
> incorrect, because the instruction at RIP might legitimately cause an
> EPT fault of its own while accessing a _different_ page from the original
> one, where the A/D bits were set.
> The solution is to perform the whole emulation while ignoring EPT
> restrictions for the walk part, and taking them into account for the
> "actual" emulation of the instruction at RIP. When we send out a vm_event,
> we don't want the emulation to complete, since in that case we won't be
> able to veto whatever the instruction is doing; we couldn't actually
> prevent any malicious activity, only report on it.
> When we see a "send-vm_event" case while emulating, we first send the
> event out and then suspend the emulation (return X86EMUL_RETRY). Once the
> introspection agent has treated the event and resumed the guest,
> hvm_vm_event_do_resume() is called; there, the instruction at RIP will be
> fully emulated (with the EPT ignored) if the introspection application
> allows it, and the guest will continue to run past the instruction.
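> 
> As a minimal sketch (using only names the hunks below introduce), the
> check in an emulation path looks like:
> 
>     /* Before touching guest memory, see if a vm_event must go out. */
>     if ( unlikely(curr->arch.vm_event) &&
>          curr->arch.vm_event->send_event &&
>          hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
>         return X86EMUL_RETRY; /* event sent; suspend the emulation */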
> 
> A common example: the hardware exits because of an EPT fault caused by a
> page walk, and p2m_mem_access_check() decides whether to send a vm_event.
> If a vm_event was sent, and handling it results in the instruction at RIP
> being run, that instruction might also hit a protected page and provoke a
> vm_event of its own.
> 
> Now, if npfec.kind == npfec_kind_in_gpt and
> d->arch.monitor.inguest_pagefault_disabled is true, then we are in the
> page-walk case and can apply this emulation optimization: emulate the page
> walk while ignoring the EPT, but do not ignore the EPT for the emulation
> of the actual instruction.
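> 
> The gate for that optimization, as a sketch of the p2m_mem_access_check()
> hunk below:
> 
>     if ( vm_event_check_ring(d->vm_event_monitor) &&
>          d->arch.monitor.inguest_pagefault_disabled &&
>          npfec.kind == npfec_kind_in_gpt )
>     {
>         /* Emulate here; mem_access violations are still checked. */
>     }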
> 
> In the first case we would get 2 EPT events; in the second case we would
> get 1 EPT event, if the instruction at RIP triggers an EPT event of its own.
> 
> We use hvmemul_map_linear_addr() to intercept write access and
> __hvm_copy() to intercept exec, read and write access.
> 
> A new return type, HVMTRANS_need_retry, was added so that all the places
> that consume HVMTRANS_* values map it to X86EMUL_RETRY.
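> 
> Each consumer of HVMTRANS_* folds the new value into its existing retry
> handling, e.g. (sketch of the pattern repeated in the hunks below):
> 
>     case HVMTRANS_gfn_paged_out:
>     case HVMTRANS_gfn_shared:
>     case HVMTRANS_need_retry:
>         return X86EMUL_RETRY;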
> 
> hvm_monitor_check_p2m() returns false if there was no violation, or if
> there was an error from monitor_traps() or p2m_get_mem_access().
> -ESRCH from p2m_get_mem_access() is treated as restricted access.
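> 
> The error handling, as a sketch mirroring the hvm_monitor_check_p2m()
> hunk below:
> 
>     rc = p2m_get_mem_access(curr->domain, gfn, &access,
>                             altp2m_vcpu_idx(curr));
>     if ( rc == -ESRCH )
>         access = XENMEM_access_n; /* invalid MFN: treat as restricted */
>     else if ( rc )
>         return false;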
> 
> NOTE: hvm_monitor_check_p2m() assumes the caller will enable/disable
> arch.vm_event->send_event
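> 
> The caller-side contract then looks like this (sketch, taken from the
> p2m_mem_access_check() hunk below):
> 
>     v->arch.vm_event->send_event = true;
>     hvm_emulate_one_vm_event(EMUL_KIND_NORMAL, TRAP_invalid_op,
>                              X86_EVENT_NO_EC);
>     v->arch.vm_event->send_event = false;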
> 
> Signed-off-by: Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>
> 

emulate parts...

Acked-by: Paul Durrant <paul@xxxxxxx>

> ---
> Changes since V12:
>       - Update commit message
>       - Rework switch() in map_linear_addr()/rep_movs()/rep_stos().
> ---
>  xen/arch/x86/hvm/emulate.c        | 57 +++++++++++++++++-----
>  xen/arch/x86/hvm/hvm.c            |  9 ++++
>  xen/arch/x86/hvm/intercept.c      |  2 +
>  xen/arch/x86/hvm/monitor.c        | 78 +++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/mem_access.c      |  9 +++-
>  xen/arch/x86/mm/shadow/hvm.c      |  1 +
>  xen/include/asm-x86/hvm/monitor.h |  3 ++
>  xen/include/asm-x86/hvm/support.h |  1 +
>  xen/include/asm-x86/vm_event.h    |  2 +
>  9 files changed, 149 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 36bcb526d3..85ab009717 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -548,6 +548,7 @@ static void *hvmemul_map_linear_addr(
>      unsigned int nr_frames = ((linear + bytes - !!bytes) >> PAGE_SHIFT) -
>          (linear >> PAGE_SHIFT) + 1;
>      unsigned int i;
> +    gfn_t gfn;
> 
>      /*
>       * mfn points to the next free slot.  All used slots have a page reference
> @@ -582,7 +583,7 @@ static void *hvmemul_map_linear_addr(
>          ASSERT(mfn_x(*mfn) == 0);
> 
>          res = hvm_translate_get_page(curr, addr, true, pfec,
> -                                     &pfinfo, &page, NULL, &p2mt);
> +                                     &pfinfo, &page, &gfn, &p2mt);
> 
>          switch ( res )
>          {
> @@ -599,8 +600,15 @@ static void *hvmemul_map_linear_addr(
>              err = NULL;
>              goto out;
> 
> -        case HVMTRANS_gfn_paged_out:
> +        case HVMTRANS_need_retry:
> +            /*
> +             * hvm_translate_get_page() does not return HVMTRANS_need_retry.
> +             * It can be dropped if future work requires this.
> +             */
> +            ASSERT_UNREACHABLE();
> +            /* fall through */
>          case HVMTRANS_gfn_shared:
> +        case HVMTRANS_gfn_paged_out:
>              err = ERR_PTR(~X86EMUL_RETRY);
>              goto out;
> 
> @@ -626,6 +634,14 @@ static void *hvmemul_map_linear_addr(
> 
>              ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
>          }
> +
> +        if ( unlikely(curr->arch.vm_event) &&
> +             curr->arch.vm_event->send_event &&
> +             hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
> +        {
> +            err = ERR_PTR(~X86EMUL_RETRY);
> +            goto out;
> +        }
>      }
> 
>      /* Entire access within a single frame? */
> @@ -1141,6 +1157,7 @@ static int linear_read(unsigned long addr, unsigned int bytes, void *p_data,
> 
>      case HVMTRANS_gfn_paged_out:
>      case HVMTRANS_gfn_shared:
> +    case HVMTRANS_need_retry:
>          return X86EMUL_RETRY;
>      }
> 
> @@ -1192,6 +1209,7 @@ static int linear_write(unsigned long addr, unsigned int bytes, void *p_data,
> 
>      case HVMTRANS_gfn_paged_out:
>      case HVMTRANS_gfn_shared:
> +    case HVMTRANS_need_retry:
>          return X86EMUL_RETRY;
>      }
> 
> @@ -1852,19 +1870,27 @@ static int hvmemul_rep_movs(
> 
>      xfree(buf);
> 
> -    if ( rc == HVMTRANS_gfn_paged_out )
> -        return X86EMUL_RETRY;
> -    if ( rc == HVMTRANS_gfn_shared )
> -        return X86EMUL_RETRY;
> -    if ( rc != HVMTRANS_okay )
> +    switch ( rc )
>      {
> -        gdprintk(XENLOG_WARNING, "Failed memory-to-memory REP MOVS: sgpa=%"
> -                 PRIpaddr" dgpa=%"PRIpaddr" reps=%lu bytes_per_rep=%u\n",
> -                 sgpa, dgpa, *reps, bytes_per_rep);
> -        return X86EMUL_UNHANDLEABLE;
> +    case HVMTRANS_need_retry:
> +        /*
> +         * hvm_copy_to_guest_phys() does not return HVMTRANS_need_retry.
> +         * It can be dropped if future work requires this.
> +         */
> +        ASSERT_UNREACHABLE();
> +        /* fall through */
> +    case HVMTRANS_gfn_paged_out:
> +    case HVMTRANS_gfn_shared:
> +        return X86EMUL_RETRY;
> +    case HVMTRANS_okay:
> +        return X86EMUL_OKAY;
>      }
> 
> -    return X86EMUL_OKAY;
> +    gdprintk(XENLOG_WARNING, "Failed memory-to-memory REP MOVS: sgpa=%"
> +             PRIpaddr" dgpa=%"PRIpaddr" reps=%lu bytes_per_rep=%u\n",
> +             sgpa, dgpa, *reps, bytes_per_rep);
> +
> +    return X86EMUL_UNHANDLEABLE;
>  }
> 
>  static int hvmemul_rep_stos(
> @@ -1966,6 +1992,13 @@ static int hvmemul_rep_stos(
> 
>          switch ( rc )
>          {
> +        case HVMTRANS_need_retry:
> +            /*
> +             * hvm_copy_to_guest_phys() does not return HVMTRANS_need_retry.
> +             * It can be dropped if future work requires this.
> +             */
> +            ASSERT_UNREACHABLE();
> +            /* fall through */
>          case HVMTRANS_gfn_paged_out:
>          case HVMTRANS_gfn_shared:
>              return X86EMUL_RETRY;
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index fdb1e17f59..c82e7b2cd3 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3236,6 +3236,15 @@ static enum hvm_translation_result __hvm_copy(
>              return HVMTRANS_bad_gfn_to_mfn;
>          }
> 
> +        if ( unlikely(v->arch.vm_event) &&
> +             (flags & HVMCOPY_linear) &&
> +             v->arch.vm_event->send_event &&
> +             hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
> +        {
> +            put_page(page);
> +            return HVMTRANS_need_retry;
> +        }
> +
>          p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
> 
>          if ( flags & HVMCOPY_to_guest )
> diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
> index aac22c595d..90202bdcec 100644
> --- a/xen/arch/x86/hvm/intercept.c
> +++ b/xen/arch/x86/hvm/intercept.c
> @@ -145,6 +145,7 @@ int hvm_process_io_intercept(const struct hvm_io_handler *handler,
>                  case HVMTRANS_bad_linear_to_gfn:
>                  case HVMTRANS_gfn_paged_out:
>                  case HVMTRANS_gfn_shared:
> +                case HVMTRANS_need_retry:
>                      ASSERT_UNREACHABLE();
>                      /* fall through */
>                  default:
> @@ -174,6 +175,7 @@ int hvm_process_io_intercept(const struct hvm_io_handler *handler,
>                  case HVMTRANS_bad_linear_to_gfn:
>                  case HVMTRANS_gfn_paged_out:
>                  case HVMTRANS_gfn_shared:
> +                case HVMTRANS_need_retry:
>                      ASSERT_UNREACHABLE();
>                      /* fall through */
>                  default:
> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> index 2a41ccc930..7fb1e2c04e 100644
> --- a/xen/arch/x86/hvm/monitor.c
> +++ b/xen/arch/x86/hvm/monitor.c
> @@ -23,8 +23,10 @@
>   */
> 
>  #include <xen/vm_event.h>
> +#include <xen/mem_access.h>
>  #include <xen/monitor.h>
>  #include <asm/hvm/monitor.h>
> +#include <asm/altp2m.h>
>  #include <asm/monitor.h>
>  #include <asm/paging.h>
>  #include <asm/vm_event.h>
> @@ -215,6 +217,82 @@ void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
>      monitor_traps(current, 1, &req);
>  }
> 
> +/*
> + * Send memory access vm_events based on pfec. Returns true if the event was
> + * sent, and false on a p2m_get_mem_access() error, when there is no
> + * violation, or when sending the event failed. Assumes the caller will
> + * enable/disable arch.vm_event->send_event.
> + */
> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
> +                           uint16_t kind)
> +{
> +    xenmem_access_t access;
> +    struct vcpu *curr = current;
> +    vm_event_request_t req = {};
> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
> +    int rc;
> +
> +    ASSERT(curr->arch.vm_event->send_event);
> +
> +    /*
> +     * p2m_get_mem_access() can fail for an invalid MFN and return -ESRCH,
> +     * in which case access must be restricted.
> +     */
> +    rc = p2m_get_mem_access(curr->domain, gfn, &access,
> +                            altp2m_vcpu_idx(curr));
> +
> +    if ( rc == -ESRCH )
> +        access = XENMEM_access_n;
> +    else if ( rc )
> +        return false;
> +
> +    switch ( access )
> +    {
> +    case XENMEM_access_x:
> +    case XENMEM_access_rx:
> +        if ( pfec & PFEC_write_access )
> +            req.u.mem_access.flags = MEM_ACCESS_R | MEM_ACCESS_W;
> +        break;
> +
> +    case XENMEM_access_w:
> +    case XENMEM_access_rw:
> +        if ( pfec & PFEC_insn_fetch )
> +            req.u.mem_access.flags = MEM_ACCESS_X;
> +        break;
> +
> +    case XENMEM_access_r:
> +    case XENMEM_access_n:
> +        if ( pfec & PFEC_write_access )
> +            req.u.mem_access.flags |= MEM_ACCESS_R | MEM_ACCESS_W;
> +        if ( pfec & PFEC_insn_fetch )
> +            req.u.mem_access.flags |= MEM_ACCESS_X;
> +        break;
> +
> +    case XENMEM_access_wx:
> +    case XENMEM_access_rwx:
> +    case XENMEM_access_rx2rw:
> +    case XENMEM_access_n2rwx:
> +    case XENMEM_access_default:
> +        break;
> +    }
> +
> +    if ( !req.u.mem_access.flags )
> +        return false; /* no violation */
> +
> +    if ( kind == npfec_kind_with_gla )
> +        req.u.mem_access.flags |= MEM_ACCESS_FAULT_WITH_GLA |
> +                                  MEM_ACCESS_GLA_VALID;
> +    else if ( kind == npfec_kind_in_gpt )
> +        req.u.mem_access.flags |= MEM_ACCESS_FAULT_IN_GPT |
> +                                  MEM_ACCESS_GLA_VALID;
> +
> +
> +    req.reason = VM_EVENT_REASON_MEM_ACCESS;
> +    req.u.mem_access.gfn = gfn_x(gfn);
> +    req.u.mem_access.gla = gla;
> +    req.u.mem_access.offset = gpa & ~PAGE_MASK;
> +
> +    return monitor_traps(curr, true, &req) >= 0;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
> index 0144f92b98..320b9fe621 100644
> --- a/xen/arch/x86/mm/mem_access.c
> +++ b/xen/arch/x86/mm/mem_access.c
> @@ -210,11 +210,18 @@ bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>              return true;
>          }
>      }
> +
> +    /*
> +     * Try to avoid sending a mem event. Suppress events caused by page
> +     * walks by emulating, while still checking for mem_access violations.
> +     */
>      if ( vm_event_check_ring(d->vm_event_monitor) &&
>           d->arch.monitor.inguest_pagefault_disabled &&
> -         npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
> +         npfec.kind == npfec_kind_in_gpt )
>      {
> +        v->arch.vm_event->send_event = true;
>          hvm_emulate_one_vm_event(EMUL_KIND_NORMAL, TRAP_invalid_op, X86_EVENT_NO_EC);
> +        v->arch.vm_event->send_event = false;
> 
>          return true;
>      }
> diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
> index 0aa560b7f5..48dfad4557 100644
> --- a/xen/arch/x86/mm/shadow/hvm.c
> +++ b/xen/arch/x86/mm/shadow/hvm.c
> @@ -139,6 +139,7 @@ hvm_read(enum x86_segment seg,
>          return X86EMUL_UNHANDLEABLE;
>      case HVMTRANS_gfn_paged_out:
>      case HVMTRANS_gfn_shared:
> +    case HVMTRANS_need_retry:
>          return X86EMUL_RETRY;
>      }
> 
> diff --git a/xen/include/asm-x86/hvm/monitor.h b/xen/include/asm-x86/hvm/monitor.h
> index f1af4f812a..325b44674d 100644
> --- a/xen/include/asm-x86/hvm/monitor.h
> +++ b/xen/include/asm-x86/hvm/monitor.h
> @@ -49,6 +49,9 @@ void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
>                             unsigned int err, uint64_t cr2);
>  bool hvm_monitor_emul_unimplemented(void);
> 
> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
> +                           uint16_t kind);
> +
>  #endif /* __ASM_X86_HVM_MONITOR_H__ */
> 
>  /*
> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
> index e989aa7349..1500e6c94b 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -61,6 +61,7 @@ enum hvm_translation_result {
>      HVMTRANS_unhandleable,
>      HVMTRANS_gfn_paged_out,
>      HVMTRANS_gfn_shared,
> +    HVMTRANS_need_retry,
>  };
> 
>  /*
> diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
> index 23e655710b..66db9e1e25 100644
> --- a/xen/include/asm-x86/vm_event.h
> +++ b/xen/include/asm-x86/vm_event.h
> @@ -36,6 +36,8 @@ struct arch_vm_event {
>      bool set_gprs;
>      /* A sync vm_event has been sent and we're not done handling it. */
>      bool sync_event;
> +    /* Send mem access events from emulator */
> +    bool send_event;
>  };
> 
>  int vm_event_init_domain(struct domain *d);
> --
> 2.17.1

