
Re: [Xen-devel] [PATCH v3] xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall



Hello Jan,

On 07.06.19 17:23, Jan Beulich wrote:
On 24.05.19 at 20:12, <andrii.anisov@xxxxxxxxx> wrote:
From: Andrii Anisov <andrii_anisov@xxxxxxxx>

The existing interface for registering the runstate area, which takes a
virtual address, is prone to issues that became more obvious with KPTI
enablement in guests. The nature of those issues is that the guest can
be interrupted by the hypervisor at any time, and there is no guarantee
that the registered virtual address is translatable with the currently
active guest page tables. Before KPTI such a situation was only possible
when the guest was caught in the middle of PT processing
(e.g. superpage shattering). With KPTI it also happens whenever the
guest runs userspace, so it has a pretty high probability.

Except when there's no need for KPTI in the guest in the first place,
as is the case for x86-64 PV guests. I think this is worth clarifying.

So it was agreed to register the runstate area by the guest's physical
address, so that its mapping is permanent from the hypervisor's point of
view. [1]

The hypercall employs the same vcpu_register_runstate_memory_area
structure for the interface, but requires the registered area not to
cross a page boundary.

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
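
For illustration, a minimal guest-side sketch of how registration by
physical address might look, assuming Linux-style hypercall wrappers
(HYPERVISOR_vcpu_op, virt_to_phys); only the hypercall name and the
reuse of the struct come from this patch:

/* Illustrative sketch only; placement and error handling are assumptions. */
static struct vcpu_runstate_info runstate_buf
    __attribute__((aligned(64)));        /* small enough to stay within one page */

static int register_runstate_phys(unsigned int vcpu)
{
    struct vcpu_register_runstate_memory_area area = { };

    /* Pass a guest physical address instead of a virtual one. */
    area.addr.p = virt_to_phys(&runstate_buf);

    return HYPERVISOR_vcpu_op(VCPUOP_register_runstate_phys_memory_area,
                              vcpu, &area);
}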

Signed-off-by: Andrii Anisov <andrii_anisov@xxxxxxxx>

I would have really hoped that you would outline the intended interaction
between both interfaces,

I'd suppose there is no specific interaction between the interfaces. I can
hardly imagine realistic use-cases where such interaction might be needed.

such that while reviewing one can judge whether
you're actually matching the goal. I think it is actually mandatory to make
explicit in the public header what level of mixing is permitted, what is not
permitted, and what mix of requests is simply undefined.

In particular, while we did work out during prior discussions that some
level of mixing has to be permitted, I'm unconvinced that arbitrary
mixing has to be allowed. For example, switching from one model to
another could be permitted only with just a single active vCPU. This
might allow to do away again with the runstate_in_use field you add.

Well, my point here is to leave it as is, maybe adding more documentation.
If one likes shooting oneself in the foot, we should only care that the
ricochet does not harm the hypervisor or other guests.
If you disagree, please suggest your interaction model and I'll be happy to
implement it.


--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -274,17 +274,15 @@ static void ctxt_switch_to(struct vcpu *n)
      virt_timer_restore(n);
  }
-/* Update per-VCPU guest runstate shared memory area (if registered). */
-static void update_runstate_area(struct vcpu *v)
+static void update_runstate_by_gvaddr(struct vcpu *v)
  {
      void __user *guest_handle = NULL;
-    if ( guest_handle_is_null(runstate_guest(v)) )
-        return;
+    ASSERT(!guest_handle_is_null(runstate_guest_virt(v)));
if ( VM_ASSIST(v->domain, runstate_update_flag) )
      {
-        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
+        guest_handle = &v->runstate_guest.virt.p->state_entry_time + 1;
          guest_handle--;
          v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
          __raw_copy_to_guest(guest_handle,
@@ -292,7 +290,7 @@ static void update_runstate_area(struct vcpu *v)
          smp_wmb();
      }
-    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
+    __copy_to_guest(runstate_guest_virt(v), &v->runstate, 1);

In a prior version you did the mechanical part of adjusting the VA-based
code in a prereq patch, aiding review. Is there a particular reason you
folded everything into one patch now?

I silently followed a suggestion from George [1]. Any objections?


@@ -303,6 +301,53 @@ static void update_runstate_area(struct vcpu *v)
      }
  }
+static void update_runstate_by_gpaddr(struct vcpu *v)
+{
+    struct vcpu_runstate_info *runstate =
+            (struct vcpu_runstate_info *)v->runstate_guest.phys;

What's the cast for here?

@@ -1614,6 +1613,92 @@ bool update_runstate_area(struct vcpu *v)
      return rc;
  }
+static bool update_runstate_by_gpaddr_native(struct vcpu *v)
+{
+    struct vcpu_runstate_info *runstate =
+            (struct vcpu_runstate_info *)v->runstate_guest.phys;
+
+    ASSERT(runstate != NULL);
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    {
+        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
+        smp_wmb();
+        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+    }
+
+    memcpy(v->runstate_guest.phys, &v->runstate, sizeof(v->runstate));
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    {
+        runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        smp_wmb();
+        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+    }
+
+    return true;
+}

I can't help thinking that this matches the Arm code. Can common things
please be put in common code? I may be asking too much, but if the
pre-existing implementations are similar enough (I didn't check) perhaps
they could be folded in a first step, too?

The problem was thought to be the interface: on x86 update_runstate_area()
returns bool, but on ARM it is void.
But for the common function update_runstate_by_gpaddr_~native~() it would not
be a problem, because we would return a proper bool from the refactored
update_runstate_area().
Thank you for the point.
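
For illustration, the refactored common dispatcher could look roughly like
this (a sketch; the helper names follow this patch, while placing it in
common code and converting both helpers to return bool are the assumptions):

/* Rough sketch of a common update_runstate_area() dispatching on the
 * registration type. */
bool update_runstate_area(struct vcpu *v)
{
    bool rc = true;

    switch ( v->runstate_guest_type )
    {
    case RUNSTATE_VADDR:
        rc = update_runstate_by_gvaddr(v);   /* may fail, e.g. under KPTI */
        break;

    case RUNSTATE_PADDR:
        rc = update_runstate_by_gpaddr(v);   /* globally mapped, cannot fault */
        break;

    case RUNSTATE_NONE:
        break;
    }

    return rc;
}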


+static bool update_runstate_by_gpaddr_compat(struct vcpu *v)
+{
+    struct compat_vcpu_runstate_info *runstate =
+            (struct compat_vcpu_runstate_info *)v->runstate_guest.phys;
+
+    ASSERT(runstate != NULL);
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    {
+        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
+        smp_wmb();
+        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+    }
+
+    {
+        struct compat_vcpu_runstate_info info;
+        XLAT_vcpu_runstate_info(&info, &v->runstate);
+        memcpy(v->runstate_guest.phys, &info, sizeof(info));
+    }
+    else
+        memcpy(v->runstate_guest.phys, &v->runstate, sizeof(v->runstate));

This "else" does not seem to be paired with an if(). Does this build
at all?

This particular one - no, it does not!
And it is really strange; I remember I checked that the patch compiles for
x86. I'm looking through the git reflog to figure out at which amend it
became broken.
But I also do not completely understand the meaning of "_compat" and whether
it should be supported here at all.
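
For reference, a sketch of what the compat variant might look like once the
stray "else" is dropped, i.e. always copying the translated layout (whether
the compat path is needed here at all is the open question above):

static bool update_runstate_by_gpaddr_compat(struct vcpu *v)
{
    struct compat_vcpu_runstate_info *runstate = v->runstate_guest.phys;

    ASSERT(runstate != NULL);

    if ( VM_ASSIST(v->domain, runstate_update_flag) )
    {
        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
        smp_wmb();
        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
    }

    {
        struct compat_vcpu_runstate_info info;

        /* Translate the native runstate into the compat layout. */
        XLAT_vcpu_runstate_info(&info, &v->runstate);
        memcpy(v->runstate_guest.phys, &info, sizeof(info));
    }

    if ( VM_ASSIST(v->domain, runstate_update_flag) )
    {
        runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
        smp_wmb();
        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
    }

    return true;
}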


--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -12,6 +12,8 @@
  CHECK_vcpu_get_physid;
  #undef xen_vcpu_get_physid
+extern void discard_runstate_area(struct vcpu *v);

No, this is not allowed. The declaration must be visible to both consumer
and producer, so that when either side is changed things won't build until
the other side gets changed too.

@@ -35,8 +37,16 @@ arch_compat_vcpu_op(
               !compat_handle_okay(area.addr.h, 1) )
              break;
+ while( xchg(&v->runstate_in_use, 1) == 0);

At the very least such loops want a cpu_relax() in their bodies.
But this being on a hypercall path - are there theoretical guarantees
that a guest can't abuse this to lock up a CPU?

Furthermore I don't understand how this is supposed to work in
the first place. The first xchg() will store 1, no matter what. Thus
on the second iteration you'll find the wanted value unless the
other side stored 0. Yet the other side is a simple xchg() too.
Hence its first attempt would fail, but its second attempt (which
we didn't exit the critical region here yet) would succeed.

Also - style.

I will leave this part for the locking issue discussion.
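
For reference, the conventional test-and-set pattern would look roughly like
this (a sketch; whether a plain spinlock is preferable is part of that
discussion):

/* Acquire: spin until the previous value was 0, i.e. nobody held the flag. */
while ( xchg(&v->runstate_in_use, 1) != 0 )
    cpu_relax();

/* ... access/replace the runstate mapping ... */

/* Release. */
smp_wmb();
v->runstate_in_use = 0;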


--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,6 +698,74 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
      return 0;
  }
+static void unmap_runstate_area(struct vcpu *v)
+{
+    mfn_t mfn;
+
+    if ( ! v->runstate_guest.phys )

Stray blank after ! .

OK.


+        return;
+
+    mfn = domain_page_map_to_mfn(v->runstate_guest.phys);
+
+    unmap_domain_page_global((void *)
+                             ((unsigned long)v->runstate_guest.phys &
+                              PAGE_MASK));
+
+    v->runstate_guest.phys = NULL;

I think you would better store NULL before unmapping.

Do you mean using a local variable to pass the address to
unmap_domain_page_global(), and setting v->runstate_guest.phys to NULL prior
to the unmap?
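
If so, a sketch of that ordering (the mfn/put_page handling from the full
patch is omitted here):

static void unmap_runstate_area(struct vcpu *v)
{
    void *va = v->runstate_guest.phys;

    if ( !va )
        return;

    /* Clear the pointer first, so nothing can race with the unmap below. */
    v->runstate_guest.phys = NULL;
    smp_wmb();

    unmap_domain_page_global((void *)((unsigned long)va & PAGE_MASK));
}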


@@ -734,7 +802,10 @@ int domain_kill(struct domain *d)
          if ( cpupool_move_domain(d, cpupool0) )
              return -ERESTART;
          for_each_vcpu ( d, v )
+        {
+            discard_runstate_area_locked(v);
              unmap_vcpu_info(v);
+        }

What is the "locked" aspect here about? The guest itself is dead (i.e.
fully paused) at this point, isn't it? And it won't ever be unpaused
again.

You are right. All vcpus are stopped here, so we can discard the runstate
area without locking.


--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -163,17 +163,31 @@ struct vcpu
      void            *sched_priv;    /* scheduler-specific data */
struct vcpu_runstate_info runstate;
+
+    enum {
+        RUNSTATE_NONE = 0,
+        RUNSTATE_PADDR = 1,
+        RUNSTATE_VADDR = 2,
+    } runstate_guest_type;
+
+    unsigned long runstate_in_use;

Why "unsigned long"? Isn't a bool all you need?

A bool should be enough. But it seems we will have a lock here.

Also these would now all want to be grouped in a sub-structure named
"runstate", rather than having "runstate_" prefixes.

Member `runstate` already has the type `struct vcpu_runstate_info`, which is an
interface type.
`runstate_guest` is a union. I'd rather not move the `runstate_guest` union into
another substructure, because we would end up with long lines like
`v->struct.runstate_guest.virt.p->state_entry_time`.
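
For reference, the relevant vcpu fields roughly as they appear in this patch
(abridged; the x86 compat handle variant is omitted):

struct vcpu_runstate_info runstate;          /* interface type, name is fixed */

enum {
    RUNSTATE_NONE  = 0,
    RUNSTATE_PADDR = 1,
    RUNSTATE_VADDR = 2,
} runstate_guest_type;

unsigned long runstate_in_use;

union
{
    XEN_GUEST_HANDLE(vcpu_runstate_info_t) virt;  /* VADDR registration */
    void *phys;                                   /* PADDR: mapped in Xen */
} runstate_guest;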


+        void*   phys;

Bad ordering between * and the blanks. There ought to be a blank
ahead of the *, and I don't see why you need any after it.

OK.


[1] https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg00599.html

--
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

