Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 11/25] tools/libxc: support to resume uncooperative HVM guests

On 07/15/2015 08:26 PM, Ian Campbell wrote:
On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
From: Wen Congyang <wency@xxxxxxxxxxxxxx>

1. Suspend:
a. PV and PVHVM: we suspend the guest the same way, by sending it a suspend
    request. If the guest doesn't support evtchn, the xenstore variant is
    used, suspending the guest via the XenBus control node.
b. Pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
    the guest.

2. Resume:
a. Fast path
    In this case, we don't change the guest's state.
    PV: modify the return code to 1, and then call the domctl
        XEN_DOMCTL_resumedomain.
    PVHVM: same as PV.
    HVM: modify_returncode does nothing; then call the domctl
        XEN_DOMCTL_resumedomain.
b. Slow path
    Used when the guest's state has been changed.
    PV: update the start info and reset all secondary CPU states, then call
    the domctl XEN_DOMCTL_resumedomain.
    PVHVM and HVM cannot be resumed.
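The suspend and resume cases above amount to a small decision table. The following is a hypothetical C model of that table, describing the behaviour before this patch; the enum, struct and function names are invented for illustration and are not libxc or libxl APIs, and nothing here issues a real hypercall.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of libxc or libxl. */
enum guest_type { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

enum suspend_method {
    SUSPEND_EVTCHN,    /* suspend request sent over the event channel */
    SUSPEND_XENSTORE,  /* request written to the XenBus control node */
    SUSPEND_SHUTDOWN   /* xc_domain_shutdown(..., SHUTDOWN_suspend) */
};

struct resume_plan {
    bool supported;          /* false: the resume call fails */
    bool set_return_code_1;  /* guest's suspend hypercall returns 1 */
    bool rebuild_pv_state;   /* update start info, reset secondary vCPUs */
    bool call_resumedomain;  /* issue XEN_DOMCTL_resumedomain */
};

enum suspend_method choose_suspend(enum guest_type t, bool guest_has_evtchn)
{
    if (t == GUEST_HVM)
        return SUSPEND_SHUTDOWN;
    /* PV and PVHVM use the same mechanism. */
    return guest_has_evtchn ? SUSPEND_EVTCHN : SUSPEND_XENSTORE;
}

/* Resume behaviour before this patch, per the description above. */
struct resume_plan plan_resume(enum guest_type t, bool fast)
{
    struct resume_plan p = { false, false, false, false };

    assert(t == GUEST_PV || t == GUEST_PVHVM || t == GUEST_HVM);
    if (fast) {
        /* Guest state unchanged; for pure HVM modify_returncode does nothing. */
        p.supported = true;
        p.set_return_code_1 = (t != GUEST_HVM);
        p.call_resumedomain = true;
    } else if (t == GUEST_PV) {
        /* Guest state changed: the return code stays 0. */
        p.supported = true;
        p.rebuild_pv_state = true;
        p.call_resumedomain = true;
    }
    /* Slow path for PVHVM and pure HVM: unsupported, p.supported stays false. */
    return p;
}
```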

For PVHVM, in my test, only calling the domctl XEN_DOMCTL_resumedomain
works. I am not sure whether we should also update the start info and reset
all secondary CPU states.

For a pure HVM guest, in my test, only calling the domctl
XEN_DOMCTL_resumedomain works.

So we can call libxl__domain_resume(..., 1) if we don't change the guest
state; otherwise, we call libxl__domain_resume(..., 0).

Under COLO, we update the guest's state (memory, CPU registers, device
status, ...). In this case, we cannot use the fast path to resume the guest;
instead we keep the return code 0 and use the slow path. Since resuming HVM
guests via the slow path is not currently supported, this patch makes that
resume call not fail.
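To make the change concrete, here is a hypothetical model of the resume dispatch once this patch is applied. It assumes, as xc_domain_resume does, that fast selects the cooperative resume and slow selects the "any" resume; the types and function names below are simplified stand-ins invented for illustration, and the stubs only record which steps would be taken rather than calling into Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not libxc types or calls. */
enum guest_kind { KIND_PV, KIND_PVHVM, KIND_HVM };

enum resume_step {
    STEP_MODIFY_RETURNCODE = 1 << 0, /* guest's suspend hypercall returns 1 */
    STEP_REBUILD_PV_STATE  = 1 << 1, /* update start info, reset secondary vCPUs */
    STEP_RESUMEDOMAIN      = 1 << 2  /* XEN_DOMCTL_resumedomain would be issued */
};

struct fake_guest {
    enum guest_kind kind;
    int steps;                       /* bitmask of resume_step values taken */
};

static int resume_cooperative(struct fake_guest *g)
{
    if (g->kind != KIND_HVM)         /* pure HVM: nothing to modify */
        g->steps |= STEP_MODIFY_RETURNCODE;
    g->steps |= STEP_RESUMEDOMAIN;
    return 0;
}

/* Mirrors the new xc_domain_resume_hvm: the return code is left at 0. */
static int resume_hvm(struct fake_guest *g)
{
    g->steps |= STEP_RESUMEDOMAIN;
    return 0;
}

static int resume_any(struct fake_guest *g)
{
    if (g->kind != KIND_PV)          /* before the patch this was an error */
        return resume_hvm(g);
    g->steps |= STEP_REBUILD_PV_STATE | STEP_RESUMEDOMAIN;
    return 0;
}

int domain_resume(struct fake_guest *g, bool fast)
{
    assert(g != NULL);
    g->steps = 0;
    return fast ? resume_cooperative(g) : resume_any(g);
}
```

With this model, a slow-path resume of a PVHVM or pure HVM guest ends with only STEP_RESUMEDOMAIN set, which is exactly the behaviour the patch adds.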

I'm afraid that the addition of this paragraph has not really addressed
my comment on v3:

         I'm afraid I think the commit message for this patch (and the 
         doc comments) need revisiting almost from scratch, to clearly explain
         what this patch is doing and why and what the constraints on the new
         functionality will be.

         At the moment it mostly talks in a confusing way about the old code
         and adds very specific assumptions to the new function which are not
         made clear.

It also appears that this has not been addressed:

         Hrm, so it sounds here like the correctness of this new functionality
         requires the caller to have not messed with the domain's state? What
         sort of changes to the guest state are we talking about here?

This is used on the secondary. At a checkpoint, we:
1. suspend the guest
2. sync the guest state with the primary  <== here the guest state is changed
3. resume the guest
Because step 2 changes the guest state, we cannot use the fast path to
resume the guest in step 3. Resuming HVM via the slow path is not currently
supported; this patch adds that support.
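The checkpoint sequence on the secondary could be sketched as below. The hook structure, the recording hooks and the stop-at-first-error policy are assumptions made for illustration, not the COLO or libxl implementation; the point is only that once step 2 has synced state from the primary, resume is always requested with fast == false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks for one checkpoint; not a libxl or COLO API. */
struct checkpoint_ops {
    int (*suspend)(void *ctx);               /* step 1 */
    int (*sync_from_primary)(void *ctx);     /* step 2: memory, vCPU regs, devices */
    int (*resume)(void *ctx, bool fast);     /* step 3 */
};

/* Returns 0 on success, or the first non-zero error from a step. */
int secondary_checkpoint(const struct checkpoint_ops *ops, void *ctx)
{
    int rc;

    assert(ops != NULL);
    if ((rc = ops->suspend(ctx)) != 0)
        return rc;
    if ((rc = ops->sync_from_primary(ctx)) != 0)
        return rc;
    /* Step 2 changed the guest state, so the fast (suspend-cancel)
     * path must not be used: always request the slow path. */
    return ops->resume(ctx, false);
}

/* Example hooks that only record the order of calls. */
struct trace { char log[8]; size_t n; bool resumed_fast; };

static int rec_suspend(void *c) { struct trace *t = c; t->log[t->n++] = 'S'; return 0; }
static int rec_sync(void *c)    { struct trace *t = c; t->log[t->n++] = 'Y'; return 0; }
static int rec_resume(void *c, bool fast)
{
    struct trace *t = c;
    t->log[t->n++] = 'R';
    t->resumed_fast = fast;
    return 0;
}

const struct checkpoint_ops recording_ops = { rec_suspend, rec_sync, rec_resume };
```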

Since the XEN_DOMCTL_resumedomain hypercall is a NOP for HVM, it occurs to
me that we could do this differently: modify libxl__domain_resume so that,
if the domain is HVM, it skips the xc_domain_resume call. What do you think?
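A hypothetical model of that alternative follows; it is not the real libxl code. It assumes, as one possible reading of the proposal, that only the slow path (suspend_cancel == 0) is skipped, because a fast-path resume of a PVHVM guest still relies on modify_returncode inside libxc. Note that libxl's domain type reports PVHVM and pure HVM guests alike as HVM, so the check cannot tell them apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; not the real libxl code. */
enum dom_type { DOM_PV, DOM_HVM };   /* PVHVM is reported as HVM too */

struct resume_calls {
    int libxc_resume_calls;  /* times xc_domain_resume() would be called */
    int other_steps;         /* rest of libxl__domain_resume, unchanged */
};

int model_libxl_domain_resume(enum dom_type type, int suspend_cancel,
                              struct resume_calls *calls)
{
    bool skip_libxc;

    assert(calls != NULL);
    /* Assumption: only the slow path is skipped, because a fast-path
     * (suspend_cancel) resume of a PVHVM guest still relies on
     * modify_returncode inside libxc. */
    skip_libxc = (type == DOM_HVM && !suspend_cancel);

    if (!skip_libxc)
        calls->libxc_resume_calls++;
    calls->other_steps++;
    return 0;
}
```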

         Isn't that a new requirement for this call? If so then it should be
         documented somewhere, specifically what sorts of changes are and are
         not allowed and the types of guests which are affected.

The two usages of "in my test" in the commit message also do not inspire
confidence that this change is understood to be correct, vs. happening
to be something which works for you.


Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
 tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e67bebd..bd82334 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    /*
+     * For PVHVM, the hypercall return code stays 0: this is not a
+     * fast-path resume, so unlike xc_domain_resume_cooperative we do
+     * not call modify_returncode (the guest resumes in a new domain
+     * context).
+     *
+     * For pure HVM, the hypercall is a NOP.
+     */
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
@@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
 #if defined(__i386__) || defined(__x86_64__)
     if ( info.hvm )
-    {
-        ERROR("Cannot resume uncooperative HVM guests");
-        return rc;
-    }
+        return xc_domain_resume_hvm(xch, domid);
 
     if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )



