Re: [Xen-devel] [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> > On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > > Before this patch:
> > > 1. Suspend
> > >    a. PVHVM and PV: we use the same way to suspend the guest (send
> > >       the suspend request to the guest). If the guest doesn't support
> > >       the event channel, the xenstore variant is used, suspending the
> > >       guest via the XenBus control node.
> > >    b. Pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to
> > >       suspend the guest.
> > >
> > > 2. Resume:
> > >    a. Fast path (fast=1)
> > >       Does not change the guest state. We call
> > >       libxl__domain_resume(..., 1), which calls
> > >       xc_domain_resume(..., 1 /* fast=1 */) to resume the guest.
> > >       PV: modify the return code to 1, then issue the
> > >       XEN_DOMCTL_resumedomain domctl.
> > >       PVHVM: same as PV.
> > >       Pure HVM: do nothing in modify_returncode, then issue the
> > >       XEN_DOMCTL_resumedomain domctl.
> > >    b. Slow path
> > >       Used when the guest's state has been changed. Calls
> > >       libxl__domain_resume(..., 0) to resume the guest.
> > >       PV: update the start info and reset all secondary vCPU states,
> > >       then issue the XEN_DOMCTL_resumedomain domctl.
> > >       PVHVM: cannot be resumed. You will get the following error
> > >       message: "Cannot resume uncooperative HVM guests".
> > >       Pure HVM: same as PVHVM.
> > >
> > > After this patch:
> > > 1. Suspend
> > >    unchanged
> > >
> > > 2. Resume
> > >    a. Fast path:
> > >       unchanged
> > >    b. Slow path
> > >       PV: unchanged.
> > >       PVHVM: issue XEN_DOMCTL_resumedomain to resume the guest.
> > >       Because we don't modify the return code, the PV drivers will
> > >       disconnect and reconnect. The guest ends up doing the
> > >       XENMAPSPACE_shared_info XENMEM_add_to_physmap hypercall and
> > >       resetting all of its vCPU states to point to the shared_info
> > >       (well, except the ones past 32).
> > >       That is, the Linux kernel does that regardless of whether
> > >       SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> > >       Pure HVM: issue XEN_DOMCTL_resumedomain to resume the guest.
> > >
> > > Under COLO, we will update the guest's state (modify memory, CPU
> > > registers, device status, ...). In this case we cannot use the fast
> > > path to resume it. Keep the return code 0 and use the slow path to
> > > resume the guest. Since resuming HVM via the slow path is not
> > > currently supported, this patch makes the resume call not fail.
> > >
> > > Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
> > > Signed-off-by: Yang Hongyang <hongyang.yang@xxxxxxxxxxxx>
> > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> >
> > I proposed an alternative commit log in a previous reply:
> >
> > ===
> > Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> >
> > Previously it was not possible to resume a PVHVM or pure HVM guest in
> > the slow path because libxc didn't support that.
> >
> > Using XEN_DOMCTL_resumedomain without modifying the guest return code
> > to resume a guest is considered to be always safe. Introduce a
> > function to do that for (PV)HVM guests in slow path resume.
> >
> > This patch fixes a bug that denies (PV)HVM slow path resume. This
> > will enable COLO to work properly: COLO requires the HVM guest to
> > start in the new context that has been set up by COLO, hence slow
> > path resume is required.
> > ===
> >
> > Note that I fixed one place in this version, from "guest state" to
> > "guest return code" in the second paragraph. And that sentence rests
> > on a big assumption that I don't know whether it is true or not --
> > reverse-engineered from the comment before xc_domain_resume and from
> > what Linux does.
> >
> > But the more I think about it, the less sure I am that I'm writing
> > the right thing. I also can't judge what the right behaviour is on
> > the Linux side.
> >
> > Konrad, can you fact-check the commit message a bit?
> > And maybe you can help answer the following questions?
> >
> > 1. If we use fast=0 on a PVHVM guest, will it work?
>
> Yes.
>
> > 2. If we use fast=0 on an HVM guest, will it work?
>
> Yes.
>
> > What is worse, when I say "work" I actually have no clear definition
> > of it. There doesn't seem to be a defined state that the guest needs
> > to be in.
>
> For PVHVM guests, fast = 0 requires that the guest make a
> SCHEDOP_shutdown(SHUTDOWN_suspend) hypercall. After the hypercall has
> completed (so Xen has suspended the guest and then later resumed it),
> it is the guest's responsibility to set up the Xen infrastructure
> again: as in, retrieve the shared_info page (XENMAPSPACE_shared_info),
> set up XenBus, etc.
>
> For HVM guests, fast = 0 suspends the guest without the guest making
> any hypercalls. It is in effect the hypervisor injecting an S3
> suspend. Afterwards the guest is resumed and continues as usual. No PV
> drivers, hence no need to re-establish the Xen PV infrastructure.

Wait, isn't this function about resuming a guest? I'm confused because
you talk about the hypervisor injecting an S3 suspend. I guess you
wrote the wrong thing?

My guess is below, from the perspective of resuming a guest:

A PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
suspend. So when the toolstack uses fast=0, the guest resumes from the
hypercall with the return code unmodified. The guest then re-sets up
the Xen infrastructure.

An HVM guest would have used S3 suspend to suspend itself. So in the
fast=0 case, the hypervisor injects an S3 resume and the guest just
takes the normal path, like a real machine does.

Does that make sense?

Wei.

> Hope this helps.
>
> > Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel