
Re: [Xen-devel] [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests



On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > Before this patch:
> > 1. suspend
> > a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
> >    request to the guest). If the guest doesn't support evtchn, the xenstore
> >    variant will be used, suspending the guest via XenBus control node.
> > b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
> >    the guest
> > 
> > 2. Resume:
> > a. fast path(fast=1)
> >    Do not change the guest state. We call libxl__domain_resume(.., 1) which
> >    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
> >    PV:       modify the return code to 1, and then call the domctl:
> >              XEN_DOMCTL_resumedomain
> >    PVHVM:    same as PV
> >    pure HVM: do nothing in modify_returncode, and then call the domctl:
> >              XEN_DOMCTL_resumedomain
> > b. slow
> >    Used when the guest's state has been changed. Will call
> >    libxl__domain_resume(..., 0) to resume the guest.
> >    PV:       update start info, and reset all secondary CPU states. Then
> >              call the domctl: XEN_DOMCTL_resumedomain
> >    PVHVM:    cannot be resumed. You will get the following error message:
> >                  "Cannot resume uncooperative HVM guests"
> >    pure HVM: same as PVHVM
> > 
> > After this patch:
> > 1. suspend
> >    unchanged
> > 
> > 2. Resume
> > a. fast path:
> >    unchanged
> > b. slow
> >    PV:       unchanged
> >    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
> >              don't modify the return code, the PV driver will disconnect
> >              and reconnect.
> >              The guest ends up doing the XENMAPSPACE_shared_info
> >              XENMEM_add_to_physmap hypercall and resetting all of its CPU
> >              states to point to the shared_info (well, except the ones
> >              past 32).
> >              That is, the Linux kernel does that regardless of whether the
> >              SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> >    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> > 
> > Under COLO, we will update the guest's state (modify memory, CPU registers,
> > device status...). In this case, we cannot use the fast path to resume it.
> > Keep the return code 0, and use the slow path to resume the guest. Resuming
> > an HVM guest via the slow path is not currently supported, so this patch
> > makes that resume call stop failing.
> > 
> > Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
> > Signed-off-by: Yang Hongyang <hongyang.yang@xxxxxxxxxxxx>
> > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> 
> I proposed an alternative commit log in a previous reply:
> 
> ===
> Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> 
> Previously it was not possible to resume PVHVM or pure HVM guest in slow
> path because libxc didn't support that.
> 
> Using XEN_DOMCTL_resumedomain without modifying guest return code to resume a
> guest is considered to be always safe.  Introduce a function to do that for
> (PV)HVM guests in slow path resume.
> 
> This patch fixes a bug that denies (PV)HVM slow path resume.  This will
> enable COLO to work properly:  COLO requires HVM guest to start in the
> new context that has been set up by COLO, hence slow path resume is
> required.
> ===
> 
> Note that I fixed one place in this version, from "guest state" to "guest
> return code", in the second paragraph. And that sentence is a big
> assumption that I don't know to be true or not; it is
> reverse-engineered from the comment before xc_domain_resume and from what
> Linux does.
> 
> But the more I think the more I'm not sure if I'm writing the right
> thing. I also can't judge what is the right behaviour on the Linux side.
> 
> Konrad, can you fact-check the commit message a bit? And maybe you can
> help answer the following questions?
> 
> 1. If we use fast=0 on PVHVM guest, will it work?

Yes.
> 2. If we use fast=0 on HVM guest, will it work?

Yes.

> 
> What is worse, when I say "work" I actually have no clear definition of
> it. There doesn't seem to be a defined state that the guest needs to be in.

For PVHVM guests, fast = 0 requires that the guest makes a hypercall
to SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
completed (so Xen has suspended the guest and later resumed it), it
is the guest's responsibility to set up the Xen infrastructure again. As in
retrieve the shared_info (XENMAPSPACE_shared_info), set up XenBus, etc.

For HVM guests, fast = 0 suspends the guest without the guest making
any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
Afterwards the guest is resumed and continues as usual. There are no PV
drivers, hence no need to re-establish Xen PV infrastructure.
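
To make the matrix above easier to follow, here is a rough sketch in C of
the toolstack-side steps per guest type once the patch is applied. It is
only a model of what this thread describes and not the libxc code: the
names guest_kind, resume_plan and plan_resume are made up for
illustration, and the real logic lives behind xc_domain_resume.

```c
#include <stdbool.h>

/* Hypothetical model of the resume matrix from this thread; not libxc code. */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_PURE_HVM };

struct resume_plan {
    bool set_return_code;    /* make the guest's suspend hypercall return 1 */
    bool rebuild_pv_state;   /* update start info, reset secondary CPU states */
    bool issue_resumedomain; /* call XEN_DOMCTL_resumedomain */
};

/* Steps the toolstack takes for a given guest type and path (after the patch). */
struct resume_plan plan_resume(enum guest_kind kind, bool fast)
{
    /* Every combination ends with XEN_DOMCTL_resumedomain. */
    struct resume_plan plan = { false, false, true };

    if (fast) {
        /* Fast path: PV and PVHVM get the return code changed to 1;
         * for pure HVM, modify_returncode does nothing. */
        plan.set_return_code = (kind != GUEST_PURE_HVM);
    } else if (kind == GUEST_PV) {
        /* Slow path, PV only: toolstack rebuilds the PV state. */
        plan.rebuild_pv_state = true;
    }
    /* Slow path, PVHVM and pure HVM: only the domctl is issued, and the
     * return code stays 0. */
    return plan;
}
```

For example, plan_resume(GUEST_PVHVM, false) yields only the domctl, which
matches the case where the guest itself re-establishes shared_info and
XenBus after the suspend hypercall returns.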

Hope this helps.
> 
> Wei.
