Re: [Xen-devel] [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> > On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > > Before this patch:
> > > 1. Suspend
> > >    a. PVHVM and PV: we use the same way to suspend the guest (send
> > >       the suspend request to the guest). If the guest doesn't support
> > >       the event channel, the xenstore variant is used, suspending the
> > >       guest via the XenBus control node.
> > >    b. Pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to
> > >       suspend the guest.
> > >
> > > 2. Resume:
> > >    a. Fast path (fast=1)
> > >       Does not change the guest state. We call
> > >       libxl__domain_resume(..., 1), which calls
> > >       xc_domain_resume(..., 1 /* fast=1 */) to resume the guest.
> > >       PV: modify the return code to 1, then issue the
> > >       XEN_DOMCTL_resumedomain domctl.
> > >       PVHVM: same as PV.
> > >       Pure HVM: do nothing in modify_returncode, then issue the
> > >       XEN_DOMCTL_resumedomain domctl.
> > >    b. Slow path
> > >       Used when the guest's state has been changed. Calls
> > >       libxl__domain_resume(..., 0) to resume the guest.
> > >       PV: update the start info and reset all secondary vCPU states,
> > >       then issue the XEN_DOMCTL_resumedomain domctl.
> > >       PVHVM: cannot be resumed. You will get the following error
> > >       message: "Cannot resume uncooperative HVM guests".
> > >       Pure HVM: same as PVHVM.
> > >
> > > After this patch:
> > > 1. Suspend
> > >    unchanged
> > >
> > > 2. Resume
> > >    a. Fast path:
> > >       unchanged
> > >    b. Slow path
> > >       PV: unchanged.
> > >       PVHVM: issue XEN_DOMCTL_resumedomain to resume the guest.
> > >       Because we don't modify the return code, the PV drivers will
> > >       disconnect and reconnect. The guest ends up doing the
> > >       XENMAPSPACE_shared_info XENMEM_add_to_physmap hypercall and
> > >       resetting all of its vCPU states to point to the shared_info
> > >       (well, except the ones past 32).
> > >       That is, the Linux kernel does that regardless of whether
> > >       SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> > >       Pure HVM: issue XEN_DOMCTL_resumedomain to resume the guest.
> > >
> > > Under COLO, we will update the guest's state (modify memory, CPU
> > > registers, device status, ...). In this case we cannot use the fast
> > > path to resume it. Keep the return code 0 and use the slow path to
> > > resume the guest. Since resuming HVM via the slow path is not
> > > currently supported, this patch makes the resume call not fail.
> > >
> > > Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
> > > Signed-off-by: Yang Hongyang <hongyang.yang@xxxxxxxxxxxx>
> > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> >
> > I proposed an alternative commit log in a previous reply:
> >
> > ===
> > Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> >
> > Previously it was not possible to resume a PVHVM or pure HVM guest in
> > the slow path because libxc didn't support that.
> >
> > Using XEN_DOMCTL_resumedomain without modifying the guest return code
> > to resume a guest is considered to be always safe. Introduce a
> > function to do that for (PV)HVM guests in slow path resume.
> >
> > This patch fixes a bug that denies (PV)HVM slow path resume. This
> > will enable COLO to work properly: COLO requires the HVM guest to
> > start in the new context that has been set up by COLO, hence slow
> > path resume is required.
> > ===
> >
> > Note that I fixed one place in this version, from "guest state" to
> > "guest return code" in the second paragraph. And that sentence rests
> > on a big assumption that I don't know whether it is true or not --
> > reverse-engineered from the comment before xc_domain_resume and from
> > what Linux does.
> >
> > But the more I think about it, the less sure I am that I'm writing
> > the right thing. I also can't judge what the right behaviour is on
> > the Linux side.
> >
> > Konrad, can you fact-check the commit message a bit?
> > And maybe you can help answer the following questions?
> >
> > 1. If we use fast=0 on a PVHVM guest, will it work?
>
> Yes.
>
> > 2. If we use fast=0 on an HVM guest, will it work?
>
> Yes.
>
> > What is worse, when I say "work" I actually have no clear definition
> > of it. There doesn't seem to be a defined state that the guest needs
> > to be in.
>
> For PVHVM guests, fast = 0 requires that the guest make a
> SCHEDOP_shutdown(SHUTDOWN_suspend) hypercall. After the hypercall has
> completed (so Xen has suspended the guest and then later resumed it),
> it is the guest's responsibility to set up the Xen infrastructure
> again: as in, retrieve the shared_info page (XENMAPSPACE_shared_info),
> set up XenBus, etc.
>
> For HVM guests, fast = 0 suspends the guest without the guest making
> any hypercalls. It is in effect the hypervisor injecting an S3
> suspend. Afterwards the guest is resumed and continues as usual. No PV
> drivers, hence no need to re-establish the Xen PV infrastructure.

Wait, isn't this function about resuming a guest? I'm confused because
you talk about the hypervisor injecting an S3 suspend. I guess you
wrote the wrong thing?

My guess is below, from the perspective of resuming a guest:

A PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
suspend. So when the toolstack uses fast=0, the guest resumes from the
hypercall with the return code unmodified. The guest then re-sets up
the Xen infrastructure.

An HVM guest would have used S3 suspend to suspend itself. So in the
fast=0 case, the hypervisor injects an S3 resume and the guest just
takes the normal path, like a real machine does.

Does that make sense?

Wei.

> Hope this helps.
>
> > Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel