
Re: [Xen-devel] [PATCH] libxl/xl: improve behaviour when guest fails to suspend itself



On Tue, 8 Feb 2011, Ian Campbell wrote:
> # HG changeset patch
> # User Ian Campbell <ian.campbell@xxxxxxxxxx>
> # Date 1297185819 0
> # Node ID d631a4996cbc69a7fa8489f28d4a3313db12e77a
> # Parent  a46b91cd8202726aecd9ddefd8e75faff48144d6
> libxl/xl: improve behaviour when guest fails to suspend itself.
> 
> The PV suspend protocol requires guest co-operation, whereby the guest
> must respond to a suspend request written to the xenstore control node
> by clearing the node and then making a suspend hypercall.
> 
> Currently when a guest fails to do this libxl times out and returns
> a generic failure code to the caller.
> 
> In response to this failure xl attempts to resume the guest. However
> if the guest has not responded to the suspend request then there is no
> guarantee that the guest has made the suspend hypercall (in fact it is
> quite unlikely). Since the resume process attempts to modify the
> return value of the hypercall (to indicate a cancelled suspend) this
> results in the guest eax/rax register being corrupted!
> 
> To fix this, change libxl to do the following:
>    * Wait for the guest to acknowledge the suspend request.
>      - on timeout cancel the suspend request.
>        - if cancellation is successful then return a new error code to
>          indicate that the guest is not responding.
>        - if the cancel does not succeed then we raced with the guest
>          which actually did acknowledge at the last minute, so
>          continue.
>    * Wait for the guest to suspend.
>      - on timeout return the standard error code as before
>    * Guest successfully suspended, return success.
> 
> Lastly in xl do not attempt to resume a guest if it has not responded
> to the suspend request.
> 
> Tested by live migration of PVops kernels which ignore the suspend
> request, which have already crashed, and which suspend/resume
> correctly. In the first two cases the source domain is left alone (and
> continues to function in the first case) and in the third the
> migration is successful.
> 
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> 


Acked-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
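
For readers following the control flow in the quoted changeset description, here is a minimal sketch of the two-stage wait, written as a toy Python model rather than as libxl's actual C code. Every name in it (ControlNode, Guest, Result, request_suspend, should_resume) is hypothetical and chosen only for illustration, poll counts stand in for the real timeouts, and the compare-and-clear cancel is an assumption about how a cancellation can detect a last-minute acknowledgement.

```python
from enum import Enum


class Result(Enum):
    OK = "ok"
    GUEST_NOT_RESPONDING = "guest_not_responding"  # hypothetical name for the new error code
    FAIL = "fail"                                  # the pre-existing generic failure


class ControlNode:
    """Toy stand-in for the xenstore control node (not the real xenstore API)."""

    def __init__(self):
        self.value = ""

    def cancel_if_pending(self):
        # Clear the request only if the guest has not already cleared it.
        if self.value == "suspend":
            self.value = ""
            return True
        return False


class Guest:
    """Toy guest: may acknowledge the request and may then suspend."""

    def __init__(self, acks, suspends):
        self.acks = acks
        self.suspends = suspends
        self.suspended = False

    def step(self, node):
        if self.acks and node.value == "suspend":
            node.value = ""        # acknowledge by clearing the node
        elif self.acks and self.suspends and node.value == "":
            self.suspended = True  # models the suspend hypercall


def request_suspend(node, guest, ack_polls=3, suspend_polls=3):
    node.value = "suspend"

    # Stage 1: wait for the guest to acknowledge the request.
    acked = False
    for _ in range(ack_polls):
        guest.step(node)
        if node.value == "":
            acked = True
            break
    if not acked and node.cancel_if_pending():
        return Result.GUEST_NOT_RESPONDING
    # Otherwise the guest acknowledged (possibly at the last minute): continue.

    # Stage 2: wait for the guest to actually suspend.
    for _ in range(suspend_polls):
        guest.step(node)
        if guest.suspended:
            return Result.OK
    return Result.FAIL


def should_resume(result):
    # Caller-side policy: never resume a guest that did not acknowledge,
    # since it almost certainly never made the suspend hypercall.
    return result is Result.FAIL
```

In this model a guest that ignores the request yields Result.GUEST_NOT_RESPONDING and is left alone, while a guest that acknowledges but never suspends yields Result.FAIL, the only case in which the caller goes on to resume.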




 

