Re: [Xen-devel] [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
On 17/05/14 17:01, Jason Andryuk wrote:
> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
> However, nothing verifies the state before modify_returncode() modifies
> the domain's registers. This will crash guest processes or the kernel
> itself.
>
> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>
> Signed-off-by: Jason Andryuk <andryuk@xxxxxxxx>

Hmm.

There is no possible way whatsoever that migration can work if a PV
guest is not in SHUTDOWN_suspend. PV guests have to leave an MFN in edx
which the toolstack rewrites with a new MFN on resume.

By default, there is no need for knowledge from the HVM guest for
migrate. XenServer is perfectly capable of migrating HVM VMs without PV
drivers. I suspect therefore that we never use cooperative resume.

This cooperative resume which modifies guest register state therefore
imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
for PV guests. As a result, your patch below is correct as a fallback
safety measure, and should be taken.

However, the caller of modify_returncode is also at fault for attempting
to resume an already-running domain. I think there needs to be a bugfix
there as well. I presume that some piece of code is assuming that
despite libxl-save-helper failing, xc_domain_save() paused the guest,
which is clearly not true in this case.

~Andrew

> ---
>
> This change stops xc_domain_resume from killing my domUs on a failed
> migration. I'm using a wrapper around libxl-save-helper which may fail
> before libxl-save-helper is invoked, so xc_domain_save has not been
> called. The idle Linux domU kernels would BUG coming out of
> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
> journald was also observed to segfault.
>
> As written, this code treats calling xc_domain_resume on a running
> domain as an error. Do we want it silently ignored? Output with this
> patch looks like:
>
> """
> Migration failed, resuming at sender.
> xc: error: Domain not in suspended state: Internal error
> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for
> domain 92: Interrupted system call
> """
>
> libxl__domain_resume prints errno, but it is stale for this case.
> xc_domain_resume_cooperative could swallow modify_returncode's error,
> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
> libxl error message.
>
> ---
> tools/libxc/xc_resume.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index 18b4818..9ec6a59 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>          return -1;
>      }
>
> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
> +    {
> +        ERROR("Domain not in suspended state");
> +        return 1;
> +    }
> +
>      if ( info.hvm )
>      {
>          /* HVM guests without PV drivers have no return code to modify. */
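
For illustration only, the guard added by the hunk above can be sketched
outside of libxc roughly as follows. This is a minimal stand-alone sketch,
not the libxc code: struct dominfo_sketch and both helper functions are
hypothetical names that mirror only the xc_dominfo_t fields the patch
tests, and the enum is a local stand-in for the SHUTDOWN_* reason codes
from Xen's public sched.h. The sketch returns -1 on rejection, which is an
assumption on my part; the patch as posted returns 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Local stand-in for the shutdown reason codes in Xen's public sched.h. */
enum {
    SHUTDOWN_poweroff = 0,
    SHUTDOWN_reboot   = 1,
    SHUTDOWN_suspend  = 2,
    SHUTDOWN_crash    = 3
};

/* Hypothetical, trimmed-down mirror of the xc_dominfo_t fields used here. */
struct dominfo_sketch {
    bool shutdown;                /* domain has shut down                */
    unsigned int shutdown_reason; /* only meaningful when shutdown is set */
    bool hvm;                     /* HVM rather than PV guest            */
};

/* True only when the domain stopped specifically because it suspended. */
static bool domain_is_suspended(const struct dominfo_sketch *info)
{
    assert(info != NULL);
    return info->shutdown && info->shutdown_reason == SHUTDOWN_suspend;
}

/*
 * Guard to run before touching any guest register state.
 * Returns 0 when it is safe to continue, -1 otherwise (assumed convention).
 */
static int check_before_modify_returncode(const struct dominfo_sketch *info)
{
    if ( !domain_is_suspended(info) )
    {
        fprintf(stderr, "Domain not in suspended state\n");
        return -1;
    }

    return 0;
}
```

With this shape, a running domain (shutdown unset) and a crashed domain
(shutdown set, reason SHUTDOWN_crash) are both rejected before any
registers are modified; only a domain in SHUTDOWN_suspend passes.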
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel