Re: [Xen-devel] [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
On 19/05/14 10:37, Andrew Cooper wrote:
> On 17/05/14 17:01, Jason Andryuk wrote:
>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>> However, nothing verifies the state before modify_returncode() modifies
>> the domain's registers.  This will crash guest processes or the kernel
>> itself.
>>
>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>
>> Signed-off-by: Jason Andryuk <andryuk@xxxxxxxx>
> Hmm.
>
> There is no possible way whatsoever that migration can work if a PV
> guest is not in SHUTDOWN_suspend.  PV guests have to leave an MFN in edx
> which the toolstack rewrites with a new MFN on resume.
>
> By default, there is no need for knowledge from the HVM guest for
> migrate.  XenServer is perfectly capable of migrating HVM VMs without PV
> drivers.  I suspect therefore that we never use cooperative resume.
>
> This cooperative resume which modifies guest register state therefore
> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
> for PV guests.  As a result, your patch below is correct as a fallback
> safety measure, and should be taken.
>
> However the caller of modify_returncode is also at fault for attempting
> to resume an already-running domain.  I think there needs to be a bugfix
> there as well.  I presume that some piece of code is assuming that
> despite libxl-save-helper failing, xc_domain_safe() paused the guest,
> which is clearly not true in this case.
>
> ~Andrew

And here, I actually mean xc_domain_save()

~Andrew

>
>> ---
>>
>> This change stops xc_domain_resume from killing my domUs on a failed
>> migration.  I'm using a wrapper around libxl-save-helper which may fail
>> before libxl-save-helper is invoked, so xc_domain_save has not been
>> called.  The idle Linux domU kernels would BUG coming out of
>> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
>> journald was also observed to segfault.
>>
>> As written, this code treats calling xc_domain_resume on a running
>> domain as an error.  Do we want it silently ignored?  Output with this
>> patch looks like:
>>
>> """
>> Migration failed, resuming at sender.
>> xc: error: Domain not in suspended state: Internal error
>> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for
>> domain 92: Interrupted system call
>> """
>>
>> libxl__domain_resume prints errno, but it is stale for this case.
>> xc_domain_resume_cooperative could swallow modify_returncode's error,
>> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
>> libxl error message.
>>
>> ---
>>  tools/libxc/xc_resume.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index 18b4818..9ec6a59 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t
>> domid)
>>          return -1;
>>      }
>>
>> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
>> +    {
>> +        ERROR("Domain not in suspended state");
>> +        return 1;
>> +    }
>> +
>>      if ( info.hvm )
>>      {
>>          /* HVM guests without PV drivers have no return code to modify. */
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
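
For readers following the thread, the guard discussed above can be sketched in isolation. This is only a minimal illustration, not the libxc code: `struct dominfo_sketch`, `SKETCH_SHUTDOWN_SUSPEND` and `check_suspended()` are hypothetical names introduced here, standing in for the `shutdown` and `shutdown_reason` fields the patch reads and for the `SHUTDOWN_suspend` constant. The numeric value is an assumption for the example, and the -1 failure return simply follows the `return -1;` visible in the diff context.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two domain-info fields the patch inspects. */
struct dominfo_sketch {
    bool shutdown;                /* domain has shut down */
    unsigned int shutdown_reason; /* why it shut down */
};

/* Assumed illustrative value standing in for SHUTDOWN_suspend. */
#define SKETCH_SHUTDOWN_SUSPEND 2u

/*
 * Return 0 when it is safe to rewrite the guest's return-code register,
 * i.e. the domain is shut down with reason "suspend"; return -1 otherwise
 * so the caller can refuse to touch a running domain's registers.
 */
int check_suspended(const struct dominfo_sketch *info)
{
    if (!info->shutdown || info->shutdown_reason != SKETCH_SHUTDOWN_SUSPEND)
        return -1;
    return 0;
}
```

A caller would run this check before any register modification. Whether a failure should be reported as an error or silently ignored is the open question raised in the patch notes.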