[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers



On 19/05/14 13:07, Jason Andryuk wrote:
> On 5/19/2014 5:37 AM, Andrew Cooper wrote:
>> On 17/05/14 17:01, Jason Andryuk wrote:
>>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>>> However, nothing verifies the state before modify_returncode() modifies
>>> the domain's registers.  This will crash guest processes or the kernel
>>> itself.
>>>
>>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>>
>>> Signed-off-by: Jason Andryuk <andryuk@xxxxxxxx>
>> Hmm.
>>
>> There is no possible way whatsoever that migration can work if a PV
>> guest is not in SHUTDOWN_suspend.  PV guests have to leave an MFN in edx
>> which the toolstack rewrites with a new MFN on resume.
>>
>> By default, there is no need for knowledge from the HVM guest for
>> migrate.  XenServer is perfectly capable of migrating HVM VMs without PV
>> drivers.  I suspect therefore that we never use cooperative resume.
> I've only used 64-bit PV domUs, so I haven't really thought about HVM.  If 
> info.shutdown_reason == SHUTDOWN_suspend is expected for all HVM cases, then 
> the hunk can stand.  Otherwise it should be moved later after HVM without PV 
> drivers has exited.

I disagree.  It is unconditionally an error to be in this function with
a guest which is not in SHUTDOWN_suspend, even if there is a codepath
through the function which would not corrupt state.  I would leave the
hunk as it stands, although you might consider setting errno so EINVAL
(or something more appropriate).


>
>> This cooperative resume which modifies guest register state therefore
>> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
>> for PV guests.  As a result, your patch below is correct as a fallback
>> safety measure, and should be taken.
>>
>> However the caller of modify_returncode is also at fault for attempting
>> to resume an already-running domain.  I think there needs to be a bugfix
>> there as well.  I presume that some piece of code is assuming that
>> despite libxl-save-helper failing, xc_domain_safe() paused the guest,
>> which is clearly not true in this case.
> Agreed.  modify_returncode was already making the call to xc_domain_info (and 
> doing the damage), so adding a check there was easy.
>
> The patch was posted RFC since I was looking for guidance on whether 
> xc_domain_resume on a running domain is an error or should it be treated as 
> success?  The original modify_returncode returns 0 on success or -1 on error. 
>  This patch returns 1 for the already running case.  This could be handled 
> differently by the caller to bypass XEN_DOMCTL_resumedomain without returning 
> an error.
>
> -Jason

I still think the real problem is higher up.  The toolstack absolutely
shouldn't running xc_domain_resume() on a running domain.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.