Re: [Xen-devel] [RFC PATCH] Drop error return if size mismatch is due to xcr0 settings



>>> On 09.10.14 at 18:20, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 09/10/14 17:10, Don Koch wrote:
>> On Thu, 9 Oct 2014 16:56:49 +0100
>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> Can you explain what the bug is and why this is an appropriate fix?
>>>
>>> What is happening here is that the migration stream is providing an
>>> xsave area larger than the size it should be based on the xcr0 sent with it.
>> The old 4.3 system is providing a maximum size xsave area. The 4.4 xen
>> is calculating a smaller area required for said xsave area. All this means
>> is that the overflow at the end is meaningless and can be ignored (i.e.
>> restoring it shouldn't hurt).  If the data sent was smaller than what was
>> expected, i.e. something is missing, that would be an error.
>>
>> I consider leaving the check and warning message useful as it allows
>> some debugging info if there indeed was something wrong. I tested this
>> on an AMD migrating from 4.3 to 4.4 and checking various ymm registers
>> with no data lost.
> 
> Right ok - given this info, the patch looks plausible, but these details
> must be in the patch description.
> 
> Given this diagnosis, I think it is reasonable to not fail the hypercall
> if we detect this condition and confirm that all unexpectedly-extra
> bytes are 0.
> 
> In the case that there is a non-zero byte in there, we must fail the
> hypercall to prevent VM data corruption.  The warning can be dropped for
> the "fixing up from Xen 4.3" case, but a sentence or two of comment in the
> code will certainly be needed as justification.

That's not going to be sufficient: The error return (not so sure about
the warning) should be dropped only in exactly the one case where
a fixup is really needed, i.e. not blindly for any length larger than that
needed to match xcr0_accum, but no larger than that needed to cover
all features (and as you say also not for any contents, i.e. a zero
check must also be added to guard the bypass of the error return).
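
To make that concrete, here is a rough sketch of the check I have in
mind. It is not the actual Xen code: the function name, the enum and
the parameters are all made up for illustration, and it assumes the
one case needing a fixup is a stream carrying exactly the maximum-size
area (which is what was reported for the old sender).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum xsave_size_check {
    XSAVE_SIZE_OK,     /* size matches what xcr0_accum requires */
    XSAVE_SIZE_FIXUP,  /* oversized, all-zero tail: tolerate it */
    XSAVE_SIZE_BAD     /* anything else: keep the error return */
};

/*
 * Hypothetical helper (not a Xen function).
 *   buf       - xsave area as received from the migration stream
 *   size      - number of bytes received
 *   expected  - size implied by the xcr0_accum sent with the area
 *   full_size - size needed to cover all features (assumed >= expected)
 */
static enum xsave_size_check
classify_xsave_size(const uint8_t *buf, size_t size,
                    size_t expected, size_t full_size)
{
    size_t i;

    if ( size == expected )
        return XSAVE_SIZE_OK;

    /*
     * Assumption: only the exact "maximum size area" case is fixed up.
     * Too short, or any other oversized length, remains an error.
     */
    if ( size < expected || size != full_size )
        return XSAVE_SIZE_BAD;

    /* The bypass is guarded by a zero check on the surplus bytes. */
    for ( i = expected; i < size; i++ )
        if ( buf[i] != 0 )
            return XSAVE_SIZE_BAD;

    return XSAVE_SIZE_FIXUP;
}
```

A caller would fail the hypercall on XSAVE_SIZE_BAD and, on
XSAVE_SIZE_FIXUP, restore only the first "expected" bytes.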

Additionally, in the to-be-extended patch description it should be made
clear that only 4.3.0 is affected (and then again early 4.2.x appear
to suffer the same issue, so just saying 4.3 is in any event misleading);
in fact with that I wonder whether fixing this is really worthwhile - it
should go almost without saying that stable releases (of which by now
we have seen three) get applied. Same goes for mentioning AMD here:
I can't see anything AMD specific in what is being done.

Jan
