[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH] Drop error return if size mismatch is due to xcr0 settings

On Fri, 10 Oct 2014 07:36:11 +0100
Jan Beulich <JBeulich@xxxxxxxx> wrote:

> >>> On 09.10.14 at 18:20, <andrew.cooper3@xxxxxxxxxx> wrote:
> > On 09/10/14 17:10, Don Koch wrote:
> >> On Thu, 9 Oct 2014 16:56:49 +0100
> >> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> >>> Can you explain what the bug is and why this is an appropriate fix?
> >>>
> >>> What is happening here is that the migration stream is providing an
> >>> xsave area larger than the size it should be based on the xcr0 sent with 
> >>> it.
> >> The old 4.3 system is providing a maximum size xsave area. The 4.4 xen
> >> is calculating a smaller area required for said xsave area. All this means
> >> is that the overflow at the end is meaningless and can be ignored (i.e.
> >> restoring it shouldn't hurt).  If the data sent was smaller than what was
> >> expected, i.e. something is missing, that would be an error.
> >>
> >> I consider leaving the check and warning message useful as it allows
> >> some debugging info if there indeed was something wrong. I tested this
> >> on an AMD migrating from 4.3 to 4.4 and checking various ymm registers
> >> with no data lost.
> > 
> > Right ok - given this info, the patch looks plausible, but these details
> > must be in the patch description.
> > 
> > Given this diagnosis, I think it is reasonable to not fail the hypercall
> > if we detect this condition and confirm that all unexpectedly-extra
> > bytes are 0.
> > 
> > In the case that there is a non-zero byte in there, we must fail the
> > hypercall to prevent VM data corruption.  The warning can be dropped for
> > the "fixing up from Xen 4.3" case, but a sentence of two comment in the
> > code will certainly be needed as justification.
> That's not going to be sufficient: The error return (not so sure about
> the warning) should be dropped only in exactly the one case where
> a fixup is really needed, i.e. not blindly for any length larger than that
> needed to match xcr0_accum, but no larger than that needed to cover
> all features (and as you say also not for any contents, i.e. a zero
> check must also be added to guard the bypass of the error return).

I had suggested adding a check against the maximum size, albeit the type
of check was implied.

I'm currently testing a change where the area between xcr0-derived size
and maximum is checked for zero values.

> Additionally in the to be extended patch description it should be made
> clear that only 4.3.0 is affected (and then again early 4.2.x appear
> to suffer the same issue, so just saying 4.3 is in any event misleading);
> in fact with that I wonder whether fixing this is really worthwhile - it
> should go almost without saying that stable releases (of which by now
> we have seen three) get applied. Same goes for mentioning AMD here:
> I can't see anything AMD specific in what is being done.

Since we're talking about live migration, I would have figured that
people would have inferred that 4.3 implied any earlier rev.

The reason I mention AMD is because that is where the problem showed
up. AMD has an additional area that is not on Intel, Lightweight
Profiling, which isn't used that often and, therefore, that area in the
xsave buffer isn't used. It would probably happen on an Intel machine
if someone disabled the 256-bit SSE.

It seems to me that my proposed hack is no worse than the existing
4.3 to 4.3 migration code; i.e. we just copy it over regardless of
whether it's used or not. But I do agree that extra checks are a good
thing and am willing to put those in.

Unless I hear otherwise, either with other suggestions or "it's not
worth fixing," I'll update the patch to include the zero check and the
size check against the maximum size per xcr0.

Should I flag this as possible 4.5 fodder?

> Jan


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.