Re: [Xen-devel] [RFC PATCH] Drop error return if size mismatch is due to xcr0 settings
On Fri, 10 Oct 2014 07:36:11 +0100
Jan Beulich <JBeulich@xxxxxxxx> wrote:

> >>> On 09.10.14 at 18:20, <andrew.cooper3@xxxxxxxxxx> wrote:
> > On 09/10/14 17:10, Don Koch wrote:
> >> On Thu, 9 Oct 2014 16:56:49 +0100
> >> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> >>> Can you explain what the bug is and why this is an appropriate fix?
> >>>
> >>> What is happening here is that the migration stream is providing an
> >>> xsave area larger than the size it should be based on the xcr0 sent
> >>> with it.
> >>
> >> The old 4.3 system is providing a maximum-size xsave area. The 4.4 Xen
> >> is calculating a smaller required size for said xsave area. All this
> >> means is that the overflow at the end is meaningless and can be
> >> ignored (i.e. restoring it shouldn't hurt). If the data sent were
> >> smaller than what was expected, i.e. something were missing, that
> >> would be an error.
> >>
> >> I consider leaving the check and warning message useful, as it
> >> provides some debugging info if there indeed was something wrong. I
> >> tested this on an AMD machine, migrating from 4.3 to 4.4 and checking
> >> various ymm registers, with no data lost.
> >
> > Right, ok - given this info, the patch looks plausible, but these
> > details must be in the patch description.
> >
> > Given this diagnosis, I think it is reasonable to not fail the
> > hypercall if we detect this condition and confirm that all
> > unexpectedly-extra bytes are 0.
> >
> > In the case that there is a non-zero byte in there, we must fail the
> > hypercall to prevent VM data corruption. The warning can be dropped
> > for the "fixing up from Xen 4.3" case, but a sentence or two of
> > comment in the code will certainly be needed as justification.
>
> That's not going to be sufficient: The error return (not so sure about
> the warning) should be dropped only in exactly the one case where a
> fixup is really needed, i.e. not blindly for any length larger than
> that needed to match xcr0_accum, but no larger than that needed to
> cover all features (and, as you say, also not for arbitrary contents,
> i.e. a zero check must also be added to guard the bypass of the error
> return).

I had suggested adding a check against the maximum size, albeit the type
of check was only implied. I'm currently testing a change in which the
area between the xcr0-derived size and the maximum size is checked for
zero values.

> Additionally, in the to-be-extended patch description it should be made
> clear that only 4.3.0 is affected (and then again early 4.2.x appear to
> suffer the same issue, so just saying 4.3 is in any event misleading);
> in fact, with that I wonder whether fixing this is really worthwhile -
> it should go almost without saying that stable releases (of which by
> now we have seen three) get applied. Same goes for mentioning AMD here:
> I can't see anything AMD-specific in what is being done.

Since we're talking about live migration, I would have figured that
people would infer that 4.3 implied any earlier rev as well. The reason
I mention AMD is because that is where the problem showed up: AMD has an
additional state component that Intel does not, Lightweight Profiling,
which isn't used very often, so that area in the xsave buffer goes
unused. The same thing would probably happen on an Intel machine if
someone disabled the 256-bit SSE state.

It seems to me that my proposed hack is no worse than the existing 4.3
to 4.3 migration code, i.e. we just copy the area over regardless of
whether it's used or not. But I do agree that the extra checks are a
good thing and I am willing to put them in.
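For what it's worth, the check I'm testing looks roughly like the sketch
below. This is just an illustration of the intended logic, not the actual
patch: size_for_xcr0() and xfeature_mask are placeholders standing in for
whatever computes the xsave area size for a given xcr0 mask and for the
full set of host-supported features.

/*
 * Rough sketch of the intended check - NOT the real Xen code.
 * size_for_xcr0() and xfeature_mask are placeholders.
 */
#include <stdint.h>
#include <stddef.h>
#include <errno.h>

extern uint64_t xfeature_mask;               /* assumed: all supported XSAVE features */
extern size_t size_for_xcr0(uint64_t xcr0);  /* placeholder for the size calculation */

static int check_xsave_blob(const uint8_t *blob, size_t blob_size,
                            uint64_t xcr0_accum)
{
    size_t expected = size_for_xcr0(xcr0_accum);   /* size implied by xcr0_accum */
    size_t max = size_for_xcr0(xfeature_mask);     /* size covering all features */
    size_t i;

    if ( blob_size < expected )
        return -EINVAL;            /* data missing: always an error */

    if ( blob_size == expected )
        return 0;                  /* exact match: nothing to fix up */

    if ( blob_size > max )
        return -EINVAL;            /* larger than any valid area: reject */

    /*
     * 4.3.0 (and early 4.2.x) sent a maximum-size area; accept the
     * oversized blob only if every byte beyond the xcr0-derived size
     * is zero, so no guest state can be silently dropped.
     */
    for ( i = expected; i < blob_size; i++ )
        if ( blob[i] != 0 )
            return -EINVAL;

    return 0;                      /* benign padding: ignore the excess */
}

In the real patch this would of course be folded into the existing
restore path rather than living in a separate helper.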
Unless I hear otherwise, either with other suggestions or "it's not worth
fixing," I'll update the patch to include the zero check and the size
check against the maximum size per xcr0. Should I flag this as possible
4.5 fodder?

> Jan

Thanks,
-d