[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.16] Revert "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents"
On Thu, Nov 25, 2021 at 10:52:12AM +0100, Jan Beulich wrote: > On 25.11.2021 10:24, Roger Pau Monné wrote: > > On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote: > >> On 24.11.2021 22:11, Andrew Cooper wrote: > >>> OSSTest has identified a 3rd regression caused by this change. Migration > >>> between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron > >>> 4133) > >>> fails with: > >>> > >>> xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, > >>> msr ffffffff (22 = Invalid argument): Internal error > >>> xc: error: Restore failed (22 = Invalid argument): Internal error > >>> > >>> which is a safety check to prevent resuming the guest when the CPUID data > >>> has > >>> been truncated. The problem is caused by shrinking of the max policies, > >>> which > >>> is an ABI that needs handling compatibly between different versions of > >>> Xen. > >> > >> Would you mind pointing me to the flight which has hit this problem? I > >> don't think I've seen any respective failure. I also don't think I've > >> seen any respective discussion on irc, so I really wonder where all > >> this is coming from all of the sudden. It's not like the change in > >> question would have gone in just yesterday. > > > > It's from a pair of newish boxes that Credativ and Ian where > > attempting to commission yesterday. Since the boxes are not yet in > > production Ian wasn't sure if the issue could be on the testing or > > hardware side, so emailed the details in private for us to provide > > some feedback on the issue. The error is at: > > > > http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html > > I see. Quite lucky timing then. Indeed, it was pure luck that we got this just yesterday. > > >>> Furthermore, shrinking of the default policies also breaks things in some > >>> cases, because certain cpuid= settings in a VM config file which used to > >>> have > >>> an effect will now be silently discarded. > >> > >> I'm afraid I don't see what you're talking about here. Could you give > >> an example? Is this about features the flags of which live in the > >> higher leaves, which would have got stripped from the default policies? > >> Which would mean the stripping really should happen on the max policies > >> only (where it may not have much of an effect). > > > > I think there are two separate issues, which I tried to clarify in my > > reply to this same patch. > > > > Options set using cpuid= with xl could now be rejected when in > > previous versions they were accepted just fine. That's because the > > shrinking to the policies can now cause find_leaf calls in > > xc_cpuid_xend_policy to fail, and thus the whole operation would > > fail. > > Okay, this could be addressed by merely dropping the calls from > calculate_{pv,hvm}_def_policy(). Thinking about it, I can surely > agree they shouldn't have been put there in the first place. > Which would be quite the opposite of your initial proposal, where > you did drop them from calculate_{pv,hvm}_max_policy(). A guest > migrating in with a larger max leaf value should merely have that > max leaf value retained, but that ought to be possible without > dropping the shrinking from calculate_{pv,hvm}_max_policy(). I won't argue it's not possible to do that without dropping the shrink from calculate_{pv,hvm}_max_policy(), but given the point we are on the release we should consider the safest option, and IMO that's the revert of the shrinking from there in order to restore the previous behavior and have working migrations from 4.15 -> 4.16. We can discuss other likely better approaches to solve this issue after the release. > Even > leaving aside migration, I guess an explicit request for a large > max leaf value should be honored; those possibly many trailing > leaves then would simply all be blank. > > > There's another likely error that only affects callers of > > xc_cpuid_apply_policy that pass a featureset (so not the upstream > > toolstack), where some leaves of the featureset could now be ignored > > by the guest if the max leaves value doesn't cover them anymore. Note > > this was already an issue even before 540d911c2813, as applying the > > featureset doesn't check that the set feature leaves are below the max > > leaf. > > If this was an issue before the commit to be reverted, I take it > the revert isn't going to help it? I think the commit makes it more likely to hit the above scenario by shrinking max leaves. > In which case this information > is interesting, but not applicable as justification for the > revert? As said above, while the commit at hand is not introducing the issue with the featuresets, it makes it more likely by shrinking the max leaves, and IMO it's a regression from behavior in 4.15. Ie: options set on the featureset on 4.15 would be exposed, while the same options could be hidden in 4.16 because of the shrinking to the default domain policies, if the user happens to set an option that's on an empty trailing featureset with a tail of zeroed leaves. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |