[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.16] Revert "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents"
On 25.11.2021 10:24, Roger Pau Monné wrote: > On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote: >> On 24.11.2021 22:11, Andrew Cooper wrote: >>> OSSTest has identified a 3rd regression caused by this change. Migration >>> between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron 4133) >>> fails with: >>> >>> xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, >>> msr ffffffff (22 = Invalid argument): Internal error >>> xc: error: Restore failed (22 = Invalid argument): Internal error >>> >>> which is a safety check to prevent resuming the guest when the CPUID data >>> has >>> been truncated. The problem is caused by shrinking of the max policies, >>> which >>> is an ABI that needs handling compatibly between different versions of Xen. >> >> Would you mind pointing me to the flight which has hit this problem? I >> don't think I've seen any respective failure. I also don't think I've >> seen any respective discussion on irc, so I really wonder where all >> this is coming from all of the sudden. It's not like the change in >> question would have gone in just yesterday. > > It's from a pair of newish boxes that Credativ and Ian where > attempting to commission yesterday. Since the boxes are not yet in > production Ian wasn't sure if the issue could be on the testing or > hardware side, so emailed the details in private for us to provide > some feedback on the issue. The error is at: > > http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html I see. Quite lucky timing then. >>> Furthermore, shrinking of the default policies also breaks things in some >>> cases, because certain cpuid= settings in a VM config file which used to >>> have >>> an effect will now be silently discarded. >> >> I'm afraid I don't see what you're talking about here. Could you give >> an example? Is this about features the flags of which live in the >> higher leaves, which would have got stripped from the default policies? >> Which would mean the stripping really should happen on the max policies >> only (where it may not have much of an effect). > > I think there are two separate issues, which I tried to clarify in my > reply to this same patch. > > Options set using cpuid= with xl could now be rejected when in > previous versions they were accepted just fine. That's because the > shrinking to the policies can now cause find_leaf calls in > xc_cpuid_xend_policy to fail, and thus the whole operation would > fail. Okay, this could be addressed by merely dropping the calls from calculate_{pv,hvm}_def_policy(). Thinking about it, I can surely agree they shouldn't have been put there in the first place. Which would be quite the opposite of your initial proposal, where you did drop them from calculate_{pv,hvm}_max_policy(). A guest migrating in with a larger max leaf value should merely have that max leaf value retained, but that ought to be possible without dropping the shrinking from calculate_{pv,hvm}_max_policy(). Even leaving aside migration, I guess an explicit request for a large max leaf value should be honored; those possibly many trailing leaves then would simply all be blank. > There's another likely error that only affects callers of > xc_cpuid_apply_policy that pass a featureset (so not the upstream > toolstack), where some leaves of the featureset could now be ignored > by the guest if the max leaves value doesn't cover them anymore. Note > this was already an issue even before 540d911c2813, as applying the > featureset doesn't check that the set feature leaves are below the max > leaf. If this was an issue before the commit to be reverted, I take it the revert isn't going to help it? In which case this information is interesting, but not applicable as justification for the revert? Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |