[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.16] Revert "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents"


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 25 Nov 2021 11:15:30 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=LNpL+w2RpxEXzvDwYtJ7L7SGl0DONk0ZKpymlGx9Wns=; b=i6Qxy9jG6djJSu+G+dI/cu/9OHP54Cm0bCq6CUo7VaqX2CWxGjQCkhMzXXKdqSxpVyQE+mPBEbT8isVVzagQ2nSursIzPh99ogDLUEfxm7Iw1ezu7g0jBAy14bSuiwIJzhzDAiLvVvDW00QSZwuRB6E2OVZpXrtjbcrmLcXF/q/6XSmhn7019rkvZxwToUhVJtU38spcMkbkQv5yaiMEEkLH4xsm7+umogrSXOrtKBwKXLmgMLKzyhAAGvGYMzQSIEEyQ9TAdecD0b5ZcVEvhjb0D/kxLupKYE511AkRalxFNA/A4XsZnKAcMGgrr1TfdLMTsEVVHAiT5cJhbQWrfA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=WOuA13Jr4paky7+7fUA6yKvSfSKFM9MQrVot3vU8XMSaCsr2r/hbAOBnjdBvsmPbcHeQYQF63iXmjPQqlTLYpYFXsG2pMwVQa9i7fpW5CBvm0svy3g/xDeBihRi32n5TS1Fdl6yo7H5eVJGjGKlfanPrcW5y0SY1JtOkaAEhzG0mc3vQfAFxAFIrrG99fBJuwxKcHEt4V1UtKFsW5vWO84QKlqmeUCf6N2OhRvcMTu/DcVpeFu/QbkWJaefaHihG9TPWSH+50J7EVbzftjyZxg/xtcFdES6zztk/3mQJmxZWJLoJ9p3Hu0/4YCvFm+NRQdnNNrrVbGoEsfcGFtYsNg==
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 25 Nov 2021 10:15:51 +0000
  • Ironport-data: A9a23:Czk8UaBveOVj0BVW/87kw5YqxClBgxIJ4kV8jS/XYbTApGkm12RUy WAYXmyFOPaPamX2Lt1+YNu19h5Q7JDQyNJiQQY4rX1jcSlH+JHPbTi7wuYcHM8wwunrFh8PA xA2M4GYRCwMo/u1Si6FatANl1ElvU2zbue6WLGs1hxZH1c+EX540087wYbVv6Yz6TSHK1LV0 T/Ni5W31G+Ng1aY5UpNtspvADs21BjDkGtwUm4WPJinj3eH/5UhN7oNJLnZEpfNatI88thW5 Qr05OrREmvxp3/BAz4++1rxWhVirrX6ZWBihpfKMkQLb9crSiEai84G2PQghUh/yBeOuop/9 NR3uLOgUzwNFfHNnctFXEwNe81+FfUuFL7vJHG+tYqYzlHccmuqyPJrZK00FdRGoKAtWzgIr KFGbmBWBvyAr7veLLaTUO5ji95lNMD2FIgepmth3XfSCvNOrZXrHvWXu4ECh2xYasZmJdCAR fg/dRRVYhHpZQ0QNgcvOq5lg7L97pX4W2IB8w/EzUYt2EDS0w5ZwLXrKMDSeNGBWYNShEnwj nLL+SH1Dw8XMPSbyCGZ6TS8i+nXhyT5VYkOUrqi+ZZCglee22gSAx0+TkagrL+yjUvWc81bA 1wZ/Gwpt6dayaCwZoCjBVvi+ifC50NCHYoLewEn1O2T4un25CPDCFQFdDNueMYdkORqRDkFj lDcyrsFGgdTmLGSTHuc8JKdojWzJTUZIAc+WMMUcecWy4K9+d9u13ojWv4mSffo1YOtRVkc1 hjT9HBm74j/m/LnwElSEbrvpzu37qbEQQcujuk8djL0t1gpDGJJimHB1LQ60RqiBNrGJrVil CJd8yR70AzpJcvQ/BFhuM1XQNmUCw+taVUwe2JHEZg77CiK8HW+Z41W6zwWDB43aZlbJm65M B6J51I5CHpv0JyCN/EfXm5MI55ykfiI+SrNCpg4keaikrAuLVTarUmClGab3nz3kVhErE3ME czzTCpYNl5DUf4P5GPvH481iOZ3rghjlTK7bc2qlHyPjOvBDEN5vJ9YaTNimMhit/jayOgUm v4CX/a3J+J3DLejP3KJqNFLdjjn7xETXPjLliCeTcbaSiJOE2A9Ef7Bh7Qnfo1uhaNOkenUu Hq6XydlJJDX3BUr8C2GNSJubq3BR5F6oS5pNCAgJw/wiXMifZyu/OEUcJ5uJesr8+lqzPhVS fgZeprfXqQTG2qfozlNP4PgqIFCdQiwgV7cNSSSfzViLYVrQBbE+4G4c1K3pjUOFCe+qeA3v 6akilHAWZMGSgk7VJTWZfujwkmfp38YnO4uDULELsMKIBfn8ZRwKjy3hfgyepleJRLGzzqc9 gCXHRZH+rWd/95rqIHE3PnWoZ2oHu1yGlthM1PatbvmZzPH+meDwJNbVLradz7qS26pqr6pY v9Yzq+gPaRfzkpKqYd1D51i0bk6u4n0v7ZfwwlpQCfLYlCsBu8yK3WKx5AS5KhEx7sfsgqqQ EOfvNJdPOzRas/iFVcQIisjb/iCiq5IymWDs6xtLRWo/jJz8ZqGTV5WbkuFhyFqJbdoNJ8on LU6s8kM5g3j0hcnP75qVMyPG7hg+pDYb5gaiw==
  • Ironport-hdrordr: A9a23:e7Nz3694lOCzdRNbvTtuk+FCdb1zdoMgy1knxilNoENuHfBwxv rDoB1E73LJYVYqOU3Jmbi7Sc69qFfnhORICO4qTMqftWjdyRCVxeRZg7cKrAeQeREWmtQtsJ uINpIOdOEYbmIK/PoSgjPIaurIqePvmMvD5Za8vgdQpENRGtldBm9Ce3im+yZNNW977PQCZf 6hDp0tnUveRZ1bVLXwOlA1G8z44/HbnpPvZhALQzYh9Qm1lDutrJr3CQKR0BsyWy5Ghe5Kyx mIryXJooGY992rwB7V0GHeq7xQhdva09NGQOiBkNIcJDnAghuhIK5hR7qBljYop/zH0idmrP D85zMbe+hj4XLYeW+45TPrxgnbyT4rr0TvzFeJ6EGT6PDRdXYfMY5slIhZehzW5w4Lp9dnyp 9G2Gqfqt5+EQ7AtD6V3amIazha0m6P5VYym+8aiHJSFaEEbqVKkIAZ9ERJVL8dASPB7pw9Gu UGNrCT2B9vSyLYU5nlhBgs/DT1NU5DWytuA3Jy9fB96gIm3EyQlCAjtYgidnRpzuNKd3AL3Z WCDk1SrsA9ciYhV9MLOA4we7rFNoXze2O4DIuzGyWuKEhVAQOHl3bIiI9FkN1CPqZ4iqcPpA ==
  • Ironport-sdr: 7Gh0ZZqNqWn87HreQ9X7CS8oaTckiLM/YhwygK6nm6mWncNqkk0ksIyrTXzXA6jPRc+uCugwJU 2+5K1jvlcH85M65sgkdNy5HBdVRpwGyaU/dTx8djelMu3uLOE2iEi7pcEjhPT/lAnIPf6IP0K5 PIbNY4IECufqksHZOtZ09smmWFhmuVOdft5aRCyIALMRAp06wU4GV4AQbBxwPQIYY5oBYTZ4CF rfl7OfPH4GhoKn/+cAjPOp7XlhAyR98lf5mafCoMIkNce2CAiD87462P+RGwaOjx5oPTz2n3N/ INvrj6jl0R5pEskzppS86wWJ
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Nov 25, 2021 at 10:52:12AM +0100, Jan Beulich wrote:
> On 25.11.2021 10:24, Roger Pau Monné wrote:
> > On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote:
> >> On 24.11.2021 22:11, Andrew Cooper wrote:
> >>> OSSTest has identified a 3rd regression caused by this change.  Migration
> >>> between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron 
> >>> 4133)
> >>> fails with:
> >>>
> >>>   xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, 
> >>> msr ffffffff (22 = Invalid argument): Internal error
> >>>   xc: error: Restore failed (22 = Invalid argument): Internal error
> >>>
> >>> which is a safety check to prevent resuming the guest when the CPUID data 
> >>> has
> >>> been truncated.  The problem is caused by shrinking of the max policies, 
> >>> which
> >>> is an ABI that needs handling compatibly between different versions of 
> >>> Xen.
> >>
> >> Would you mind pointing me to the flight which has hit this problem? I
> >> don't think I've seen any respective failure. I also don't think I've
> >> seen any respective discussion on irc, so I really wonder where all
> >> this is coming from all of the sudden. It's not like the change in
> >> question would have gone in just yesterday.
> > 
> > It's from a pair of newish boxes that Credativ and Ian where
> > attempting to commission yesterday. Since the boxes are not yet in
> > production Ian wasn't sure if the issue could be on the testing or
> > hardware side, so emailed the details in private for us to provide
> > some feedback on the issue. The error is at:
> > 
> > http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html
> 
> I see. Quite lucky timing then.

Indeed, it was pure luck that we got this just yesterday.

> 
> >>> Furthermore, shrinking of the default policies also breaks things in some
> >>> cases, because certain cpuid= settings in a VM config file which used to 
> >>> have
> >>> an effect will now be silently discarded.
> >>
> >> I'm afraid I don't see what you're talking about here. Could you give
> >> an example? Is this about features the flags of which live in the
> >> higher leaves, which would have got stripped from the default policies?
> >> Which would mean the stripping really should happen on the max policies
> >> only (where it may not have much of an effect).
> > 
> > I think there are two separate issues, which I tried to clarify in my
> > reply to this same patch.
> > 
> > Options set using cpuid= with xl could now be rejected when in
> > previous versions they were accepted just fine. That's because the
> > shrinking to the policies can now cause find_leaf calls in
> > xc_cpuid_xend_policy to fail, and thus the whole operation would
> > fail.
> 
> Okay, this could be addressed by merely dropping the calls from
> calculate_{pv,hvm}_def_policy(). Thinking about it, I can surely
> agree they shouldn't have been put there in the first place.
> Which would be quite the opposite of your initial proposal, where
> you did drop them from calculate_{pv,hvm}_max_policy(). A guest
> migrating in with a larger max leaf value should merely have that
> max leaf value retained, but that ought to be possible without
> dropping the shrinking from calculate_{pv,hvm}_max_policy().

I won't argue it's not possible to do that without dropping the shrink
from calculate_{pv,hvm}_max_policy(), but given the point we are on
the release we should consider the safest option, and IMO that's the
revert of the shrinking from there in order to restore the previous
behavior and have working migrations from 4.15 -> 4.16.

We can discuss other likely better approaches to solve this issue
after the release.

> Even
> leaving aside migration, I guess an explicit request for a large
> max leaf value should be honored; those possibly many trailing
> leaves then would simply all be blank.
> 
> > There's another likely error that only affects callers of
> > xc_cpuid_apply_policy that pass a featureset (so not the upstream
> > toolstack), where some leaves of the featureset could now be ignored
> > by the guest if the max leaves value doesn't cover them anymore. Note
> > this was already an issue even before 540d911c2813, as applying the
> > featureset doesn't check that the set feature leaves are below the max
> > leaf.
> 
> If this was an issue before the commit to be reverted, I take it
> the revert isn't going to help it?

I think the commit makes it more likely to hit the above scenario by
shrinking max leaves.

> In which case this information
> is interesting, but not applicable as justification for the
> revert?

As said above, while the commit at hand is not introducing the issue
with the featuresets, it makes it more likely by shrinking the max
leaves, and IMO it's a regression from behavior in 4.15.

Ie: options set on the featureset on 4.15 would be exposed, while the
same options could be hidden in 4.16 because of the shrinking to the
default domain policies, if the user happens to set an option that's
on an empty trailing featureset with a tail of zeroed leaves.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.