[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.16] Revert "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents"


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 25 Nov 2021 11:22:27 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Dh3Y2cWniyTxg12/W2QwMG2KilPLOccv7Le7RAz3i9I=; b=ngHBNvY8ZR2wl5fKi1xg9fpAnjfz07LXCw8BRSkwiGHVn+iarOLt8vHoKRd9qNJf139wi9g/tzbXhX9SyMSXBIQGARKqFzTVLUTavsySCaTutX37+pYb7Hap8FRWcsG4vxwLaoU+mBeyiyoFikFraNRoK9k/p2EjxNaQ7ULK9jN2iRacN7YfkfAHR43ZxrCXlPuRM9oOll6YABlsRpN1RGh7i1isvjEkAbMp7c6WFxEetaCb86dHzuCK+cfedE03di58F6i9K5StiowF5fnde58Viue4eExRDAiUWXHzgs9TT30H5fWqj2X5pY0g6J9XKtBmF9edMtnI0wJqhWkDaQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DOul+owUzO+2H9R5d35G/gzqJCKD7RsMzV5x3gLvwqPunF6DwXuBeopQFxU67UnsgpDHB4ZJA8dCaI7xaHPq25gEk4LCpJJy9O/3OmR2F1QYQxpt5rkvNV4C56P89WytBxoRzjTrbZgugbvPlpGbqa66Q2g539UVLeAq9czHbtBW4PKZnErUedh5BQ1WHrznGulHeQvzQ00tqFVsA/jqHhk0dKlHe2JNUb3L4wINAXvi7NPywTNZB4WQTEaWi3lz1Hy++EKXzQyiPcxpr9uINkwQMoZM+S/XI9IB2duuvKiihWPdwc/5O+wPKfzQQxDyV7+JT6BFmAXoMoE04sdfug==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 25 Nov 2021 10:22:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.11.2021 11:15, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 10:52:12AM +0100, Jan Beulich wrote:
>> On 25.11.2021 10:24, Roger Pau Monné wrote:
>>> On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote:
>>>> On 24.11.2021 22:11, Andrew Cooper wrote:
>>>>> OSSTest has identified a 3rd regression caused by this change.  Migration
>>>>> between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron 
>>>>> 4133)
>>>>> fails with:
>>>>>
>>>>>   xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, 
>>>>> msr ffffffff (22 = Invalid argument): Internal error
>>>>>   xc: error: Restore failed (22 = Invalid argument): Internal error
>>>>>
>>>>> which is a safety check to prevent resuming the guest when the CPUID data 
>>>>> has
>>>>> been truncated.  The problem is caused by shrinking of the max policies, 
>>>>> which
>>>>> is an ABI that needs handling compatibly between different versions of 
>>>>> Xen.
>>>>
>>>> Would you mind pointing me to the flight which has hit this problem? I
>>>> don't think I've seen any respective failure. I also don't think I've
>>>> seen any respective discussion on irc, so I really wonder where all
>>>> this is coming from all of the sudden. It's not like the change in
>>>> question would have gone in just yesterday.
>>>
>>> It's from a pair of newish boxes that Credativ and Ian where
>>> attempting to commission yesterday. Since the boxes are not yet in
>>> production Ian wasn't sure if the issue could be on the testing or
>>> hardware side, so emailed the details in private for us to provide
>>> some feedback on the issue. The error is at:
>>>
>>> http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html
>>
>> I see. Quite lucky timing then.
> 
> Indeed, it was pure luck that we got this just yesterday.
> 
>>
>>>>> Furthermore, shrinking of the default policies also breaks things in some
>>>>> cases, because certain cpuid= settings in a VM config file which used to 
>>>>> have
>>>>> an effect will now be silently discarded.
>>>>
>>>> I'm afraid I don't see what you're talking about here. Could you give
>>>> an example? Is this about features the flags of which live in the
>>>> higher leaves, which would have got stripped from the default policies?
>>>> Which would mean the stripping really should happen on the max policies
>>>> only (where it may not have much of an effect).
>>>
>>> I think there are two separate issues, which I tried to clarify in my
>>> reply to this same patch.
>>>
>>> Options set using cpuid= with xl could now be rejected when in
>>> previous versions they were accepted just fine. That's because the
>>> shrinking to the policies can now cause find_leaf calls in
>>> xc_cpuid_xend_policy to fail, and thus the whole operation would
>>> fail.
>>
>> Okay, this could be addressed by merely dropping the calls from
>> calculate_{pv,hvm}_def_policy(). Thinking about it, I can surely
>> agree they shouldn't have been put there in the first place.
>> Which would be quite the opposite of your initial proposal, where
>> you did drop them from calculate_{pv,hvm}_max_policy(). A guest
>> migrating in with a larger max leaf value should merely have that
>> max leaf value retained, but that ought to be possible without
>> dropping the shrinking from calculate_{pv,hvm}_max_policy().
> 
> I won't argue it's not possible to do that without dropping the shrink
> from calculate_{pv,hvm}_max_policy(), but given the point we are on
> the release we should consider the safest option, and IMO that's the
> revert of the shrinking from there in order to restore the previous
> behavior and have working migrations from 4.15 -> 4.16.
> 
> We can discuss other likely better approaches to solve this issue
> after the release.

Sure; as said earlier on I merely would like to understand things
sufficiently well before giving my ack.

Jan

>> Even
>> leaving aside migration, I guess an explicit request for a large
>> max leaf value should be honored; those possibly many trailing
>> leaves then would simply all be blank.
>>
>>> There's another likely error that only affects callers of
>>> xc_cpuid_apply_policy that pass a featureset (so not the upstream
>>> toolstack), where some leaves of the featureset could now be ignored
>>> by the guest if the max leaves value doesn't cover them anymore. Note
>>> this was already an issue even before 540d911c2813, as applying the
>>> featureset doesn't check that the set feature leaves are below the max
>>> leaf.
>>
>> If this was an issue before the commit to be reverted, I take it
>> the revert isn't going to help it?
> 
> I think the commit makes it more likely to hit the above scenario by
> shrinking max leaves.
> 
>> In which case this information
>> is interesting, but not applicable as justification for the
>> revert?
> 
> As said above, while the commit at hand is not introducing the issue
> with the featuresets, it makes it more likely by shrinking the max
> leaves, and IMO it's a regression from behavior in 4.15.
> 
> Ie: options set on the featureset on 4.15 would be exposed, while the
> same options could be hidden in 4.16 because of the shrinking to the
> default domain policies, if the user happens to set an option that's
> on an empty trailing featureset with a tail of zeroed leaves.
> 
> Thanks, Roger.
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.