[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] x86/hvm: Improve hvm_set_guest_pat() code generation again
- To: Jan Beulich <jbeulich@xxxxxxxx>, Edwin Torok <edvin.torok@xxxxxxxxxx>
- From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
- Date: Fri, 16 Dec 2022 20:53:49 +0000
- Accept-language: en-GB, en-US
- Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
- Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=/EBsvEOI499PfXBw7JeJBv/KuSyaexr/+L/6wbzMUCk=; b=P+s0L3sAeuVVUH1xUkT3whorTfWS/tVdeo4s01jEoJKuEbKH25X4VFMxh4njUuym8bpi6qnE2WPhnPefoIAlO4YDKzWk6aFkm7yKjPVU39vVWx7y1jQtlYx/Ra3SyGjqywW/LquGj1n9CXmuniFkmSf+io7E9+S7TowzIhim9eTe8jkdZUnM2dwtcxWBkWliVZZDEcB4luLfT5ejHEPi8ABG3JwxL8U/MEryXExSSNhAvdDjg2D7/s7T0ZNWHIdMFU5i88sXyscvpg479LZ4Li32DZmZdupCburjbFK1PV10RBIs+6FTxPeS4uDYYVgdGWbHa3xY/6WabSn6lP50hQ==
- Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VzsFmoTqNHzNXE+AmuyfL1xoMxYu3llHkUgegnOcFhdUzT/GtIsHNJ5Se5Cl7KFabMzUy8oLSPxFon2RlWXsfU+0abFM1lqKL9tarDv+SBkkCqqpBK5zMQ6JCFPTBwp06zg1kQ09hBJr3TnzRLtYy16zn+a24wJi+B6sNjVdpM2yIq3IVDqezh6S96QOz1NtxgUBCnD399hjTENmsxIk78KDYdjzZpXOC4nRIuBsIqHSU0ROqRvPq8HZZpuZpiE/HbNhC/Vu65VFG2Sxe+rMk+QzZY8UaYRvZ9x5lQ+LLJiHXbBBLSueuow3pTnSDdRYMkTnma2qRvlhN5Mzn7r0QQ==
- Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
- Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx>
- Delivery-date: Fri, 16 Dec 2022 20:54:16 +0000
- Ironport-data: A9a23:BaJDZaODoa+qTvDvrR1klsFynXyQoLVcMsEvi/4bfWQNrUpxgTMOy mNJWGHXP/3ZZmX9eN0lbN++80gEscXSyN9jSAto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CU6jufQA+KmU4YoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGj9SuvzrRC9H5qyo4mpC5wJmPpingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0tpUGUIVz eBDFDkIMCusrvydz++JWOY506zPLOGzVG8ekldJ6GiBSNoDH9XESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PpxujCKpOBy+OGF3N79U9qGX8hK2G2fo XrL5T/RCRAGLt2PjzGC9xpAg8efzXOhBdNPTtVU8NYtjVnD6nQoBSEodgWbnfyQ0w2GZspmf hl8Fi0G6PJaGFaQZtv3UgC8oXWElgUBQNcWGOo/gCmSzoLE7gDfAXILJhZRZdpjuMIoSDgC0 l6Sg8ivFTFpqKeSS3+W6vGTtzzaESQYKG4qZCkaTBAE6d3uvIEyiB3USt9pVqWyi7XI9SrYx jmLqG0ygusVhMtSj6Gjpwmf2nSru4TDSRMz6kPPRGW54whlZYmjIYu19Vzc6vUGJ4GcJrWcg EU5dwGlxLhmJfmweOalGY3hwJnBCy65DQDh
- Ironport-hdrordr: A9a23:htlh/Kl78Rf8LfhvCKaG6Pxi6W7pDfN1iWdD5ihNYBxZY6Wkfp +V8cjzhCWftN9OYhodcIi7SdK9qXO1z+8X3WGIVY3SEDUOy1HYVr2KirGSjAEIeheOu9K1sJ 0NT0EQMqyWMbEXt6fHCUyDYq4dKbq8ge6VbIXlvhFQpGhRAskOgTuRSDzra3GeLzM2Z6bRYa Dsgvav0ADQHEj/AP7aOlA1G8z44/HbnpPvZhALQzQ97hOVsD+u4LnmVzCFwxY3SVp0sPcf2F mAtza8yrSosvm9xBOZ/XTU9Y5qlNzozcYGLNCQi/ISNi7nhm+TFcdcsvy5zXIISdOUmRIXee r30lAd1gNImjXsl1SO0F7QMs/boW8TAjHZuAelaDDY0LHErXoBerZ8bMRiA1rkAgMbza9BOO gg5RPni7NHSRzHhyjz/N7OSlVjkVe1u2MrlaoJg2VYSpZ2Us4YkWWxxjImLH4sJlON1GkcKp gmMOjMoPJNNV+KZXHQuWdihNSqQ3QoBx+DBkwPoNac3TRalG1wixJw/r1Uol4QsJYmD5VU7e XNNapl0LlIU88NdKp4QOMMW9G+BGDBSQ/FdGiSPVPkHqcaPG+lke+93JwloOWxPJAYxpo7n5 rMFFteqG4pYkrrTdaD2ZVamyq9N1lVnQ6dvv22y6IJyYEUHoCbQBFrYGpe4/eIsrEYHtDRXe q1NdZfH+LjRFGebLp04w==
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
- Thread-index: AQHYrL5RUAHVMa/wUEmdRSJMkDgnS62oK2YAgMmcG4A=
- Thread-topic: [PATCH] x86/hvm: Improve hvm_set_guest_pat() code generation again
On 10/08/2022 3:06 pm, Jan Beulich wrote:
> On 10.08.2022 15:36, Andrew Cooper wrote:
>> From: Edwin Török <edvin.torok@xxxxxxxxxx>
>>
>> Following on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
>> code generation"), and the discovery that Clang/LLVM makes some especially
>> disastrous code generation for the loop at -O2
>>
>> https://github.com/llvm/llvm-project/issues/54644
>>
>> Edvin decided to remove the loop entirely by fully vectorising it. This is
>> substantially more efficient than the loop, and rather harder for a typical
>> compiler to mess up.
>>
>> Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> The main downside being that changing the code to fit in a new PAT
> type will now be harder.
When was the last PAT type change?
Trick question. Never, because PAT hasn't changed since it was
introduced 24 years ago in the Pentium III.
I really don't think we're in danger of needing to change this logic.
> I wonder in particular whether with that
> in mind it wouldn't be better to express the check not in terms of
> relations, but in terms of set / clear bits ("bits 3-7 clear AND
> (bit 2 set OR bit 1 clear)"). The code kind of does so already, but
> the variable names don't reflect that (and would hence need to
> change in such an event).
That would reduced clarity.
The bits being set or cleared are trivial for any developer, given the
particularly basic RHS expressions.
The constant names are what relate the bit patterns to the description
of the problem.
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -302,24 +302,43 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
>> *guest_pat = v->arch.hvm.pat_cr;
>> }
>>
>> -int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
>> +/*
>> + * MSR_PAT takes 8 uniform fields, each of which must be a valid
>> architectural
>> + * memory type (0, 1, 4-7). This is a fully vectorised form of the
>> + * 8-iteration loop over bytes looking for PAT_TYPE_* constants.
> While grep-ing for PAT_TYPE_ will hit this line, I think we want
> every individual type to also be found here when grep-ing for one.
> The actual values aren't going to change, but perhaps the beast
> way to do so would still be by way of BUILD_BUG_ON()s.
Why? What does that solve or improve?
"pat" is the thing people are going to be looking for if they're
actually trying to find this logic.
(And I bring this patch up specifically after reviewing Demi's series,
where PAT_TYPE_* changes to X86_MT_* but "pat" is still the useful
search term IMO.)
>
>> + */
>> +static bool pat_valid(uint64_t val)
>> {
>> - unsigned int i;
>> - uint64_t tmp;
>> + /* Yields a non-zero value in any lane which had value greater than 7.
>> */
>> + uint64_t any_gt_7 = val & 0xf8f8f8f8f8f8f8f8;
> This and the other constant want to gain UL suffixes.
Fixed.
> (While I'm
> open to be convinced otherwise on the earlier two points, this one
> I'm going to insist on. Yet in case it would end up being the only
> change in need of making, it could of course be done while
> committing.)
I've tweaked the grammar a bit, but I don't think the other changes
would be an improvement.
~Andrew
|