[xen master] x86/hvm: Improve hvm_set_guest_pat() code generation again
commit 715b92ba30f792e326bdd37b5a4969da9c5d4a6c
Author:     Edwin Török <edvin.torok@xxxxxxxxxx>
AuthorDate: Mon May 16 20:45:13 2022 +0100
Commit:     Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CommitDate: Fri Mar 24 12:16:31 2023 +0000

    x86/hvm: Improve hvm_set_guest_pat() code generation again

    Following on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
    code generation"), and the discovery that Clang/LLVM makes some especially
    disastrous code generation for the loop at -O2

      https://github.com/llvm/llvm-project/issues/54644

    Edvin decided to remove the loop entirely by fully vectorising it. This is
    substantially more efficient than the loop, and rather harder for a typical
    compiler to mess up.

    Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx>
    Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
---
 xen/arch/x86/hvm/hvm.c | 51 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0c81e2afc7..7342408233 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -299,24 +299,43 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
     *guest_pat = v->arch.hvm.pat_cr;
 }
 
-int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+/*
+ * MSR_PAT takes 8 uniform fields, each of which must be a valid architectural
+ * memory type (0-1, 4-7). This is a fully vectorised form of the 8-iteration
+ * loop over bytes looking for X86_MT_* constants.
+ */
+static bool pat_valid(uint64_t val)
 {
-    unsigned int i;
-    uint64_t tmp;
+    /* Yields a non-zero value in any lane which had value greater than 7. */
+    uint64_t any_gt_7 = val & 0xf8f8f8f8f8f8f8f8ull;
 
-    for ( i = 0, tmp = guest_pat; i < 8; i++, tmp >>= 8 )
-        switch ( tmp & 0xff )
-        {
-        case X86_MT_UCM:
-        case X86_MT_UC:
-        case X86_MT_WB:
-        case X86_MT_WC:
-        case X86_MT_WP:
-        case X86_MT_WT:
-            break;
-        default:
-            return 0;
-        }
+    /*
+     * With the > 7 case covered, identify lanes with the value 0-3 by finding
+     * lanes with bit 2 clear.
+     *
+     * Yields bit 2 set in each lane which has a value <= 3.
+     */
+    uint64_t any_le_3 = ~val & 0x0404040404040404ull;
+
+    /*
+     * Logically, any_2_or_3 is "any_le_3 && bit 1 set".
+     *
+     * We could calculate any_gt_1 as val & 0x02 and resolve the two vectors
+     * of booleans (shift one of them until the mask lines up, then bitwise
+     * and), but that is unnecessary calculation.
+     *
+     * Shift any_le_3 so it becomes bit 1 in each lane which has a value <= 3,
+     * and look for bit 1 in a subset of lanes.
+     */
+    uint64_t any_2_or_3 = val & (any_le_3 >> 1);
+
+    return !(any_gt_7 | any_2_or_3);
+}
+
+int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+{
+    if ( !pat_valid(guest_pat) )
+        return 0;
 
     if ( !alternative_call(hvm_funcs.set_guest_pat, v, guest_pat) )
         v->arch.hvm.pat_cr = guest_pat;
--
generated by git-patchbot for /home/xen/git/xen.git#master
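
The claim that the vectorised pat_valid() matches the removed 8-iteration loop can be checked outside of Xen. What follows is a minimal standalone sketch, not part of the commit. It assumes the valid memory types are exactly the values named in the patch comment (0-1, 4-7), so the loop form is written with plain numbers rather than the X86_MT_* constants. The names pat_valid_loop() and pat_forms_agree() are hypothetical and exist only for this illustration. The base value 0x0007040600070406 is the architectural reset value of the PAT MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Reference form (hypothetical name): one byte lane per iteration, as in the
 * loop the patch removes.  Valid types are assumed to be 0-1 and 4-7, taken
 * from the comment in the patch rather than from Xen's X86_MT_* headers.
 */
static bool pat_valid_loop(uint64_t val)
{
    for ( unsigned int i = 0; i < 8; i++, val >>= 8 )
    {
        uint8_t lane = val & 0xff;

        if ( lane > 7 || lane == 2 || lane == 3 )
            return false;
    }

    return true;
}

/* Vectorised form, copied from the patch. */
static bool pat_valid(uint64_t val)
{
    /* Non-zero in any lane holding a value greater than 7. */
    uint64_t any_gt_7 = val & 0xf8f8f8f8f8f8f8f8ull;

    /* Bit 2 set in each lane whose bit 2 is clear (value <= 3 once > 7 is excluded). */
    uint64_t any_le_3 = ~val & 0x0404040404040404ull;

    /* Bit 1 set in each lane that is <= 3 and also has bit 1 set, i.e. 2 or 3. */
    uint64_t any_2_or_3 = val & (any_le_3 >> 1);

    return !(any_gt_7 | any_2_or_3);
}

/*
 * Hypothetical helper: substitute every byte value 0..255 into each of the
 * 8 lanes of an otherwise valid PAT value and compare the two forms.
 */
static bool pat_forms_agree(void)
{
    const uint64_t base = 0x0007040600070406ull; /* PAT reset value */

    for ( unsigned int lane = 0; lane < 8; lane++ )
        for ( unsigned int b = 0; b < 256; b++ )
        {
            uint64_t mask = 0xffull << (8 * lane);
            uint64_t val = (base & ~mask) | ((uint64_t)b << (8 * lane));

            if ( pat_valid(val) != pat_valid_loop(val) )
                return false;
        }

    return true;
}
```

Because each lane is judged independently in both forms (the shift by one moves bit 2 to bit 1 within the same byte and never crosses a lane boundary), sweeping one lane at a time over all 256 byte values is enough to exercise every per-lane outcome.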