[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 08/11] x86/shadow: reduce effort of hash calculation


  • To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 9 Jan 2023 10:48:38 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=MMHx9wpBRn012eK/sOZhbqy2uXhf9JMI/fRvQnuW4bg=; b=oaQT+O9CrhpCUOg/7Yye2TJAW7UWoSmnsVcRJUpAvYNv2Om+J0p/x1xqgzzzZACFvMAHK2vv1VmFqAgHhZ1co2TRiHAmyb6BIusrvbyjyC5mrFuUEJ/r1O5mQVxfTaMADOAis9HreuZuZExh7Im3wQkBSjhRWV14ZMIhaeEzYTriUF4dLq7ZZFmskVCvPa5mMHq5QRoIYAi/wUI06+e51QeVuzIPL6wx+fcy4aLBwxjZ7TkHvuJf5Zq0jmmwJOxPMDNi+cirHVWslN+V1nPmx6bb++M2aRI+TewDJ2vdq4mcGSaKb35ECXXDentMXbaxNqfJJId1URtGYs9vnzJSkg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Xo64tgn/sivb8DgDWuQ1gopVmjLgmtIIXqD3ASdtyzsseqUiemf9H4taxs16wpK08In6h7Dg1rgF0TvFEMIlEVjr8VO89W84Cwe7bgQza3KHVVvge5bzlPIZXffX4+J/kf7LNUbwRHh80gaOcfkOcnrJBPQXsL2NTNVOWMjuux1xSWS1MIMdwW0Cy9lWzhDNigeyjS6ql+Q0Sa0QamEumVUfiBmk4UZeEVc+7JB9rMalVxXLPpTjxRCxrIXrc0aLofoxQsfVjI2iyDt+Xauq0amNFwzso3c4HKgSMi3i1eehCrPz8lzIMtHaUXWirjZvX7yonW3IjrdQ6HFwBjQ59Q==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, "Tim (Xen.org)" <tim@xxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 09 Jan 2023 09:48:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.01.2023 03:03, Andrew Cooper wrote:
> On 05/01/2023 4:05 pm, Jan Beulich wrote:
>> The "n" input is a GFN value and hence bounded by the physical address
>> bits in use on a system.
> 
> The one case where this isn't obviously true is in sh_audit().  It comes
> from a real MFN in the system, not a GFN, which will have the same
> property WRT PADDR_BITS.

I'm afraid I was more wrong with that than just for the audit case. Only
FL1 shadows use GFNs. All other shadows us MFNs. I'll update the sentence.

>>  The hash quality won't improve by also
>> including the upper always-zero bits in the calculation. To keep things
>> as compile-time-constant as they were before, use PADDR_BITS (not
>> paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5.
> 
> While this is all true, you'll get a much better improvement by not
> forcing 'n' onto the stack just to access it bytewise.  Right now, the
> loop looks like:
> 
> <shadow_hash_insert>:
>     48 83 ec 10                 sub    $0x10,%rsp
>     49 89 c9                    mov    %rcx,%r9
>     41 89 d0                    mov    %edx,%r8d
>     48 8d 44 24 08              lea    0x8(%rsp),%rax
>     48 8d 4c 24 10              lea    0x10(%rsp),%rcx
>     48 89 74 24 08              mov    %rsi,0x8(%rsp)
>     0f 1f 80 00 00 00 00        nopl   0x0(%rax)
> /-> 0f b6 10                    movzbl (%rax),%edx
> |   48 83 c0 01                 add    $0x1,%rax
> |   45 69 c0 3f 00 01 00        imul   $0x1003f,%r8d,%r8d
> |   41 01 d0                    add    %edx,%r8d
> |   48 39 c1                    cmp    %rax,%rcx
> \-- 75 ea                       jne    ffff82d0402efda0
> <shadow_hash_insert+0x20>
> 
> 
> which doesn't even have a compile-time constant loop bound.  It's
> runtime calculated by the second lea constructing the upper pointer bound.
> 
> Given this further delta:
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c
> b/xen/arch/x86/mm/shadow/common.c
> index 4a8bcec10fe8..902c749f2724 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -1397,13 +1397,12 @@ static unsigned int shadow_get_allocation(struct
> domain *d)
>  typedef u32 key_t;
>  static inline key_t sh_hash(unsigned long n, unsigned int t)
>  {
> -    unsigned char *p = (unsigned char *)&n;
>      key_t k = t;
>      int i;
>  
>      BUILD_BUG_ON(PADDR_BITS > BITS_PER_LONG + PAGE_SHIFT);
> -    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++ )
> -        k = p[i] + (k << 6) + (k << 16) - k;
> +    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++, n >>= 8 )
> +        k = (uint8_t)n + (k << 6) + (k << 16) - k;
>  
>      return k % SHADOW_HASH_BUCKETS;
>  }
> 
> the code gen becomes:
> 
> <shadow_hash_insert>:
>     41 89 d0                    mov    %edx,%r8d
>     49 89 c9                    mov    %rcx,%r9
>     b8 05 00 00 00              mov    $0x5,%eax
> /-> 45 69 c0 3f 00 01 00        imul   $0x1003f,%r8d,%r8d
> |   40 0f b6 d6                 movzbl %sil,%edx
> |   48 c1 ee 08                 shr    $0x8,%rsi
> |   41 01 d0                    add    %edx,%r8d
> |   83 e8 01                    sub    $0x1,%eax
> \-- 75 e9                       jne    ffff82d0402efd8b
> <shadow_hash_insert+0xb>
> 
> with an actual constant loop bound, and not a memory operand in sight. 
> This form (even at 8 iterations) will easily execute faster than the
> stack-spilled form.

Oh, yes, good idea. Will adjust.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.