[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 7 Sep 2018 17:17:40 +0100
  • Autocrypt: addr=andrew.cooper3@xxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFLhNn8BEADVhE+Hb8i0GV6mihnnr/uiQQdPF8kUoFzCOPXkf7jQ5sLYeJa0cQi6Penp VtiFYznTairnVsN5J+ujSTIb+OlMSJUWV4opS7WVNnxHbFTPYZVQ3erv7NKc2iVizCRZ2Kxn srM1oPXWRic8BIAdYOKOloF2300SL/bIpeD+x7h3w9B/qez7nOin5NzkxgFoaUeIal12pXSR Q354FKFoy6Vh96gc4VRqte3jw8mPuJQpfws+Pb+swvSf/i1q1+1I4jsRQQh2m6OTADHIqg2E ofTYAEh7R5HfPx0EXoEDMdRjOeKn8+vvkAwhviWXTHlG3R1QkbE5M/oywnZ83udJmi+lxjJ5 YhQ5IzomvJ16H0Bq+TLyVLO/VRksp1VR9HxCzItLNCS8PdpYYz5TC204ViycobYU65WMpzWe LFAGn8jSS25XIpqv0Y9k87dLbctKKA14Ifw2kq5OIVu2FuX+3i446JOa2vpCI9GcjCzi3oHV e00bzYiHMIl0FICrNJU0Kjho8pdo0m2uxkn6SYEpogAy9pnatUlO+erL4LqFUO7GXSdBRbw5 gNt25XTLdSFuZtMxkY3tq8MFss5QnjhehCVPEpE6y9ZjI4XB8ad1G4oBHVGK5LMsvg22PfMJ ISWFSHoF/B5+lHkCKWkFxZ0gZn33ju5n6/FOdEx4B8cMJt+cWwARAQABzSlBbmRyZXcgQ29v cGVyIDxhbmRyZXcuY29vcGVyM0BjaXRyaXguY29tPsLBegQTAQgAJAIbAwULCQgHAwUVCgkI CwUWAgMBAAIeAQIXgAUCWKD95wIZAQAKCRBlw/kGpdefoHbdD/9AIoR3k6fKl+RFiFpyAhvO 59ttDFI7nIAnlYngev2XUR3acFElJATHSDO0ju+hqWqAb8kVijXLops0gOfqt3VPZq9cuHlh IMDquatGLzAadfFx2eQYIYT+FYuMoPZy/aTUazmJIDVxP7L383grjIkn+7tAv+qeDfE+txL4 SAm1UHNvmdfgL2/lcmL3xRh7sub3nJilM93RWX1Pe5LBSDXO45uzCGEdst6uSlzYR/MEr+5Z JQQ32JV64zwvf/aKaagSQSQMYNX9JFgfZ3TKWC1KJQbX5ssoX/5hNLqxMcZV3TN7kU8I3kjK mPec9+1nECOjjJSO/h4P0sBZyIUGfguwzhEeGf4sMCuSEM4xjCnwiBwftR17sr0spYcOpqET ZGcAmyYcNjy6CYadNCnfR40vhhWuCfNCBzWnUW0lFoo12wb0YnzoOLjvfD6OL3JjIUJNOmJy RCsJ5IA/Iz33RhSVRmROu+TztwuThClw63g7+hoyewv7BemKyuU6FTVhjjW+XUWmS/FzknSi dAG+insr0746cTPpSkGl3KAXeWDGJzve7/SBBfyznWCMGaf8E2P1oOdIZRxHgWj0zNr1+ooF /PzgLPiCI4OMUttTlEKChgbUTQ+5o0P080JojqfXwbPAyumbaYcQNiH1/xYbJdOFSiBv9rpt TQTBLzDKXok86M7BTQRS4TZ/ARAAkgqudHsp+hd82UVkvgnlqZjzz2vyrYfz7bkPtXaGb9H4 Rfo7mQsEQavEBdWWjbga6eMnDqtu+FC+qeTGYebToxEyp2lKDSoAsvt8w82tIlP/EbmRbDVn 7bhjBlfRcFjVYw8uVDPptT0TV47vpoCVkTwcyb6OltJrvg/QzV9f07DJswuda1JH3/qvYu0p vjPnYvCq4NsqY2XSdAJ02HrdYPFtNyPEntu1n1KK+gJrstjtw7KsZ4ygXYrsm/oCBiVW/OgU g/XIlGErkrxe4vQvJyVwg6YH653YTX5hLLUEL1NS4TCo47RP+wi6y+TnuAL36UtK/uFyEuPy wwrDVcC4cIFhYSfsO0BumEI65yu7a8aHbGfq2lW251UcoU48Z27ZUUZd2Dr6O/n8poQHbaTd 6bJJSjzGGHZVbRP9UQ3lkmkmc0+XCHmj5WhwNNYjgbbmML7y0fsJT5RgvefAIFfHBg7fTY/i kBEimoUsTEQz+N4hbKwo1hULfVxDJStE4sbPhjbsPCrlXf6W9CxSyQ0qmZ2bXsLQYRj2xqd1 bpA+1o1j2N4/au1R/uSiUFjewJdT/LX1EklKDcQwpk06Af/N7VZtSfEJeRV04unbsKVXWZAk uAJyDDKN99ziC0Wz5kcPyVD1HNf8bgaqGDzrv3TfYjwqayRFcMf7xJaL9xXedMcAEQEAAcLB XwQYAQgACQUCUuE2fwIbDAAKCRBlw/kGpdefoG4XEACD1Qf/er8EA7g23HMxYWd3FXHThrVQ HgiGdk5Yh632vjOm9L4sd/GCEACVQKjsu98e8o3ysitFlznEns5EAAXEbITrgKWXDDUWGYxd pnjj2u+GkVdsOAGk0kxczX6s+VRBhpbBI2PWnOsRJgU2n10PZ3mZD4Xu9kU2IXYmuW+e5KCA vTArRUdCrAtIa1k01sPipPPw6dfxx2e5asy21YOytzxuWFfJTGnVxZZSCyLUO83sh6OZhJkk b9rxL9wPmpN/t2IPaEKoAc0FTQZS36wAMOXkBh24PQ9gaLJvfPKpNzGD8XWR5HHF0NLIJhgg 4ZlEXQ2fVp3XrtocHqhu4UZR4koCijgB8sB7Tb0GCpwK+C4UePdFLfhKyRdSXuvY3AHJd4CP 4JzW0Bzq/WXY3XMOzUTYApGQpnUpdOmuQSfpV9MQO+/jo7r6yPbxT7CwRS5dcQPzUiuHLK9i nvjREdh84qycnx0/6dDroYhp0DFv4udxuAvt1h4wGwTPRQZerSm4xaYegEFusyhbZrI0U9tJ B8WrhBLXDiYlyJT6zOV2yZFuW47VrLsjYnHwn27hmxTC/7tvG3euCklmkn9Sl9IAKFu29RSo d5bD8kMSCYsTqtTfT6W4A3qHGvIDta3ptLYpIAOD2sY3GYq2nf3Bbzx81wZK14JdDDHUX2Rs 6+ahAA==
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 07 Sep 2018 16:17:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 17/08/18 10:38, Jan Beulich wrote:
>>>> On 17.08.18 at 10:59, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 17/08/2018 08:21, Jan Beulich wrote:
>>> --- a/xen/include/asm-x86/asm_defns.h
>>> +++ b/xen/include/asm-x86/asm_defns.h
>>> @@ -186,6 +186,20 @@ void ret_from_intr(void);
>>>          UNLIKELY_END_SECTION "\n"          \
>>>          ".Llikely." #tag ".%=:"
>>>  
>>> +#define LINKONCE_PROLOGUE(sym)                    \
>>> +        ".ifndef " sym() "\n\t"                   \
>>> +        ".pushsection " sym(.gnu.linkonce.t.) "," \
>> This definitely warrants a comment and a change of name, seeing as sym
>> isn't a symbol.  Its a macro which gives you a string back.
> It's still a symbol name; don't forget that in the end section names
> are symbols names too. I'd have gone with a better name if I had
> any better idea. It's also not really clear to me what you want a
> comment to say here.

/* sym must be a macro which takes an optional symbol name prefix */ ?

A user of this macro, knowing only the prototype would expect to use by
passing a string literal.

>
>>> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
>>>  
>>>          va += xen_phys_start - XEN_VIRT_START;
>>>      }
>>> -    return (va & ma_va_bottom_mask) |
>>> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
>>> +
>>> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
>>> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
>>> +    alternative_io("call " SYMNAME() "\n\t"
>>> +                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
>>> +                   "mov %[shift], %%ecx\n\t"
>>> +                   "mov %[off], %[ma]\n\t"
>>> +                   "and %[bmask], %[ma]\n\t"
>>> +                   "shl %%cl, %[off]\n\t"
>>> +                   "and %[tmask], %[off]\n\t"
>>> +                   "or %[off], %[ma]\n\t"
>>> +                   "ret\n\t"
>>> +                   LINKONCE_EPILOGUE(SYMNAME),
>>> +                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
>> The compiler understanding V doesn't imply that the assembler
>> understands pdep
> I've been considering this case, and in fact in an early version I had
> a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't
> really imagine V support to be backported to gcc pre-dating BMI2
> support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe).
> But yes, if you really think we need to cope with that, I could re-add
> this. Just let me know if you indeed think this is needed.

Didn't you say that you've got Retpoline backported to GCC 4.3 ?

Independently of my argument against using V below, I'm not overly
fussed especially as we are considering upping the requirements to
something a little more modern, so long as we accept that we may have to
retrofit a -DHAVE_AS_BMI2 after the fact.

>
>>> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
>>> +                   [mask] "m" (ma_real_mask),
>>> +                   [shift] "m" (pfn_pdx_hole_shift),
>>> +                   [bmask] "m" (ma_va_bottom_mask),
>>> +                   [tmask] "m" (ma_top_mask)
>>> +                   : "ecx");
>>> +#undef SYMNAME
>>> +#else
>>> +    alternative_io("call do2ma",
>>> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
>>> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
>>> +                   ".long ma_real_mask - 4 - .",
>>> +                   X86_FEATURE_BMI2,
>>> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
>>> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
>>> +#endif
>> This is a massive clobber list in a function you've forced always
>> inline, and I can't see it doing nice things to the callsites.  TBH,
>> this still feels over-complicated for what it wants to be.
>>
>> Why not implement one single function in assembly that doesn't have
>> usual C calling conventions and can clobber %ecx and one other, and use
>> that?
>>
>> It avoids the need for potentially 256 almost-identical copies of the
>> function in the linkonce section, and avoids having the multiple
>> implementations in C/asm, avoids the need for any logic derived from
>> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.
> Your response mixes things a bit too much for me to sort out what
> exactly you're concerned about: The massive clobber list exists only
> in the !CONFIG_INDIRECT_THUNK case. In that case though there
> aren't going to be up to 225 instances of the function. I'd be fine
> implementing the single one in assembly to reduce the clobber list,
> it was just to keep down assembly code size and also to have
> compiler generated code to compare against. I have to admit though
> that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK
> case in the first place, so I also didn't see much reason to try to
> optimize it.
>
> For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to
> avoid dictating register allocation to the compiler. Hence the solution
> with the (possibly many) function instances. Overall code size still
> decreases with this approach, and on modern hardware the entire
> region of the image in which they live will remain cold. (Additionally
> not affecting register allocation here has the benefit of making it
> far easier to compare pre/post generated code.)

How many passes through the hypervisor hit two or more of these functions?

I appreciate that for development, reducing the register perturbance can
be nice for diffing the resulting disassembly, but when it comes to
actually running the code, register renames at compile time are free,
whereas pulling multiple almost-identical copies of the stub into the
I-cache is not.

Implementing this as a single function looks to be simpler in terms of
the change, will compile to a smaller result, and will run faster.  It
seems to be a win-win-win overall.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.