Re: [PATCH v2] x86/PV: avoid indirect call for I/O emulation quirk hook


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 18 Jan 2024 11:11:55 +0000
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 18 Jan 2024 11:12:02 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18/01/2024 11:09 am, Jan Beulich wrote:
> On 18.01.2024 12:04, Andrew Cooper wrote:
>> On 17/01/2024 9:37 am, Jan Beulich wrote:
>>> --- a/xen/arch/x86/ioport_emulate.c
>>> +++ b/xen/arch/x86/ioport_emulate.c
>>> @@ -8,11 +8,10 @@
>>>  #include <xen/sched.h>
>>>  #include <xen/dmi.h>
>>>  
>>> -unsigned int (*__read_mostly ioemul_handle_quirk)(
>>> -    uint8_t opcode, char *io_emul_stub, struct cpu_user_regs *regs);
>>> +bool __ro_after_init ioemul_handle_quirk;
>>>  
>>> -static unsigned int cf_check ioemul_handle_proliant_quirk(
>>> -    u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs)
>>> +unsigned int ioemul_handle_proliant_quirk(
>>> +    uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>>>  {
>>>      static const char stub[] = {
>>>          0x9c,       /*    pushf           */
>> Something occurred to me.
>>
>> diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c
>> index 23cba842b22e..70f94febe255 100644
>> --- a/xen/arch/x86/ioport_emulate.c
>> +++ b/xen/arch/x86/ioport_emulate.c
>> @@ -13,7 +13,7 @@ bool __ro_after_init ioemul_handle_quirk;
>>  unsigned int ioemul_handle_proliant_quirk(
>>      uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>>  {
>> -    static const char stub[] = {
>> +    const char stub[] = {
>>          0x9c,       /*    pushf           */
>>          0xfa,       /*    cli             */
>>          0xee,       /*    out %al, %dx    */
>>
>> is an improvement, confirmed by bloat-o-meter:
>>
>> add/remove: 0/1 grow/shrink: 1/0 up/down: 1/-9 (-8)
>> Function                                     old     new   delta
>> ioemul_handle_proliant_quirk                  58      59      +1
>> stub                                           9       -      -9
>>
>> The reason is that we've got a 9-byte object that's decomposed into two
>> rip-relative accesses, i.e. we've got more pointer than data in this case.
> I wouldn't mind this as a separate change, but I don't see how it would
> fit right here.

I'm not suggesting changing this patch.  I only mentioned it here because
this patch is what made me notice it.

We've got similar patterns elsewhere, so I was intending to do a patch
covering all of them.
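
A minimal sketch of the pattern in question, assuming two hypothetical
helpers (emit_stub_static and emit_stub_auto are illustrative names, not
Xen functions) and only the three stub bytes quoted above, with the
remaining bytes of the real stub left out.  Both variants copy the same
bytes; the only difference is the storage class of the array:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the quirk handler, for illustration only.
 * Only the three opcode bytes quoted in the thread are used here.
 */

/* Variant A: static storage, so the bytes are a separate object in .rodata. */
size_t emit_stub_static(unsigned char *buf, size_t len)
{
    static const unsigned char stub[] = {
        0x9c,       /*    pushf           */
        0xfa,       /*    cli             */
        0xee,       /*    out %al, %dx    */
    };

    assert(len >= sizeof(stub));
    memcpy(buf, stub, sizeof(stub));
    return sizeof(stub);
}

/*
 * Variant B: automatic storage.  An optimising compiler is free to turn the
 * initialiser into instruction immediates, with no separate data object.
 */
size_t emit_stub_auto(unsigned char *buf, size_t len)
{
    const unsigned char stub[] = {
        0x9c,       /*    pushf           */
        0xfa,       /*    cli             */
        0xee,       /*    out %al, %dx    */
    };

    assert(len >= sizeof(stub));
    memcpy(buf, stub, sizeof(stub));
    return sizeof(stub);
}
```

Whether variant B actually ends up smaller depends on the compiler, its
version and the build flags, so it is worth checking the generated code
(e.g. with objdump -d) for each candidate rather than assuming it.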

>
>> But this adjustment seems to tickle a GCC bug.  With that change in
>> place, GCC emits:
>>
>> <ioemul_handle_proliant_quirk>:
>>        48 83 ec 10             sub    $0x10,%rsp
>>        ...
>>        48 83 c4 10             add    $0x10,%rsp
>>        c3                      retq
>>
>> i.e. we get a stack frame (space at least, no initialisation) despite
>> the object having been converted entirely to instruction immediates.
>>
>> Or in other words, there's a further 12 byte saving available when GCC
>> can be persuaded to not even emit the stack frame.
>>
>> What is even weirder is that I see this with GCC-10, and with a build of
>> gcc master from last week, but not when I try to reproduce it in
>> https://godbolt.org/z/PnachbznW so there's probably some other setting
>> used by Xen which tickles this bug.
> Yeah, I've observed such pointless frame allocation elsewhere as well,
> so far without being able to tell what exactly triggers it.

Ok - more experimentation required, I guess.

~Andrew



 

