
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available



On 01/03/18 10:54, Jan Beulich wrote:
>>>> On 01.03.18 at 11:36, <roger.pau@xxxxxxxxxx> wrote:
>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 02/28/18 7:20 PM >>>
>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>> @@ -1,6 +1,8 @@
>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>  
>>>>>> +#include <asm/nops.h>
>>>>>> +
>>>>>>  #ifdef __ASSEMBLY__
>>>>>>  
>>>>>>  /*
>>>>>> @@ -18,6 +20,14 @@
>>>>>>      .byte \pad_len
>>>>>>  .endm
>>>>>>  
>>>>>> +.macro mknops nr_bytes
>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>> NOP?
>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>> because of the associated decode stall on almost all hardware.  Using
>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>> consistent, irrespective of toolchain support.
>>> I don't understand - an earlier patch takes care of runtime replacing them
>>> anyway. What stalls can then result?
>> The runtime replacement won't happen when using the .nops directive
>> AFAICT, because the original padding section will likely be filled
>> with opcodes different than 0x90, and thus the runtime nop
>> optimization won't be performed.
> Oh, indeed. That calls into question the whole idea of using
> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
> to withdraw my ack.
>
>> I also agree that using the default (not providing a second argument)
>> seems like a better solution. Why would the toolchain switch to
>> something that leads to worse performance? That would certainly be
>> considered a bug.
> Why? They may change it based on data available for newer /
> older / whatever hardware. Any build-time choice is going to be
> suboptimal somewhere, so I think we absolutely should not
> bypass runtime replacement of these NOPs, all the more so now
> that we may have quite large sequences of them.

The point of having the toolchain emit optimised nops is to avoid the
need for us to patch the site at all.  I.e. calling optimise_nops() on a
set of toolchain nops defeats the purpose in the overwhelmingly common
case of running on a system which prefers P6 nops.
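
Roughly, the runtime decision under discussion amounts to something like
the sketch below (illustrative only, not the real alternative.c code;
the helper name is made up).  Only a run of single-byte 0x90 NOPs gets
treated as "unoptimised" padding worth rewriting, which is why
toolchain-emitted long NOPs from .nop never get touched again:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch: decide whether origin-site padding is eligible for the
     * runtime NOP optimisation.  A run of single-byte 0x90s (as emitted
     * by .skip padding) is; anything else (e.g. long NOPs emitted by the
     * assembler's .nop directive) is assumed to be optimised already. */
    static bool padding_is_plain_nops(const uint8_t *pad, size_t len)
    {
        size_t i;

        for ( i = 0; i < len; i++ )
            if ( pad[i] != 0x90 )
                return false;   /* Toolchain NOPs: leave them alone. */

        return true;            /* Plain 0x90 padding: worth optimising. */
    }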

The problem with working out when to optimise is that, when we come to
apply an individual alternative, we don't know whether we've already
patched this site before.  Even the unoptimised algorithm has a corner
case which explodes if there is a stream of 0x90s at the end of a
replacement, e.g. in an imm or disp field.
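
As a concrete (made-up) instance of that corner case: a replacement
ending in "mov $0x90909090, %eax" assembles to b8 90 90 90 90, so its
final bytes are indistinguishable from padding by inspection alone:

    #include <stdint.h>

    /* Hypothetical replacement ending in an instruction whose immediate
     * happens to be 0x90909090, followed by three bytes of real padding. */
    static const uint8_t replacement[] = {
        0xb8, 0x90, 0x90, 0x90, 0x90,   /* mov $0x90909090, %eax */
        0x90, 0x90, 0x90,               /* genuine single-byte NOP padding */
    };
    /* A "count trailing 0x90s" scan finds seven candidate padding bytes
     * rather than three, and would clobber most of the immediate. */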

Put simply, we cannot determine, by peeking at the patch site, whether
it has been patched or not (other than by keeping a full copy of the
origin site as a reference).  As soon as we choose to optimise the nops
of the origin site, we cannot determine anything at all.

Thinking out loud, we could perhaps have a section containing one byte
per origin site, which we use to track whether we've already optimised
the padding bytes, and whether the contents have been replaced.  This
would also add an extra long to struct altentry, but it's all cold data
after boot.
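
Something like the following is what I have in mind (sketch only; none
of these names exist in the tree, and the altentry layout shown is
purely illustrative):

    #include <stdint.h>

    /* Per-origin-site state byte, living in its own (cold) section. */
    #define ALT_SITE_NOPS_OPTIMISED  (1u << 0)  /* padding already rewritten */
    #define ALT_SITE_REPLACED        (1u << 1)  /* replacement already applied */

    struct alt_entry_sketch {
        int32_t  orig_offset;    /* reference to the origin site */
        int32_t  repl_offset;    /* reference to the replacement */
        uint16_t cpuid;          /* feature bit gating this alternative */
        uint8_t  orig_len, repl_len, pad_len;
        unsigned long state_ref; /* the extra long: reference to the
                                  * per-site state byte above */
    };

apply_alternatives() could then consult and update the state byte,
rather than trying to infer the site's history from its contents.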

~Andrew
