Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
>>> On 01.03.18 at 17:58, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 01/03/18 10:54, Jan Beulich wrote:
>>>>> On 01.03.18 at 11:36, <roger.pau@xxxxxxxxxx> wrote:
>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 02/28/18 7:20 PM >>>
>>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>
>>>>>>> +#include <asm/nops.h>
>>>>>>> +
>>>>>>>  #ifdef __ASSEMBLY__
>>>>>>>
>>>>>>>  /*
>>>>>>> @@ -18,6 +20,14 @@
>>>>>>>      .byte \pad_len
>>>>>>>  .endm
>>>>>>>
>>>>>>> +.macro mknops nr_bytes
>>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>>> NOP?
>>>>> I don't want a toolchain change to cause us to go beyond 11-byte nops,
>>>>> because of the associated decode stall on almost all hardware. Using
>>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>>> consistent, irrespective of toolchain support.
>>>> I don't understand - an earlier patch takes care of runtime replacing them
>>>> anyway. What stalls can then result?
>>> The runtime replacement won't happen when using the .nops directive
>>> AFAICT, because the original padding section will likely be filled
>>> with opcodes different from 0x90, and thus the runtime nop
>>> optimization won't be performed.
>> Oh, indeed. That calls into question the whole idea of using
>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>> to withdraw my ack.
>>
>>> I also agree that using the default (not providing a second argument)
>>> seems like a better solution. Why would the toolchain switch to
>>> something that leads to worse performance? That would certainly be
>>> considered a bug.
>> Why? They may change it based on data available for newer /
>> older / whatever hardware. Any build-time choice is going to be
>> suboptimal somewhere, so I think we absolutely should not
>> bypass runtime replacing these NOPs, the more so now that we
>> may have quite large sequences of them.
>
> The point of having the toolchain put out optimised nops is to avoid the
> need for us to patch the site at all. I.e. calling optimise_nops() on a
> set of toolchain nops defeats the purpose in the overwhelmingly common
> case of running on a system which prefers P6 nops.
>
> The problem with working out when to optimise is that, when we come to
> apply an individual alternative, we don't know whether we've already patched
> this site before. Even the unoptimised algorithm has a corner case
> which explodes if there is a stream of 0x90's on the end of a
> replacement, e.g. in an imm or disp field.
>
> Put simply, we cannot determine, by peeking at the patch site, whether it
> has been patched or not (other than keeping a full copy of the origin
> site as a reference). As soon as we choose to optimise the nops of the
> origin site, we cannot determine anything at all.
>
> Thinking out loud, we could perhaps have a section containing one byte
> per origin site, which we use to track whether we've already optimised
> the padding bytes, and whether the contents have been replaced.
> This would also add an extra long into struct altentry, but it's all cold
> data after boot.

What about alternatively simply updating the struct alt_instr instances to
describe the code _after_ a patch was applied? That would allow us to always
know how much padding there is.

Jan
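To make the objection above concrete: a runtime pass that keys off 0x90
padding can only recognise what ".skip len, 0x90" leaves behind. The
following minimal C sketch of such a check is illustrative only - it is not
the actual optimise_nops() implementation, and the function name is made up -
but it shows why multi-byte nops emitted by the assembler's .nop directive
would not be recognised, leaving the site unpatched at runtime.

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Illustrative sketch: padding laid down with ".skip len, 0x90" is a
     * run of single-byte NOPs and is trivially recognisable.  Padding
     * emitted by the assembler's .nop directive consists of multi-byte
     * NOP opcodes (0x66 0x90, 0x0f 0x1f ..., etc.), so a check like this
     * fails and a pass keyed off 0x90 bytes leaves the site alone.
     */
    static bool padding_is_plain_nops(const uint8_t *p, unsigned int len)
    {
        for ( unsigned int i = 0; i < len; ++i )
            if ( p[i] != 0x90 )  /* single-byte NOP opcode */
                return false;

        return true;
    }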
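Jan's suggestion amounts to having struct alt_instr describe the post-patch
state of the site, so a later pass never needs to infer the code/padding
split by inspecting the bytes themselves. A rough C sketch of the idea
follows - the field layout and function name are illustrative, not the real
Xen definitions.

    #include <stdint.h>

    /* Illustrative layout only; the real Xen struct alt_instr differs. */
    struct alt_instr {
        int32_t  orig_offset;   /* site being patched                   */
        int32_t  repl_offset;   /* replacement code                     */
        uint16_t cpuid;         /* feature bit gating the replacement   */
        uint8_t  orig_len;      /* bytes of meaningful code at the site */
        uint8_t  repl_len;      /* bytes of replacement code            */
        uint8_t  pad_len;       /* bytes of padding following the code  */
    };

    /*
     * After writing 'copied' bytes of replacement into the site, update
     * the entry so it describes what is there now.  The total site size
     * is fixed, so orig_len + pad_len stays invariant; a subsequent pass
     * can then tell exactly where code ends and padding begins without
     * peeking at the patched bytes.
     */
    static void note_patch_applied(struct alt_instr *a, uint8_t copied)
    {
        uint8_t total = a->orig_len + a->pad_len;

        a->orig_len = copied;
        a->pad_len  = total - copied;
    }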