Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On 01/03/18 10:54, Jan Beulich wrote:
>>>> On 01.03.18 at 11:36, <roger.pau@xxxxxxxxxx> wrote:
>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 02/28/18 7:20 PM >>>
>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>> @@ -1,6 +1,8 @@
>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>
>>>>>> +#include <asm/nops.h>
>>>>>> +
>>>>>>  #ifdef __ASSEMBLY__
>>>>>>
>>>>>>  /*
>>>>>> @@ -18,6 +20,14 @@
>>>>>>      .byte \pad_len
>>>>>>  .endm
>>>>>>
>>>>>> +.macro mknops nr_bytes
>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>> And do you really need to specify ASM_NOP_MAX here?  What's
>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>> NOP?
>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>> because of the associated decode stall on almost all hardware.  Using
>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>> consistent, irrespective of toolchain support.
>>> I don't understand - an earlier patch takes care of runtime replacing
>>> them anyway.  What stalls can then result?
>> The runtime replacement won't happen when using the .nops directive
>> AFAICT, because the original padding section will likely be filled
>> with opcodes different from 0x90, and thus the runtime nop
>> optimisation won't be performed.
> Oh, indeed.  That puts into question the whole idea of using
> .nops in favour of .skip.  Andrew, I'm sorry, but with this I prefer
> to withdraw my ack.
>
>> I also agree that using the default (not providing a second argument)
>> seems like a better solution.  Why would the toolchain switch to
>> something that leads to worse performance?  That would certainly be
>> considered a bug.
> Why?
> They may change it based on data available for newer /
> older / whatever hardware.  Any build-time choice is going to be
> suboptimal somewhere, so I think we absolutely should not
> bypass runtime replacing these NOPs, the more that now we
> may have quite large sequences of them.

The point of having the toolchain emit optimised nops is to avoid the
need for us to patch the site at all.  I.e. calling optimise_nops() on a
set of toolchain nops defeats the purpose in the overwhelmingly common
case of running on a system which prefers P6 nops.

The problem with working out when to optimise is that, when we come to
apply an individual alternative, we don't know whether we've already
patched this site before.  Even the unoptimised algorithm has a corner
case which explodes, if there is a stream of 0x90s at the end of a
replacement, e.g. in an imm or disp field.

Put simply, we cannot determine, by peeking at the patch site, whether
it has been patched or not (other than by keeping a full copy of the
origin site as a reference).  As soon as we choose to optimise the nops
of the origin site, we cannot determine anything at all.

Thinking out loud, we could perhaps have a section containing one byte
per origin site, which we use to track whether we've already optimised
the padding bytes, and whether the contents have been replaced.  This
would also add an extra long into struct altentry, but it's all cold
data after boot.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel