[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
- To: "H. Peter Anvin" <hpa@xxxxxxxxx>
- From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
- Date: Sat, 5 Oct 2024 17:00:01 -0700
- Cc: Uros Bizjak <ubizjak@xxxxxxxxx>, Ard Biesheuvel <ardb@xxxxxxxxxx>, Ard Biesheuvel <ardb+git@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, Andy Lutomirski <luto@xxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Dennis Zhou <dennis@xxxxxxxxxx>, Tejun Heo <tj@xxxxxxxxxx>, Christoph Lameter <cl@xxxxxxxxx>, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>, Arnd Bergmann <arnd@xxxxxxxx>, Masahiro Yamada <masahiroy@xxxxxxxxxx>, Kees Cook <kees@xxxxxxxxxx>, Nathan Chancellor <nathan@xxxxxxxxxx>, Keith Packard <keithp@xxxxxxxxxx>, Justin Stitt <justinstitt@xxxxxxxxxx>, Josh Poimboeuf <jpoimboe@xxxxxxxxxx>, Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>, Namhyung Kim <namhyung@xxxxxxxxxx>, Jiri Olsa <jolsa@xxxxxxxxxx>, Ian Rogers <irogers@xxxxxxxxxx>, Adrian Hunter <adrian.hunter@xxxxxxxxx>, Kan Liang <kan.liang@xxxxxxxxxxxxxxx>, linux-doc@xxxxxxxxxxxxxxx, linux-pm@xxxxxxxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-efi@xxxxxxxxxxxxxxx, linux-arch@xxxxxxxxxxxxxxx, linux-sparse@xxxxxxxxxxxxxxx, linux-kbuild@xxxxxxxxxxxxxxx, linux-perf-users@xxxxxxxxxxxxxxx, rust-for-linux@xxxxxxxxxxxxxxx, llvm@xxxxxxxxxxxxxxx
- Delivery-date: Sun, 06 Oct 2024 00:06:32 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> Sadly, that is not correct; neither gcc nor clang uses lea:
Looking around, this may be intentional. At least according to Agner,
several cores do better at "mov immediate" compared to "lea".
Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
"mov i,r". Haswell is 1:4).
Of course, Agner's tables are good, but not necessarily always the
whole story. There are other instruction tables on the internet (eg
uops.info) with possibly more info.
And in reality, I would expect it to be a complete non-issue with any
OoO engine and real code, because you are very seldom ALU limited
particularly when there aren't any data dependencies.
But a RIP-relative LEA does seem to put a *bit* more pressure on the
core resources, so the compilers are may be right to pick a "mov".
Linus
|