[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

On December 15, 2015 2:16:29 PM PST, "Luis R. Rodriguez" 
<mcgrof@xxxxxxxxxxxxxxxx> wrote:
>From: "Luis R. Rodriguez" <mcgrof@xxxxxxxx>
>             A long time ago in a galaxy far,
>             far away...
>Konrad Rzeszutek Wilk posted patches which eventually got merged to
>with modularizing the IOMMUs we have on x86 [0]. This work was done due
>the complex relationship that exists on IOMMUs and the requirements on
>careful execution. The solution also provided a mechanism which
>unused IOMMUs during run-time.
>During review, even though the code was merged, hpa did note that we
>to encounter this type of problem "often enough that we should
>implement a
>generic facility for it" [1], hpa acknowledged that it obviously has to
>based on sections and even noted that perhaps we might be able in the
>future to
>automate its creation. He noted that the gPXE folks had done just this
>linker tables and suggested that "presumably we'd need a few different
>for init tables and so on, but this would make it a generic mechanism."
>The IOMMU code got merged and this was left on someone's mental
>I've had an itch to scratch recently to try to avoid issues which are
>if one does not jettison other code carefully due to the large
>complexity of
>implicit dependencies of certain code on x86 in particular with
>possible dead
>code on x86 due to paravirtualization, and the IOMMU jettison strategy
>out to be my favorite solution so far. I've taken on hpa's suggestions
>back in the day to review gPXE's solution to see if we could embrace it
>Linux for a generic section solution to help jettison code carefully.
>What this patch set does exactly:
>This RFC patch set attempts to add support for such a generic solution.
>In the end, it turns out that the best solution possible was the best
>both worlds: a combination of what Konrad had implemented in addition
>what Michael Brown had implemented on the gXPE front. The IOMMU
>enables simple semantic annotations for dependency relationships, this
>however requires a run time sort. The gPXE solution grants the option
>simply sort at build time. One of gPXE's solution primary goals however
>also to help avoid bit-rot on code that's possible from #ifdef'ery. The
>linker table solution enables developers to pick and choose what they
>need, with linker tables being the simplest solution. Contrary to gPXE
>which strives to force compilation of all linker table solutions we
>let developers pick *when* they want this as part of their solution.
>As can be seen from the suggested x86 init specific use of linker
>proposed you can also take advantage of both, linker sorting, optional
>compilation when needed (at developer's discretion), and even careful
>semantics annotation for dependency / relationship annotations.
>the x86 init solution here is heavily inspired by the IOMMU solution it
>diverges with strong semantics, and a new optional subarchitecture
>annotation. Sorting of init sequences is structure specific, as such
>each subsystem must defing their own solution unless semantics could
>be shared. I considered sharing semantics but in the end this proved
>pointless so this keeps things separate. A series of changes were made
>to the x86 init sequence in contrast to the IOMMU solution to be
>pedantic* on semantics, review of this changes can be studied on the
>table-init tree [2].
>Quick review of gPXE's solution and prospects on further changes:
>In my review from gPXE's solution it was not clear what hpa meant by
>gXPE folks having automated this process, they actually use linker
>all around, forcing compilation of *everything* and just do linking of
>enabled features at link time. You still need to build linker tables on
>your own. What I do see more potential for in the future is enabling to
>evolve stronger semantics over time, and this would also be subsystem
>This will be evident in this patch set on the x86 init use of linker
>I also see potential in strenghtening semantics for linker sorting, any
>these types of features however would impose requiring newer binutils.
>instance, gPXE's linker solution currently relies on SORT(), that
>defaults to
>SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two
>to enable SORT_BY_NAME()'s lexicographical sort to sort orde by numeric
>priority. Since one is in control of order-level numbers one can
>guarantee that this sort should work as intended, however binutils also
>now has
>a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits.
>SORT_BY_INIT_PRIORITY() was designed specifically for init_array
>though. Refer to the userspace mockup solution table-init git tree [2]
>6deba47ee1ad461e90 for more details on this.  One thing I can envision
>to help
>here further are prospects for future linker enhancements, these
>however must
>be carefully considered in light of possible requirements of newer
>and also of compatibility with other toolchains. For now we resort to
>the good
>'ol SORT() which even Linux has made use of for ages already. In that
>here, nothing new here. gPXE folks however did find some fun ICC
>issues, and have also developed fixes for them, these were obviously
>over but should be reviewed carefully. Lastly, I should point out that
>essentially what we're developing are different forms loose and strong
>semantics -- in the most complex form what we really are after are
>graphs. Mauro explained to me that for media driver they needed to
>build a
>feature graphs, IMHO that could be a next level of strong semantics we
>want to consider. For now though the combination of gPXE's linker-table
>solution based on linker order-level sort, and our old IOMMU init
>and its simple dependency map (and possible extensions, see
>should likely suffice as a light weight solution for where we need
>for at least on the x86 init path.
>Motivation and possible enhancements:
>To understand what made me tick to work on this feel free to read the
>code concerns with paravirtualization posts I've made, which also go
>and detail Xen's alternative entry point [4] [5]. That's not the end of
>possibilities to help address to possible "dead code" on Linux, I have
>ideas but I have a bit more work to do before publishing anything about
>If anything -- later work should likely supplement this solution
>Currently the earliest I was able to make use of boot_params on x86-64
>linker tables was after load_idt(), so I've decided to stick with the
>init call for x86 init sequences starting on
>However, if we're able to make use of boot_params earlier (help
>in particular just boot_params.hdr.hardware_subarch we should be able
>to make
>clearer and careful annotations on early x86-64 init code. Using the
>boot_params.hdr.hardware_subarch and requiring clear subarchitecture
>annotations on early init code should provide a *proactive* means to
>avoid issues
>such as the cr4 shadow oversight [6] which later caused crashes on Xen.
>latest similar type of issue is when KAsan was introduced, after it was
>integrated I suspected KAsan would probably break if enabled on Xen, I
>this in March 2015 [7] and Andrey confirmed this might be the case but
>since he
>wanted it enabled for allyesconfig and allmodconfig he could not think
>of a
>clean way to disable it or address support for it then [8]. During the
>development of this patch set I confirmed KAsan crashes Xen dom0 and
>should obviously crash Xen PV guests as well.  Using the same kernel
>still worked on KVM and bare-metal. Provided we had early access to
>boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could
>kasan_early_init() and friends not supported on Xen, however since we
>have a way to disable KAsan at run time at this time we wouldn't have
>any other
>option but to crash early. Because of this though we should perhaps
>just consider
>disabling KAsan for Xen configs and KAsan folks might just have to bite
>bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen
>enabled and if its easier to add support to disable KAsan at run time
>it is to add KAsan support for Xen one alternative might be to use
>tables to annotate this and disable KAsan at run time for Xen.
>One of the points of the use of the x86 linker table solution and
>semantics there in is to *force* developers to think about requirements
>on x86
>carefully. So for instance by requiring itemizing supported
>*early* on in development we should be able to avoid proactively issues
>such as
>the crash due to the cr4 shadow changes not added on the Xen entry
>path, and
>also missing the requirement to develop a solution for Kasan for Xen.
>proposed x86 linker table solution is not the first to use
>boot_params.hdr.hardware_subarch, its first use was actually on i386,
>and for
>lguest. We take advantage of it to avoid further extending pv_ops, and
>*actually* to see if we could simplify pv_ops over time further. This
>patch set
>also includes a renaming of paravirt_enabled() to paravirt_legacy() 
>An unexpected long term side goal which *might* be possible due to
>tables on x86 is to help unify the different C entry points for Linux
>I've taken a stab at it after these patches [9] but it fails on Xen,
>because the stack is not set up right due to the different calls /
>requirements. If this is desirable folks could take a look and perhaps
>on that front.
>Where to get code alternatively and testing:
>All of these patches are RFCs, if anything the only patches worth
>merging sooner rather than later might be "paravirt: rename
>to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After
>I can send those separately in patch form if agreeable. Since these are
>just RFCs I've based these patches on top of Linus' v4.4-rc5. If you
>want all
>patches in one file you can get them here [10], and a respective
>version which applies on top of next-20151215 here [11]. I also have
>git trees set up with branches for this code, one based on Linus' tree
>and another based on linux-next next-20151215 [13]. I'll see what
>zero-day bot testing finds out. I've just run time tested this on
>bare metal, KVM, and Xen dom0 so far.
>[0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2
>[1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2
>[2] https://github.com/mcgrof/table-init/
>[8] http://lkml.kernel.org/r/54F5B3DA.70203@xxxxxxxxxxx
>Luis R. Rodriguez (8):
>  paravirt: rename paravirt_enabled to paravirt_legacy
>  tables.h: add linker table support
>  x86/boot: add BIT() to boot/bitops.h
>  x86/init: add linker table support
>  x86/init: move ebda reservations into linker table
>  x86/init: use linker table for i386 early setup
>  x86/init: user linker table for ce4100 early setup
>  x86/init: use linker table for mid early setup
> Documentation/kbuild/makefiles.txt      |  19 ++
> arch/x86/Kconfig.debug                  |  47 ++++
> arch/x86/boot/bitops.h                  |   2 +
> arch/x86/boot/boot.h                    |   2 +-
> arch/x86/include/asm/bios_ebda.h        |   2 -
> arch/x86/include/asm/paravirt.h         |   4 +-
> arch/x86/include/asm/paravirt_types.h   |  11 +-
> arch/x86/include/asm/processor.h        |   2 +-
> arch/x86/include/asm/setup.h            |  12 -
> arch/x86/include/asm/x86_init.h         |   1 +
> arch/x86/include/asm/x86_init_fn.h      | 267 ++++++++++++++++++++
> arch/x86/kernel/Makefile                |   4 +-
> arch/x86/kernel/apm_32.c                |   2 +-
> arch/x86/kernel/asm-offsets.c           |   2 +-
> arch/x86/kernel/cpu/intel.c             |   2 +-
> arch/x86/kernel/cpu/microcode/core.c    |   2 +-
> arch/x86/kernel/dbg-tables/Makefile     |  18 ++
> arch/x86/kernel/dbg-tables/alpha.c      |  10 +
> arch/x86/kernel/dbg-tables/beta.c       |  18 ++
> arch/x86/kernel/dbg-tables/delta.c      |  10 +
> arch/x86/kernel/dbg-tables/gamma.c      |  18 ++
> arch/x86/kernel/dbg-tables/gamma.h      |   3 +
> arch/x86/kernel/{head.c => ebda.c}      |   6 +-
> arch/x86/kernel/head32.c                |  22 +-
> arch/x86/kernel/head64.c                |   4 +-
> arch/x86/kernel/init.c                  |  55 +++++
> arch/x86/kernel/kvm.c                   |   9 +-
> arch/x86/kernel/paravirt.c              |   2 +-
> arch/x86/kernel/sort-init.c             | 114 +++++++++
> arch/x86/kernel/tboot.c                 |   2 +-
> arch/x86/kernel/vmlinux.lds.S           |  25 ++
> arch/x86/lguest/boot.c                  |   2 +-
> arch/x86/mm/dump_pagetables.c           |   2 +-
> arch/x86/platform/ce4100/ce4100.c       |   4 +-
> arch/x86/platform/intel-mid/intel-mid.c |   4 +-
> arch/x86/tools/relocs.c                 |   1 +
> arch/x86/xen/enlighten.c                |   2 +-
> drivers/pnp/pnpbios/core.c              |   2 +-
>include/linux/tables.h                  | 421
> scripts/Makefile.build                  |   4 +-
> scripts/Makefile.clean                  |   1 +
> scripts/Makefile.lib                    |  12 +
> 42 files changed, 1091 insertions(+), 61 deletions(-)
> create mode 100644 arch/x86/include/asm/x86_init_fn.h
> create mode 100644 arch/x86/kernel/dbg-tables/Makefile
> create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
> create mode 100644 arch/x86/kernel/dbg-tables/beta.c
> create mode 100644 arch/x86/kernel/dbg-tables/delta.c
> create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
> create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
> rename arch/x86/kernel/{head.c => ebda.c} (94%)
> create mode 100644 arch/x86/kernel/init.c
> create mode 100644 arch/x86/kernel/sort-init.c
> create mode 100644 include/linux/tables.h

By the way, please refer to iPXE as gPXE is a dead project.
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.