
Re: [Xen-devel] [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies



On Fri, Feb 19, 2016 at 4:42 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxx> wrote:
> On Fri, Feb 19, 2016 at 08:40:49AM -0800, Andy Lutomirski wrote:
>> On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
>> > Any failure on the x86 init path can be catastrophic.
>> > A simple shift of a call from one place to another can
>> > easily break things. Likewise adding a new call to
>> > one path without considering all x86 requirements
>> > can make certain x86 run time environments crash.
>> > We currently account for these requirements through
>> > peer code review and run time testing. We could do
>> > much better if we had a clean and simple way to annotate
>> > strong semantics for run time requirements, init sequence
>> > dependencies, and detection mechanisms for additions of
>> > new x86 init sequences.
>>
>> Please document this in a way that will be useful for people trying to
>> understand what the code does as opposed to just people who are trying
>> to understand why you wrote it.  More below.
>
> Sure. I'm kind of glad people are getting tired of me trying to
> explain *why* I wrote it though; I figured that might be the harder
> thing to explain. Hopefully it sinks in.
>
>> > +/**
>> > + * struct x86_init_fn - x86 generic kernel init call
>> > + *
>> > + * Linux x86 features vary in complexity; features may require work done at
>> > + * different levels of the full x86 init sequence. Today there are also two
>> > + * different possible entry points for Linux on x86: one for bare metal, KVM
>> > + * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
>> > + * has set up 64-bit mode, the x86 init sequence roughly follows this path:
>> > + *
>> > + * Bare metal, KVM, Xen HVM                      Xen PV / dom0
>> > + *       startup_64()                             startup_xen()
>> > + *              \                                     /
>> > + *      x86_64_start_kernel()                 xen_start_kernel()
>> > + *                           \               /
>> > + *                      x86_64_start_reservations()
>> > + *                                   |
>> > + *                              start_kernel()
>> > + *                              [   ...        ]
>> > + *                              [ setup_arch() ]
>> > + *                              [   ...        ]
>> > + *                                  init
>> > + *
>>
>>
>> I don't think this paragraph below is necessary.  I also think it's
>> very confusing.  Keep in mind that the reader has no idea what a
>> "level" is at this point and that the reader also doesn't need to
>> think about terms like "paravirtualization yielding".
>>
>> > + * x86_64_start_kernel() and xen_start_kernel() are the respective first C
>> > + * code entry points. The different entry points exist to enable Xen to skip
>> > + * a lot of hardware setup already done and managed on behalf of the
>> > + * hypervisor; we refer to this as "paravirtualization yielding". The
>> > + * different levels of init calls in the x86 init sequence exist to account
>> > + * for these slight differences and requirements. These different entry
>> > + * points also share a common x86 specific entry path,
>> > + * x86_64_start_reservations().
>
> OK sure, I'll nuke.
>
>> > + *
>>
>> And here, I don't even know what a "feature" is.
>
> Kasan is an example.

Is kasan different from the cr4 shadow in this context?  I don't
really feel like calling the cr4 shadow a "feature" is meaningful.  Am
I misunderstanding something?

>
>> > + * A generic x86 feature can have different initialization calls, one on
>> > + * each of the different main x86 init sequences, but must also address both
>> > + * entry points in order to work properly across the board on all supported
>> > + * x86 subarchitectures. Since x86 features can also have dependencies on
>> > + * other setup code or features, x86 features can at times be subordinate to
>> > + * other x86 features, or conditions. struct x86_init_fn enables feature
>> > + * developers to annotate dependency relationships to ensure subsequent init
>> > + * calls only run once a subordinate's dependencies have run. When needed,
>> > + * custom dependency requirements can also be spelled out through a custom
>> > + * dependency checker. In order to account for the dual entry point nature
>> > + * of x86-64 Linux for "paravirtualization yielding", and to make the
>> > + * annotations of support for these explicit, each struct x86_init_fn must
>> > + * specify its supported subarchitectures. The earliest x86-64 code can read
>> > + * the subarchitecture, though, is after load_idt(); as such, the earliest we
>> > + * can currently rely on the subarchitecture for semantics and a common init
>> > + * sequence is the shared x86_64_start_reservations(). Each struct
>> > + * x86_init_fn must also declare a two-digit decimal number to impose an
>> > + * ordering relative to other features when required.
>>
>> I'm totally lost in the paragraph above.
>
> As an example, Kasan could be defined as a struct x86_init_fn with a respective
> callback on x86_64_start_kernel() (kasan_early_init()) and later on
> setup_arch() (kasan_init()). Since we can't support struct x86_init_fn calls
> starting all the way from x86_64_start_kernel() (given the issues I had noted
> before), the earliest a struct x86_init_fn can be used for is a callback on
> x86_64_start_reservations().

I'm still lost.  Kasan is a pile of code that does something.  Kasan
has some initialization code.  Do you mean that kasan's initialization
code could be encapsulated in one or more x86_init_fn instances?
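
Something like this, say?  (Pure guesswork on my part: kasan_early_init()
is real, but the order level number, the BIT() usage, and how the entry
actually lands in the table are made up.)

static struct x86_init_fn kasan_init_fn __used = {
        .order_level = 30,                       /* made-up level */
        .supp_hardware_subarch = BIT(X86_SUBARCH_PC),
        .early_init = kasan_early_init,          /* real function */
};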

>
> It just means we can have different callbacks used for different parts of the
> x86 init process; it explains the limits of the callbacks we can have right
> now.

Do you mean that there would be different linker tables containing
x86_init_fn instances for each of these callback points?  Or is
"early_init" the only such callback point?

BTW, why is early_init called early_init?  Why not call it "callback"?
Then you could reuse this for other things.

>
>> > + *
>> > + * x86_init_fn enables strong semantics and dependencies to be defined and
>> > + * implemented on the full x86 initialization sequence.
>>
>> Please try explaining what this is, instead.  For example:
>>
>> An x86_init_fn represents a function to be called at a certain point
>> during initialization on a specific set of subarchitectures.
>
> OK thanks.
>
>> > + *
>> > + * @order_level: must be set, linker order level, this corresponds to the
>> > + *     table section sub-table index, we record this only for semantic
>> > + *     validation purposes.
>>
>> I read this as "this is purely a debugging feature".  Is it?  I think it's 
>> not.
>
> Nope, the order_level really is used for linker table sorting, but since we
> can modify the table for these structs with a run time sort later, we want to
> ensure that the order level is still respected after we sort at run time.
>
>> >  Order-level is always required, however you typically would
>> > + *     only use X86_INIT_NORMAL*() and leave ordering to be done by placement
>> > + *     of code in a C file and the order of objects through a Makefile. Custom
>> > + *     order-levels can be used when order in a C file and order of objects in
>> > + *     Makefiles does not suffice or much further refinements are needed.
>>
>>
>> Assuming I understand this correctly, here's how I'd write it:
>>
>> @order_level: The temporal order of this x86_init_fn.  Lower
>> order_level numbers are called first.  Ties are broken by order found
>> in the Makefile and then by order in the C file.
>
> Thanks... yes.
>
>>
>> Note, however, that my proposed explanation can't be right because it
>> appears to conflict with "depend".  Please adjust accordingly.
>>
>> > + * @supp_hardware_subarch: must be set, it represents the bitmask of supported
>> > + *     subarchitectures.  We require each struct x86_init_fn to have this set
>> > + *     to require developer considerations for each supported x86
>> > + *     subarchitecture and to build strong annotations of different possible
>> > + *     run time states, particularly in consideration of the two main
>> > + *     different entry points for x86 Linux, to account for paravirtualization
>> > + *     yielding.
>>
>> Too much motivation, too little documentation.
>>
>> @supp_hardware_subarch: A bitmask of subarchitectures on which to call
>> this init function.
>
> OK thanks.
>
>> > + *
>> > + * @detect: optional, if set returns true if the feature has been detected
>> > + *     to be required, and false if the feature has been detected to not
>> > + *     be required.
>>
>> I have absolutely no idea what this means.
>
> Well, so the order level stuff is used at link time; however, there are some
> run time environment conditions that might be important to consider as well.
> The detect hook lets you write a C routine which will check to see if the
> feature should run; it returns false if the feature should not run or is
> not required to run.

So how is:

static bool detect_foo(void) { return whatever; }
static void init_foo(void) { do something; }

.detect = detect_foo,
.early_init = init_foo,

different from:

static void init_foo(void)
{
  if (whatever) do something;
}

.early_init = init_foo,


>
> A cheesy way to implement a cheesy graph is to use the same callback
> as a pointer for dependency, so that if a feature depends on another, both
> are on the same order level, and if they have a run time dependency
> annotation they can further annotate this relationship: the thing that
> depends sets its depend pointer to refer to the other feature's detect
> callback.
>
> This is borrowed from the IOMMU init code in the kernel. Same logic.
> Only thing is we also get link order support too. Link order semantics
> and logic should suffice for most things, but not all, as with the IOMMU
> init stuff.
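
If I follow, the annotation would look something like this (hypothetical
features foo and bar, where bar must run after foo; the order_level and
subarch values are made up):

static bool detect_foo(void) { return true; /* is foo needed? */ }
static void init_foo(void) { /* set up foo */ }

static bool detect_bar(void) { return true; /* is bar needed? */ }
static void init_bar(void) { /* set up bar; relies on foo */ }

static struct x86_init_fn foo_fn = {
        .order_level = 30,
        .supp_hardware_subarch = BIT(X86_SUBARCH_PC),
        .detect = detect_foo,
        .early_init = init_foo,
};

static struct x86_init_fn bar_fn = {
        .order_level = 30,                /* same level as foo */
        .supp_hardware_subarch = BIT(X86_SUBARCH_PC),
        .detect = detect_bar,
        .depend = detect_foo,             /* "run me after foo" */
        .early_init = init_bar,
};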

But this doesn't work at all if .detect isn't set.

>> And what happens if the depend-implied order is
>> inconsistent with order_level.
>
> Ah -- that's what I was trying to explain above in the comment
> about inconsistent state -- a run time check *after* we do run
> time sorting goes through and checks for sanity.
>
> All this is a lightweight feature graph, if you will; other
> folks are working on different solutions for this, and while
> I have considered sharing a solution for early init code this
> seems a bit more ideal for now. Later in the future though we
> should probably get together and talk about a proper feature
> graph, with more consistent checks, etc. It's another reason
> why I am also curious to see a SAT solver at run time in the
> future, but this is super long term [0]...
>
> [0] http://kernelnewbies.org/KernelProjects/kconfig-sat
>
>> I would hope that order_level breaks ties in the topological sort and
>> that there's a bit fat warning if the order_level ordering is
>> inconsistent with the topological ordering.  Preferably the warning
>> would be at build time, in which case it could be an error.
>
> Order level will always be respected at link time. The depend
> and detect things are heuristics used at run time, and since
> they depend on run time state we can't do this at build time.
> The big fat hairy nasty warning is there though. Curiously
> enough, I ended up finding out that the sanity check for
> IOMMU was intended to only be run at debug time, but the
> debug flag was *always* set to true, so we've been running
> the sanity checker in the kernel for ages. It's why I also
> feel a bit more comfortable keeping it live and matching
> its implementation more closely. I've gone to pains to
> build off of the IOMMU init solution and simply extend
> the semantics there. Turns out there was much room for
> enhancement. All the fine tuning is documented in the
> userspace tree.

So is this what you're saying:

Callbacks are called in an order consistent with the dependencies
specified by @depend.  Ties are broken first by @order_level and then
by linker order.  It is an error for an x86_init_fn with lower
@order_level to depend on an x86_init_fn with higher @order_level.
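
And presumably the run time sanity check is a sketch along these lines
(loosely modeled on what check_iommu_entries() does; I haven't checked
your actual implementation):

static void __init check_x86_init_fns(struct x86_init_fn *start,
                                      struct x86_init_fn *finish)
{
        struct x86_init_fn *p, *q, *dep;

        for (p = start; p < finish; p++) {
                if (!p->depend)
                        continue;
                /* find the entry whose .detect matches our .depend */
                dep = NULL;
                for (q = start; q < finish; q++) {
                        if (q->detect == p->depend) {
                                dep = q;
                                break;
                        }
                }
                if (!dep) {
                        pr_err("x86_init_fn: dependency not found\n");
                        continue;
                }
                if (dep >= p)
                        pr_err("x86_init_fn: dependency sorted after dependent\n");
                if (dep->order_level > p->order_level)
                        pr_err("x86_init_fn: dependency has higher order_level\n");
        }
}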

>
>> > + * @early_init: required, routine which will run in
>> > + *     x86_64_start_reservations() after we ensure
>> > + *     boot_params.hdr.hardware_subarch is accessible and properly set.
>> > + *     Memory is not yet available. This is the earliest we can currently
>> > + *     define a common shared callback, since all callbacks need to check
>> > + *     boot_params.hdr.hardware_subarch and this does not become accessible
>> > + *     on x86-64 until after load_idt().
>>
>> What's this for?  Under what conditions is it called?  What order?
>> Why is it part of x86_init_fn at all?
>
> It's one of the reasons we are adding this: callbacks for features.
> It's called if its requirements are met (subarch and depends).
> I'll work on extending this to be clearer.

I think this should just be ".callback".  Let the context be implied
by the linker table in which it appears.
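
I.e., something generic like this (the same struct with the member
renamed; nothing else changes):

struct x86_init_fn {
        __u32 order_level;
        __u32 supp_hardware_subarch;
        bool (*detect)(void);
        bool (*depend)(void);
        void (*callback)(void);           /* was: early_init */
        __u32 flags;
};

Then a second linker table for, say, setup_arch()-time work could reuse
the struct unchanged.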
>
>> > + */
>> > +struct x86_init_fn {
>> > +       __u32 order_level;
>> > +       __u32 supp_hardware_subarch;
>> > +       bool (*detect)(void);
>> > +       bool (*depend)(void);
>> > +       void (*early_init)(void);
>> > +       __u32 flags;
>> > +};
>> > +
>> > +/**
>> > + * enum x86_init_fn_flags: flags for init sequences
>> > + *
>> > + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
>> > + *     has completed it can break out of the loop for init sequences on
>> > + *     its own level.
>>
>> What does this mean?
>
> It borrows the logic from the IOMMU stuff: it enables the core to detect
> that *any* routine in the linker table has completed, so we can then just
> bail and stop going through the rest of the linker table.
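
So the dispatch loop would be something like this?  (My guess at the
semantics; the "on its own level" part may make the break condition
subtler than what I have here.)

static void __init x86_init_fn_run(struct x86_init_fn *start,
                                   struct x86_init_fn *finish)
{
        struct x86_init_fn *fn;

        for (fn = start; fn < finish; fn++) {
                if (fn->detect && !fn->detect())
                        continue;
                fn->early_init();
                if (fn->flags & X86_INIT_FINISH_IF_DETECTED)
                        break;            /* stop walking the table */
        }
}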

Can you give an example for which this would be used?

--Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

