Re: [Xen-devel] [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies
On Fri, Feb 19, 2016 at 4:42 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxx> wrote:
> On Fri, Feb 19, 2016 at 08:40:49AM -0800, Andy Lutomirski wrote:
>> On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
>> > Any failure on the x86 init path can be catastrophic. A simple shift of
>> > a call from one place to another can easily break things. Likewise,
>> > adding a new call to one path without considering all x86 requirements
>> > can make certain x86 run time environments crash. We currently account
>> > for these requirements through peer code review and run time testing.
>> > We could do much better if we had a clean and simple way to annotate
>> > strong semantics for run time requirements, init sequence dependencies,
>> > and detection mechanisms for additions of new x86 init sequences.
>>
>> Please document this in a way that will be useful for people trying to
>> understand what the code does as opposed to just people who are trying
>> to understand why you wrote it. More below.
>
> Sure. I'm kind of glad people are getting tired of me trying to explain
> *why* I wrote it though, I figured that might be the harder thing to
> explain. Hopefully it sinks in.
>
>> > +/**
>> > + * struct x86_init_fn - x86 generic kernel init call
>> > + *
>> > + * Linux x86 features vary in complexity; features may require work done
>> > + * at different levels of the full x86 init sequence. Today there are
>> > + * also two different possible entry points for Linux on x86: one for
>> > + * bare metal, KVM and Xen HVM, and another for Xen PV guests / dom0.
>> > + * Assuming a bootloader has set up 64-bit mode, the x86 init sequence
>> > + * roughly follows this path:
>> > + *
>> > + * Bare metal, KVM, Xen HVM              Xen PV / dom0
>> > + *       startup_64()                    startup_xen()
>> > + *              \                             /
>> > + *      x86_64_start_kernel()        xen_start_kernel()
>> > + *                        \             /
>> > + *               x86_64_start_reservations()
>> > + *                           |
>> > + *                      start_kernel()
>> > + *                      [ ... ]
>> > + *                      [ setup_arch() ]
>> > + *                      [ ... ]
>> > + *                          init
>> > + *
>>
>> I don't think this paragraph below is necessary. I also think it's very
>> confusing. Keep in mind that the reader has no idea what a "level" is at
>> this point and that the reader also doesn't need to think about terms
>> like "paravirtualization yielding".
>>
>> > + * x86_64_start_kernel() and xen_start_kernel() are the respective first
>> > + * C code entry starting points. The different entry points exist to
>> > + * enable Xen to skip a lot of hardware setup already done and managed on
>> > + * behalf of the hypervisor; we refer to this as "paravirtualization
>> > + * yielding". The different levels of init calls on the x86 init sequence
>> > + * exist to account for these slight differences and requirements. These
>> > + * different entry points also share a common x86 specific entry path,
>> > + * x86_64_start_reservations().
>
> OK sure, I'll nuke.
>
>> > + *
>>
>> And here, I don't even know what a "feature" is.
>
> Kasan is an example.

Is kasan different from the cr4 shadow in this context? I don't really feel
like calling the cr4 shadow a "feature" is meaningful. Am I misunderstanding
something?

>
>> > + * A generic x86 feature can have different initialization calls, one on
>> > + * each of the different main x86 init sequences, but must also address
>> > + * both entry points in order to work properly across the board on all
>> > + * supported x86 subarchitectures.
>> > + * Since x86 features can also have dependencies on other setup code or
>> > + * features, x86 features can at times be subordinate to other x86
>> > + * features, or conditions. struct x86_init_fn enables feature developers
>> > + * to annotate dependency relationships to ensure subsequent init calls
>> > + * only run once a subordinate's dependencies have run. When needed,
>> > + * custom dependency requirements can also be spelled out through a
>> > + * custom dependency checker. In order to account for the dual entry
>> > + * point nature of x86-64 Linux for "paravirtualization yielding", and to
>> > + * make annotations of support for these explicit, each struct
>> > + * x86_init_fn must specify supported subarchitectures. The earliest
>> > + * x86-64 code can read the subarchitecture, though, is after load_idt();
>> > + * as such, the earliest we can currently rely on the subarchitecture for
>> > + * semantics and a common init sequence is the shared
>> > + * x86_64_start_reservations(). Each struct x86_init_fn must also declare
>> > + * a two-digit decimal number to impose an ordering relative to other
>> > + * features when required.
>>
>> I'm totally lost in the paragraph above.
>
> As an example, Kasan could be defined as a struct x86_init_fn with a
> respective callback on x86_64_start_kernel() (kasan_early_init()) and later
> on setup_arch() (kasan_init()). Since we can't support struct x86_init_fn
> calls starting all the way from x86_64_start_kernel() (given the issues I
> had noted before), the earliest a struct x86_init_fn can be used for is a
> callback on x86_64_start_reservations().

I'm still lost. Kasan is a pile of code that does something. Kasan has some
initialization code. Do you mean that kasan's initialization code could be
encapsulated in one or more x86_init_fn instances?

>
> It just means we can have different callbacks used for different parts of
> the x86 init process; it explains the limits to the callbacks we can have
> right now.

Do you mean that there would be different linker tables containing
x86_init_fn instances for each of these callback points? Or is "early_init"
the only such callback point?

BTW, why is early_init called early_init? Why not call it "callback"? Then
you could reuse this for other things.

>
>> > + *
>> > + * x86_init_fn enables strong semantics and dependencies to be defined
>> > + * and implemented on the full x86 initialization sequence.
>>
>> Please try explaining what this is, instead. For example:
>>
>> An x86_init_fn represents a function to be called at a certain point
>> during initialization on a specific set of subarchitectures.
>
> OK thanks.
>
>> > + *
>> > + * @order_level: must be set, linker order level; this corresponds to the
>> > + *      table section sub-table index, we record this only for semantic
>> > + *      validation purposes.
>>
>> I read this as "this is purely a debugging feature". Is it? I think it's
>> not.
>
> Nope, the order_level really is used for linker table sorting, but since we
> can modify the table for these structs with a run time sort later, we want
> to ensure that the order level is still respected after we sort at run time.
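A minimal sketch of the kind of post-sort sanity check being described here,
using only the struct x86_init_fn fields quoted in this thread; the function
name and warning text are illustrative, not taken from the patch:

/*
 * Illustrative only: after the run time dependency sort, verify that order
 * levels are still non-decreasing across the table, so that the link time
 * ordering promise still holds.
 */
static void check_x86_init_fn_order(struct x86_init_fn *start,
                                    struct x86_init_fn *finish)
{
        struct x86_init_fn *fn;

        for (fn = start; fn + 1 < finish; fn++) {
                if (fn[1].order_level < fn[0].order_level)
                        pr_warn("x86_init_fn: order level not respected after run time sort\n");
        }
}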
>> > + *      Order-level is always required; however, you typically would
>> > + *      only use X86_INIT_NORMAL*() and leave ordering to be done by
>> > + *      placement of code in a C file and the order of objects through a
>> > + *      Makefile. Custom order-levels can be used when order in a C file
>> > + *      and order of objects in Makefiles does not suffice or further
>> > + *      refinements are needed.
>>
>> Assuming I understand this correctly, here's how I'd write it:
>>
>> @order_level: The temporal order of this x86_init_fn. Lower order_level
>> numbers are called first. Ties are broken by order found in the Makefile
>> and then by order in the C file.
>
> Thanks... yes.
>
>> Note, however, that my proposed explanation can't be right because it
>> appears to conflict with "depend". Please adjust accordingly.
>>
>> > + * @supp_hardware_subarch: must be set, it represents the bitmask of
>> > + *      supported subarchitectures. We require each struct x86_init_fn to
>> > + *      have this set to require developer considerations for each
>> > + *      supported x86 subarchitecture and to build strong annotations of
>> > + *      different possible run time states, particularly in consideration
>> > + *      of the two main different entry points for x86 Linux, to account
>> > + *      for paravirtualization yielding.
>>
>> Too much motivation, too little documentation.
>>
>> @supp_hardware_subarch: A bitmask of subarchitectures on which to call
>> this init function.
>
> OK thanks.
>
>> > + *
>> > + * @detect: optional, if set returns true if the feature has been
>> > + *      detected to be required, and false if the feature has been
>> > + *      detected to not be required.
>>
>> I have absolutely no idea what this means.
>
> Well, so the order level stuff is used at link time; there are however some
> run time environment conditions that might be important to consider as
> well. The detect callback lets you write a C routine which will check to
> see if the feature should run; it returns false if the feature should not
> run or is not required to run.

So how is:

static bool detect_foo(void) { return whatever; }
static void init_foo(void) { do_something(); }

.detect = detect_foo,
.early_init = init_foo,

different from:

static void init_foo(void) { if (whatever) do_something(); }

.early_init = init_foo,

>
> A cheesy way to implement a cheesy graph is to then use the same callback
> as a pointer for dependency, so that if a feature depends on another, both
> are on the same order level, and if they have a run time dependency
> annotation, they can further annotate this relationship by having the thing
> that depends set its depends pointer to refer to the other feature's detect
> callback.
>
> This is borrowed from the IOMMU init code in the kernel. Same logic. Only
> thing is we also get link order support too. Link order semantics and logic
> should suffice for most things, but not all, as with the IOMMU init stuff.

But this doesn't work at all if .detect isn't set.

>> And what happens if the depend-implied order is inconsistent with
>> order_level?
>
> Ah -- that's what I was trying to explain above in the comment about
> inconsistent state -- a run time check *after* we do run time sorting goes
> through and checks for sanity.
>
> All this is a light weight feature graph, if you will. Other folks are
> working on different solutions for this, and while I have considered
> sharing a solution for early init code, this seems a bit more ideal for
> now. Later in the future though we should probably get together and talk
> about a proper feature graph, with more consistent checks, etc. It's
> another reason why I am also curious to see a SAT solver at run time in the
> future, but this is super long term [0]...
>
> [0] http://kernelnewbies.org/KernelProjects/kconfig-sat
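To make the depend/detect relationship described above concrete, here is a
minimal illustrative sketch based only on the struct x86_init_fn fields
quoted in this thread; the feature names, the order level value, and the
subarch mask are hypothetical, and whatever macro the patch provides for
placing entries into the linker table section is omitted:

/*
 * Illustrative only: feature "bar" depends on feature "foo", so bar's
 * .depend points at foo's .detect and both share an order level; the run
 * time dependency sort is then expected to call foo's callback first.
 */
static bool detect_foo(void)
{
        /* stand-in for a real run time check, e.g. a CPUID or boot_params test */
        return true;
}

static void early_init_foo(void)
{
        /* set up foo */
}

static void early_init_bar(void)
{
        /* set up bar; relies on foo's early_init having run */
}

static struct x86_init_fn fn_foo = {
        .order_level            = 30,                   /* illustrative value */
        .supp_hardware_subarch  = BIT(X86_SUBARCH_PC),
        .detect                 = detect_foo,
        .early_init             = early_init_foo,
};

static struct x86_init_fn fn_bar = {
        .order_level            = 30,                   /* same level as its dependency */
        .supp_hardware_subarch  = BIT(X86_SUBARCH_PC),
        .depend                 = detect_foo,           /* depend on whatever detect_foo detects */
        .early_init             = early_init_bar,
};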
>> I would hope that order_level breaks ties in the topological sort and
>> that there's a big fat warning if the order_level ordering is
>> inconsistent with the topological ordering. Preferably the warning would
>> be at build time, in which case it could be an error.
>
> Order level will always be respected at link time. The depend and detect
> thing are heuristics used at run time, and since they depend on run time
> state we can't do this at build time. The big fat hairy nasty warning is
> there though. Curiously enough, I ended up finding out that the sanity
> check for IOMMU was intended to only be run at debug time but the debug
> thing was *always* set to true, so we've been running the sanity checker in
> the kernel for ages. It's why I also feel a bit more comfortable in keeping
> it live and matching more closely to its implementation. I've gone to pains
> to build off of the IOMMU init solution and simply extend the semantics
> there. Turns out there was much room for enhancements. All the fine tuning
> is documented in the userspace tree.

So is this what you're saying:

Callbacks are called in an order consistent with the dependencies specified
by @depend. Ties are broken first by @order_level and then by linker order.
It is an error for an x86_init_fn with lower @order_level to depend on an
x86_init_fn with higher @order_level.

>
>> > + * @early_init: required, routine which will run in
>> > + *      x86_64_start_reservations() after we ensure
>> > + *      boot_params.hdr.hardware_subarch is accessible and properly set.
>> > + *      Memory is not yet available. This is the earliest we can
>> > + *      currently define a common shared callback since all callbacks
>> > + *      need to check boot_params.hdr.hardware_subarch, and this only
>> > + *      becomes accessible on x86-64 after load_idt().
>>
>> What's this for? Under what conditions is it called? What order? Why is
>> it part of x86_init_fn at all?
>
> It's one of the reasons we are adding this: callbacks for features. It's
> called if its requirements are met (subarch, and depends). I'll work on
> extending this to be clearer.

I think this should just be ".callback". Let the context be implied by the
linker table in which it appears.

>
>> > + */
>> > +struct x86_init_fn {
>> > +       __u32 order_level;
>> > +       __u32 supp_hardware_subarch;
>> > +       bool (*detect)(void);
>> > +       bool (*depend)(void);
>> > +       void (*early_init)(void);
>> > +       __u32 flags;
>> > +};
>> > +
>> > +/**
>> > + * enum x86_init_fn_flags: flags for init sequences
>> > + *
>> > + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init
>> > + *      sequence has completed it can break out of the loop for init
>> > + *      sequences on its own level.
>>
>> What does this mean?
>
> It borrows the logic from the IOMMU stuff: it enables the core to detect
> that *any* routine in the linker table has completed, and then just bail
> and stop going through the rest of the linker table.

Can you give an example for which this would be used?

--Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel