[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC v2] xSplice design

>>> On 15.05.15 at 21:44, <konrad.wilk@xxxxxxxxxx> wrote:
> As such having the payload in an ELF file is the sensible way. We would be
> carrying the various set of structures (and data) in the ELF sections under
> different names and with definitions. The prefix for the ELF section name
> would always be: *.xsplice_*
> Note that every structure has padding. This is added so that the hypervisor
> can re-use those fields as it sees fit.
> There are five sections *.xsplice_* sections:

A general remark: Having worked on ELF on different occasions,
UNIX'es (and hence Linux'es) use of section names to identify the
sections' purposes is pretty contrary to ELF's original intentions.
Section types (and architecture as well as OS extensions to them)
exist for a reason. Of course from the following sections it's not
really clear in how far you intended this to be done. If you did
and just didn't explicitly say so, pointing out that relations
between sections should then also be expressed suitably via ELF
mechanisms (sh_link, sh_info) would then be unnecessary too. If
you didn't, then I'd strongly suggest switching to a formally more
correct model, which would then include specifying section types
and section attributes (and optional other relevant aspects)
along with their (then no longer mandatory) names.

>  * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be 
> referenced
>    during the update. This can contain the symbols (functions) that will be
>    patched, or the list of symbols (functions) to be checked pre-patching 
> which
>    may not be on the stack.
> * `.xsplice_reloc` and `.xsplice_reloc_howto`. The howto properly construct
>    trampolines for an patch. We can have multiple locations for which we
>    need to insert an trampoline for a payload and each location might require
>    a different way of handling it. This would naturally reference the `.text`
>    section and its proper offset. The `.xsplice_reloc` is not directly 
> concerned
>    with patches but rather is an ELF relocation - describing the target
>    of a relocation and how that is performed.  They're also used for where
>    the new code references the run code too.
>  * `.xsplice_sections`. The safety data for the old code and new code.
>    This contains an array of symbols (pointing to `.xsplice_symbols` to
>    and `.text`) which are to be used during safety and dependency checking.
>  * `.xsplice_patches`: The description of the new functions to be patched
>    in (size, type, pointer to code, etc.).
>  * `.xsplice_change`. The structure that ties all of this together and defines
>    the payload.
> Additionally the ELF file would contain:
>  * `.text` section for the new and old code (function).
>  * `.rela.text` relocation data for the `.text` (both new and old).
>  * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as 
> offset
>    to the `.text` ,`.xsplice_symbols`, or `.xsplice_reloc` section).

I think you should not immediately rule out REL relocations, i.e.
specify both .rela.* and .rel.*.

>  * `.bss` section for the new code (function)
>  * `.data` and `.data.read_mostly` section for the new and old code (function)
>  * `.rodata` section for the new and old code (function).
> In short the *.xsplice_* sections represent various structures and the
> ELF provides the mechanism to glue it all together when loaded in memory.

As per above - I think ELF can do this in a more explicit way. Of
course that would imply sticking to strictly ELF types in the structure
definitions below. That would in turn have the advantage of
expressing e.g. "const char *" references to strings by section
offsets into a specified string section, reducing (with the ultimate
goal of eliminating) the need for processing relocations on the
ELF object (possible references to "external symbols" left aside for
the moment).

> Note that a lot of these ideas are borrowed from kSplice which is
> available at: https://github.com/jirislaby/ksplice 
> For ELF understanding the best starting point is the OSDev Wiki
> (http://wiki.osdev.org/ELF). Furthermore the ELF specification is
> at http://www.skyfree.org/linux/references/ELF_Format.pdf and
> at Oracle's web site:
> http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc

I think the master copy still is


> ### xsplice_reloc and xsplice_reloc_howto
> The section contains an array of a structure that outlines the different
> locations (and howto) for which an trampoline is to be inserted.
> The howto defines in the detail the change. It contains the type,
> whether the relocation is relative, the size of the relocation,
> bitmask for which parts of the instruction or data are to be replaced,
> amount of final relocation is shifted by (to drop unwanted data), and
> whether the replacement should be interpreted as signed value.
> The structure is as follow:
> <pre>
> #define XSPLICE_HOWTO_RELOC_INLINE  0 /* Inline replacement. */  
> #define XSPLICE_HOWTO_RELOC_PATCH   1 /* Add trampoline. */  
> #define XSPLICE_HOWTO_RELOC_DATA    2 /*  __DATE__ type change. */  


> #define XSPLICE_HOWTO_RELOC_TIME    3 /* __TIME__ type chnage. */  

Assuming the former is DATE - what do you foresee date/time
kinds of changes to be good for?

> #define XSPLICE_HOWTO_BUG           4 /* BUG_ON being replaced.*/  
> #define XSPLICE_HOWTO_EXTABLE       5 /* exception_table change. */  
> #define XSPLICE_HOWTO_SYMBOL        6 /* change in symbol table. */  

For none of these I really understand their purpose.

> ### xsplice_sections
> The structure defined in this section is used to verify that it is safe
> to update with the new changes. It can contain safety data on the old code
> and what kind of matching we are to expect.
> It also can contain safety date of what to check when about to patch.
> That is whether any of the addresses (either provided or resolved
> when payload is loaded by referencing the symbols) are in memory
> with what we expect it to be.
> As such the flags can be or-ed together:
> <pre>
> #define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */  
> #define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .ro */  
> #define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .rodata */  

DATA or .rodata?

> #define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */  
> #define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has 
> .altinstructions. */  

I don't see the purpose of adding flags for a few special sections we
may know about now - what if over time hundreds of sections appear?

> #define XSPLICE_SECTION_TEXT_INPLACE 0x00000200 /* Change is in place. */   
> #dekine XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */  
> #define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the stack. 
> */  

This isn't the first reference to "stack", without me having seen a
description of what the word is meant to stand for in such contexts.
Did I overlook it?

> struct xsplice_section {  
>     struct xsplice_symbol *symbol; /* The symbol associated with this change. 
> */  
>     uint64_t address; /* The address of the section (if known). */  
>     uint64_t size; /* The size of the section. */  

I don't think you need 64-bit types here even on 32-bit architectures.
Use Elf*_Addr and Elf*_Word?

> ### xsplice_patches
> Within this section we have an array of a structure defining the new code 
> (patch).
> This structure consist of an pointer to the new code (which in ELF ends up
> pointing to an offset in `.text` or `.data` section); the type of patch:
> inline - either text or data, or requesting an trampoline; and size of patch.
> The structure is as follow:
> <pre>

Again it remains vague what these are supposed to express.

> Retrieve an summary of an specific payload. This caller provides:
>  * `id` the unique id.
>  * `status` *MUST* be set to zero.
>  * `rc` *MUST* be set to zero.
> The `summary` structure contains an summary of payload which includes:
>  * `id` the unique id.
>  * `status` - whether it has been:
>  1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
> command.
>  3. *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.
>  4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>  5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also 
> reverted.
>  6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` 
> for details.
>  * `rc` - its error state if any.

This duplication of status and rc looks odd to me, the more that only
appears to mean an error. How about using negative status values
as error indicators?

> ## Sequence of events.
> The normal sequence of events is to:
>  1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors 
> *STOP* here.
>  2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in 
> step.
> verify that the payload can be succesfully applied.
>  4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in 
> step.
> patch.
>  6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in 
> success.

Is this meant to imply that e.g. ACTION_APPLY will fail unless the
respective payload is in CHECKED state?

> ### Alternative assembler
> Alternative assembler is a mechanism to use different instructions depending
> on what the CPU supports. This is done by providing multiple streams of code
> that can be patched in - or if the CPU does not support it - padded with
> `nop` operations. The alternative assembler macros cause the compiler to
> expand the code to place a most generic code in place - emit a special
> ELF .section header to tag this location. During run-time the hypervisor
> can leave the areas alone or patch them with an better suited opcodes.
> As we might be patching the alternative assembler sections as well - by
> providing a new better suited op-codes or perhaps with nops - we need to
> also re-run the alternative assembler patching after we have done our
> patching.

These sections are part of .init.* and as such can't reasonably be
subject to patching.

> ### .bss sections
> Patching writable data is not suitable as it is unclear what should be done
> depending on the current state of data. As such it should not be attempted.

This certainly also applies to .data.

> ### Time rendezvous code instead of stop_machine for patching
> The hypervisor's time rendezvous code runs synchronously across all CPUs
> every second. Using the stop_machine to patch can stall the time rendezvous
> code and result in NMI. As such having the patching be done at the tail
> of rendezvous code should avoid this problem.

I know this was brough up on the hackathon, but did you verify this
to be a problem? Because if it is, we already have an issue here as
we already use stop_machine for things like CPU offlining.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.