[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC v2] xSplice design



On Mon, May 18, 2015 at 01:41:34PM +0100, Jan Beulich wrote:
> >>> On 15.05.15 at 21:44, <konrad.wilk@xxxxxxxxxx> wrote:
> > As such having the payload in an ELF file is the sensible way. We would be
> > carrying the various set of structures (and data) in the ELF sections under
> > different names and with definitions. The prefix for the ELF section name
> > would always be: *.xsplice_*
> > 
> > Note that every structure has padding. This is added so that the hypervisor
> > can re-use those fields as it sees fit.
> > 
> > There are five sections *.xsplice_* sections:
> 
> A general remark: Having worked on ELF on different occasions,
> UNIX'es (and hence Linux'es) use of section names to identify the
> sections' purposes is pretty contrary to ELF's original intentions.
> Section types (and architecture as well as OS extensions to them)
> exist for a reason. Of course from the following sections it's not
> really clear in how far you intended this to be done. If you did
> and just didn't explicitly say so, pointing out that relations
> between sections should then also be expressed suitably via ELF
> mechanisms (sh_link, sh_info) would then be unnecessary too. If
> you didn't, then I'd strongly suggest switching to a formally more
> correct model, which would then include specifying section types
> and section attributes (and optional other relevant aspects)
> along with their (then no longer mandatory) names.


Something like:
http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter7-1.html

With (for example) an pointer to a specific section:
http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter7-28.html#scrolltoc
?

> 
> >  * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be 
> > referenced
> >    during the update. This can contain the symbols (functions) that will be
> >    patched, or the list of symbols (functions) to be checked pre-patching 
> > which
> >    may not be on the stack.
> > 
> > * `.xsplice_reloc` and `.xsplice_reloc_howto`. The howto properly construct
> >    trampolines for an patch. We can have multiple locations for which we
> >    need to insert an trampoline for a payload and each location might 
> > require
> >    a different way of handling it. This would naturally reference the 
> > `.text`
> >    section and its proper offset. The `.xsplice_reloc` is not directly 
> > concerned
> >    with patches but rather is an ELF relocation - describing the target
> >    of a relocation and how that is performed.  They're also used for where
> >    the new code references the run code too.
> > 
> >  * `.xsplice_sections`. The safety data for the old code and new code.
> >    This contains an array of symbols (pointing to `.xsplice_symbols` to
> >    and `.text`) which are to be used during safety and dependency checking.
> > 
> > 
> >  * `.xsplice_patches`: The description of the new functions to be patched
> >    in (size, type, pointer to code, etc.).
> > 
> >  * `.xsplice_change`. The structure that ties all of this together and 
> > defines
> >    the payload.
> > 
> > Additionally the ELF file would contain:
> > 
> >  * `.text` section for the new and old code (function).
> >  * `.rela.text` relocation data for the `.text` (both new and old).
> >  * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as 
> > offset
> >    to the `.text` ,`.xsplice_symbols`, or `.xsplice_reloc` section).
> 
> I think you should not immediately rule out REL relocations, i.e.
> specify both .rela.* and .rel.*.
> 
> >  * `.bss` section for the new code (function)
> >  * `.data` and `.data.read_mostly` section for the new and old code 
> > (function)
> >  * `.rodata` section for the new and old code (function).
> > 
> > In short the *.xsplice_* sections represent various structures and the
> > ELF provides the mechanism to glue it all together when loaded in memory.
> 
> As per above - I think ELF can do this in a more explicit way. Of
> course that would imply sticking to strictly ELF types in the structure
> definitions below. That would in turn have the advantage of
> expressing e.g. "const char *" references to strings by section
> offsets into a specified string section, reducing (with the ultimate
> goal of eliminating) the need for processing relocations on the
> ELF object (possible references to "external symbols" left aside for
> the moment).

Do you think that going that way would be too complex for this design?
Folks have expressed that perhaps I went overboard with this ELF and
should have just stuck with structures and just mention that it is
in an ELF file?

> 
> > Note that a lot of these ideas are borrowed from kSplice which is
> > available at: https://github.com/jirislaby/ksplice 
> > 
> > For ELF understanding the best starting point is the OSDev Wiki
> > (http://wiki.osdev.org/ELF). Furthermore the ELF specification is
> > at http://www.skyfree.org/linux/references/ELF_Format.pdf and
> > at Oracle's web site:
> > http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc
> 
> I think the master copy still is
> 
> http://www.sco.com/developers/gabi/
> 
> > ### xsplice_reloc and xsplice_reloc_howto
> > 
> > The section contains an array of a structure that outlines the different
> > locations (and howto) for which an trampoline is to be inserted.
> > 
> > The howto defines in the detail the change. It contains the type,
> > whether the relocation is relative, the size of the relocation,
> > bitmask for which parts of the instruction or data are to be replaced,
> > amount of final relocation is shifted by (to drop unwanted data), and
> > whether the replacement should be interpreted as signed value.
> > 
> > The structure is as follow:
> > 
> > <pre>
> > #define XSPLICE_HOWTO_RELOC_INLINE  0 /* Inline replacement. */  
> > #define XSPLICE_HOWTO_RELOC_PATCH   1 /* Add trampoline. */  
> > #define XSPLICE_HOWTO_RELOC_DATA    2 /*  __DATE__ type change. */  
> 
> DATE or DATA?

DATE.

> 
> > #define XSPLICE_HOWTO_RELOC_TIME    3 /* __TIME__ type chnage. */  
> 
> Assuming the former is DATE - what do you foresee date/time
> kinds of changes to be good for?

In case we have source code which uses __DATE__ and the binary patch
modifies it - we would need to sort out other users of the __DATE__
and make sure that they are not affected (or perhaps they should be).

> 
> > #define XSPLICE_HOWTO_BUG           4 /* BUG_ON being replaced.*/  
> > #define XSPLICE_HOWTO_EXTABLE       5 /* exception_table change. */  
> > #define XSPLICE_HOWTO_SYMBOL        6 /* change in symbol table. */  
> 
> For none of these I really understand their purpose.

This is to help the hypervisor to figure out which of the lookup
mechanisms to use to find the relevant section. As in if we need
to update (as part of the patch) a normal function, but also
the exception table (because the new function is at a new virtual
address) we want to search in the exception table for the old 
one and replace it with the new virtual address.

> i
> > ### xsplice_sections
> > 
> > The structure defined in this section is used to verify that it is safe
> > to update with the new changes. It can contain safety data on the old code
> > and what kind of matching we are to expect.
> > 
> > It also can contain safety date of what to check when about to patch.
> > That is whether any of the addresses (either provided or resolved
> > when payload is loaded by referencing the symbols) are in memory
> > with what we expect it to be.
> > 
> > As such the flags can be or-ed together:
> > 
> > <pre>
> > #define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */  
> > #define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .ro */  
> > #define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .rodata */  
> 
> DATA or .rodata?

Ooops. That should be the .rodata just by itself.
> 
> > #define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */  
> > #define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has 
> > .altinstructions. */  
> 
> I don't see the purpose of adding flags for a few special sections we
> may know about now - what if over time hundreds of sections appear?

We just need to know if it has - and then the hypervisor needs
to call apply_altinstructions to redo the alternative instructions.

I presume you are advocating to move this flag to an 'higher' level part
of the structures?

> 
> > #define XSPLICE_SECTION_TEXT_INPLACE 0x00000200 /* Change is in place. */   
> > #dekine XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */  
> > #define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the stack. 
> > */  
> 
> This isn't the first reference to "stack", without me having seen a
> description of what the word is meant to stand for in such contexts.
> Did I overlook it?

I mentioned it earlier in 'Example of trampoline and in-place splicing':

126 However it has severe drawbacks - the safety checks which have to make sure 
    
127 the function is not on the stack - must also check every caller. For some   
    
128 patches this could if there were an sufficient large amount of callers      
    
129 that we would never be able to apply the update.

But perhaps I should dive much deeper in about the patching
mechanism having to check whether the function to be patched
is on the stack.
> 
> > struct xsplice_section {  
> >     struct xsplice_symbol *symbol; /* The symbol associated with this 
> > change. */  
> >     uint64_t address; /* The address of the section (if known). */  
> >     uint64_t size; /* The size of the section. */  
> 
> I don't think you need 64-bit types here even on 32-bit architectures.
> Use Elf*_Addr and Elf*_Word?

OK.
> 
> > ### xsplice_patches
> > 
> > Within this section we have an array of a structure defining the new code 
> > (patch).
> > 
> > This structure consist of an pointer to the new code (which in ELF ends up
> > pointing to an offset in `.text` or `.data` section); the type of patch:
> > inline - either text or data, or requesting an trampoline; and size of 
> > patch.
> > 
> > The structure is as follow:
> > 
> > <pre>
> > #define XSPLICE_PATCH_INLINE_TEXT   0
> > #define XSPLICE_PATCH_INLINE_DATA   1
> > #define XSPLICE_PATCH_RELOC_TEXT    2
> 
> Again it remains vague what these are supposed to express.

(So I don't forget to update):

INLINE_TEXT: Patch the function in-line. As in alter it at the original
virtual address location. The patching MUST have the same size as the
original code.

INLINE_DATA: Patch the data in-line. As in alter it at the original
virtual address location. The patching data MUST have the same size
as the original data.

RELOC_TEXT: Patch original code with an jump to the new function.
Allocate a new page (or more) for the function and insert trampoline in
the old function to the new code.
> 
> > ### XEN_SYSCTL_XSPLICE_GET (1)
> > 
> > Retrieve an summary of an specific payload. This caller provides:
> > 
> >  * `id` the unique id.
> >  * `status` *MUST* be set to zero.
> >  * `rc` *MUST* be set to zero.
> > 
> > The `summary` structure contains an summary of payload which includes:
> > 
> >  * `id` the unique id.
> >  * `status` - whether it has been:
> >  1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
> >  2. *XSPLICE_STATUS_PROGRESS* (1) acting on the 
> > **XEN_SYSCTL_XSPLICE_ACTION** command.
> >  3. *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.
> >  4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
> >  5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also 
> > reverted.
> >  6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult 
> > `rc` for details.
> >  * `rc` - its error state if any.
> 
> This duplication of status and rc looks odd to me, the more that only
> appears to mean an error. How about using negative status values
> as error indicators?

Perfect idea!
> 
> > ## Sequence of events.
> > 
> > The normal sequence of events is to:
> > 
> >  1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors 
> > *STOP* here.
> >  2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in 
> > *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_LOADED* go to next 
> > step.
> >  3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_CHECK* command to 
> > verify that the payload can be succesfully applied.
> >  4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in 
> > *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_CHECKED* go to next 
> > step.
> >  5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the 
> > patch.
> >  6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in 
> > *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_APPLIED* exit with 
> > success.
> 
> Is this meant to imply that e.g. ACTION_APPLY will fail unless the
> respective payload is in CHECKED state?

Correct. I should include a state diagram to make this obvious. Thank you!

> 
> > ### Alternative assembler
> > 
> > Alternative assembler is a mechanism to use different instructions depending
> > on what the CPU supports. This is done by providing multiple streams of code
> > that can be patched in - or if the CPU does not support it - padded with
> > `nop` operations. The alternative assembler macros cause the compiler to
> > expand the code to place a most generic code in place - emit a special
> > ELF .section header to tag this location. During run-time the hypervisor
> > can leave the areas alone or patch them with an better suited opcodes.
> > 
> > As we might be patching the alternative assembler sections as well - by
> > providing a new better suited op-codes or perhaps with nops - we need to
> > also re-run the alternative assembler patching after we have done our
> > patching.
> 
> These sections are part of .init.* and as such can't reasonably be
> subject to patching.

True, thought I thought we have some for run-time as well.
> 
> > ### .bss sections
> > 
> > Patching writable data is not suitable as it is unclear what should be done
> > depending on the current state of data. As such it should not be attempted.
> 
> This certainly also applies to .data.

True. Thank you.
> 
> > ### Time rendezvous code instead of stop_machine for patching
> > 
> > The hypervisor's time rendezvous code runs synchronously across all CPUs
> > every second. Using the stop_machine to patch can stall the time rendezvous
> > code and result in NMI. As such having the patching be done at the tail
> > of rendezvous code should avoid this problem.
> 
> I know this was brough up on the hackathon, but did you verify this
> to be a problem? Because if it is, we already have an issue here as

No. Code-wise I've just cooked up the hypercall invocation code and hadn't
tried any other hard stuff.

> we already use stop_machine for things like CPU offlining.
> 

Thank you for your review!
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.