
Re: [Xen-devel] [RFC v2] xSplice design



Hi,

some inline comments.

On 05.06.2015 17:00, Konrad Rzeszutek Wilk wrote:
> On Wed, May 20, 2015 at 05:11:20PM +0200, Martin Pohlack wrote:
>> Hi,
>>
>> this looks very interesting.
> 
> Thank you!
>>
>> I have talked about an experimental Xen hotpatching design at Linux
>> Plumbers Conference 2014 in Düsseldorf, slides are here:
>>
>> http://www.linuxplumbersconf.net/2014/ocw//system/presentations/2421/original/xen_hotpatching-2014-10-16.pdf
>>
>> and would like to share a couple of observations from the slides:
>>
>> * We found splicing at hypervisor exit with a barrier for all CPUs to
>>   be a good place, because all Xen stacks are effectively empty at
>>   that point.  We use a barrier with a maximum timeout (that can be
>>   adapted).  The approach is similar in concept to stop_machine and
>>   the time rendezvous but is time-bound and does not require all the
>>   asynchronous operations discussed below.
> 
> hypervisor exit = vmexit?

No, when the hypervisor code is left and checks soft IRQs.
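To make that concrete, here is a minimal sketch of the idea.  The names
(xsplice_rendezvous, patching_cpu, apply_patches, ...) are invented for
illustration and all the real bookkeeping is left out:

    /*
     * Sketch only: a time-bounded barrier at the point where we leave
     * hypervisor code and check soft IRQs.
     */
    static atomic_t cpus_arrived = ATOMIC_INIT(0);
    static bool_t patching_done;
    static unsigned int patching_cpu;    /* CPU driving the patching */

    void xsplice_rendezvous(void)
    {
        s_time_t deadline = NOW() + MILLISECS(30);   /* maximum wait */

        atomic_inc(&cpus_arrived);

        if ( smp_processor_id() == patching_cpu )
        {
            /* Wait for every CPU to arrive, but never past the deadline. */
            while ( atomic_read(&cpus_arrived) < num_online_cpus() &&
                    NOW() < deadline )
                cpu_relax();

            if ( atomic_read(&cpus_arrived) == num_online_cpus() )
                apply_patches();   /* all Xen stacks are effectively empty */
            /* else: back off and retry at a later softirq check */

            patching_done = 1;
        }
        else
        {
            /* Spin until the patching CPU is finished or we time out. */
            while ( !patching_done && NOW() < deadline )
                cpu_relax();
        }

        atomic_dec(&cpus_arrived);
    }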

>>
>> * Hotpatch generation often requires support for compiling the target
>>   with -ffunction-sections / -fdata-sections.  Last time I looked, you
>>   can compile Xen with that option, but the linker scripts are not
>>   fully adapted to deal with it and the resulting hypervisor cannot be
>>   booted.
> 
> Correct. That (adapting the Makefile to do a second build with this)
> would be needed as well.
> 
>>
>> * Xen as it is now, has a couple of non-unique symbol names which will
>>   make runtime symbol identification hard.  Sometimes, static symbols
>>   simply have the same name in C files, sometimes such symbols get
>>   included via header files, and some C files are also compiled
>>   multiple times and linked under different names (guest_walk.c).  I
>>   think the Linux kernel solves this by aiming at unique symbol names
>>   even for local symbols.
>>
>>   nm xen-syms | cut -f3- -d\  | sort | uniq -c | sort -nr | head
> 
> Oh my. Looks like a couple of preparation patches will be in order!
> 
>>
>> * You may need support for adapting or augmenting exception tables if
>>   patching such code is desired (it probably is).  Hotpatches may need
>>   to bring their own small exception tables (similar to how Linux
>>   modules support this).  If you don't plan on supporting hotpatches
>>   that introduce additional exception-locations, one could also change
>>   the exception table in-place and reorder it afterwards.
> 
> Each patch carries 'section' structures which define what kind
> of massaging needs to be done. As in, each of these 'section' structures
> says:
>  - At the original address X there was an exception table
>  - (or) At the original address Y there was an alt assembler
> 
> And we would:
>  - During 'check' state, confirm that indeed X is in the exception table
>    (or Y in the alt assembler)
> 
>  - During the apply state, fix the exception table X offset to point to
>    the new virtual address.
> 
> Or perhaps I am over-simplifying it? My recollection of the exception
> table code is that it would be mostly dealing with the table and
> changing the virtual address, but I haven't dug into it in detail.

A simple approach would indeed be to patch the table in-place with the
new addresses for the replacement functions and fix-up sections.  But
you would also have to reorder the table to keep the binary search working.

And of course this approach would not support adding additional entries
to the table as it is allocated at compile / link time.  So you cannot
support a hotpatch that would introduce an additional entry.
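A rough sketch of that in-place variant: retarget the entries that point
into the old function, then re-sort.  The entry layout and helper names
below are invented (the real Xen table may differ, e.g. it could store
32-bit relative offsets rather than absolute addresses):

    struct ex_entry {
        unsigned long addr;   /* faulting instruction */
        unsigned long cont;   /* fixup address to continue at */
    };

    static int cmp_ex(const void *a, const void *b)
    {
        const struct ex_entry *l = a, *r = b;

        return (l->addr > r->addr) - (l->addr < r->addr);
    }

    static void swap_ex(void *a, void *b, int size)
    {
        struct ex_entry t = *(struct ex_entry *)a;

        *(struct ex_entry *)a = *(struct ex_entry *)b;
        *(struct ex_entry *)b = t;
    }

    /*
     * Retarget entries that referred to the old function so they refer to
     * the replacement, then re-sort so the binary search keeps working.
     * (Fixup addresses inside the replaced function would need the same
     * treatment.)
     */
    static void fixup_extable(struct ex_entry *tbl, unsigned int nr,
                              unsigned long old_f, unsigned long len,
                              unsigned long new_f)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            if ( tbl[i].addr >= old_f && tbl[i].addr < old_f + len )
                tbl[i].addr = new_f + (tbl[i].addr - old_f);

        sort(tbl, nr, sizeof(*tbl), cmp_ex, swap_ex);
    }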

Linux modules, in contrast, can bring their own exception tables and
those are walked by the handler whenever the main table does not contain
an entry.  I have not implemented that aspect yet, but would consider
this to be the most desirable design at first glance.
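On the lookup side that would only need a small fallback after the main
search fails, roughly like the sketch below (struct payload, its list and
the function name are invented, reusing the ex_entry layout from the
sketch above):

    struct payload {
        struct list_head list;
        struct ex_entry *extable;
        unsigned int nr_entries;
    };

    static LIST_HEAD(applied_payloads);

    /* Called from the fault handler after the main table search failed. */
    unsigned long search_payload_extables(unsigned long addr)
    {
        struct payload *p;
        unsigned int i;

        list_for_each_entry ( p, &applied_payloads, list )
        {
            /* Linear scan for clarity; a sorted table could use bsearch. */
            for ( i = 0; i < p->nr_entries; i++ )
                if ( p->extable[i].addr == addr )
                    return p->extable[i].cont;
        }

        return 0;   /* no fixup found; fall through to the usual fault path */
    }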

>> * Hotpatch interdependencies are tricky.  IIRC the discussion at the
> 
> Oooh, that bubbled in my nightmare subconscious but hadn't burst out yet.
> Ugh, what did I sign up for! 
> 
>>   Live Patching track at LPC, both kpatch and kgraft aimed, at the
>>   time, at a single large patch that subsumes and replaces all previous
>>   ones.
>>
>> Some more comments inline below ...
>>
>> Cheers,
>> Martin
>>
>> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
>>> Hey!
>>>
>>> During the Xen Hacka^H^H^H^HProject Summit? we chatted about live-patching
>>> the hypervisor. We sketched out how it could be done, and brainstormed
>>> some of the problems.
>>>
>>> I took that and wrote a design - which is very much RFC. The design is
>>> laid out in two sections - the format of the ELF payload - and then the
>>> hypercalls to act on it.
>>>
>>> Hypercall preemption has caused a couple of XSAs, so I've baked the need
>>> for it into the design in the hope that we won't have an XSA for this code.
>>>
>>> There are two big *TODO* in the design which I had hoped to get done
>>> before sending this out - however I am going on vacation for two weeks
>>> so I figured it would be better to send this off for folks to mull now
>>> than to have it languish.
>>>
>>> Please feel free to add more folks on the CC list.
>>>
>>> Enjoy!
>>>
>>>
>>> # xSplice Design v1 (EXTERNAL RFC v2)
>>>
>>> ## Rationale
>>>
>>> A mechanism is required to binarily patch the running hypervisor with new
>>> opcodes that have come about primarily due to security updates.
>>>
>>> This document describes the design of the API that would allow us to
>>> upload binary patches to the hypervisor.
>>>
>>> ## Glossary
>>>
>>>  * splice - patch in the binary code with new opcodes
>>>  * trampoline - a jump to a new instruction.
>>>  * payload - telemetries of the old code along with a binary blob of the new
>>>    function (if needed).
>>>  * reloc - telemetries contained in the payload to construct a proper
>>>    trampoline.
>>>
>>> ## Multiple ways to patch
>>>
>>> The mechanism needs to be flexible to patch the hypervisor in multiple ways
>>> and be as simple as possible. The compiled code is contiguous in memory with
>>> no gaps - so we have no luxury of 'moving' existing code and must either
>>> insert a trampoline to the new code to be executed - or only modify in-place
>>> the code if there is sufficient space. The placement of new code has to be
>>> done by the hypervisor and the virtual address for the new code is allocated
>>> dynamically.
>>> This implies that the hypervisor must compute the new offsets when splicing
>>> in the new trampoline code. Where the trampoline is added (inside
>>> the function we are patching or just the callers?) is also important.
>>>
>>> To lessen the amount of code in the hypervisor, the consumer of the API
>>> is responsible for identifying which mechanism to employ and how many
>>> locations to patch. Combinations of modifying code in-place, adding
>>> trampolines, etc. have to be supported. The API should allow reading and
>>> writing any memory within the hypervisor virtual address space.
>>>
>>> We must also have a mechanism to query what has been applied and a mechanism
>>> to revert it if needed.
>>>
>>> We must also have a mechanism to: provide a copy of the old code - so that
>>> the hypervisor can verify it against the code in memory; the new code;
>>
>> As Xen has no stable in-hypervisor API / ABI, you need to make sure
>> that a generated module matches a target hypervisor.  In our design,
>> we use build IDs for that (ld --build-id).  We embed build IDs at Xen
>> compile time and can query a running hypervisor for its ID and only
>> load matching patches.
>>
>> This seems to be an alternative to your proposal to include old code
>> into hotpatch modules.
> 
> That is much simpler.

Just to clarify, are you agreeing that the build ID approach is much
simpler?

> There is a nice part of the old code check - you
> can check (and deal with) patching already-patched code.
> As in, if the payload was configured to be applied on top of an already
> patched function it would patch nicely. But if the payload is against
> the virgin code - and the hypervisor is running an older patch, we would
> bail out.

You can do that too with the build IDs if there is some mechanism that
loads hotpatches in the same order as they were built (if they overlap).
The simplest approach that comes to mind is a hotpatch stack,
instead of independent patches.
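Purely as a sketch of what I mean (every name below is invented): each
payload would carry the build ID of the hypervisor it was generated
against, its own build ID, and the build ID of the payload it expects
directly below it on the stack; loading refuses anything that does not
extend the current top:

    #define BUILD_ID_LEN 20                   /* SHA1 from ld --build-id */

    struct payload_ids {
        uint8_t xen_build_id[BUILD_ID_LEN];   /* hypervisor built against  */
        uint8_t dep_build_id[BUILD_ID_LEN];   /* payload expected below us */
        uint8_t own_build_id[BUILD_ID_LEN];   /* this payload's own ID     */
    };

    /* 'top' is the currently topmost applied payload, or NULL if none. */
    static int check_stack_order(const struct payload_ids *incoming,
                                 const struct payload_ids *top,
                                 const uint8_t *running_xen_id)
    {
        if ( memcmp(incoming->xen_build_id, running_xen_id, BUILD_ID_LEN) )
            return -EINVAL;               /* built for a different hypervisor */

        if ( top == NULL )
            return 0;                     /* first payload on the stack */

        /* Must have been built against the payload currently on top. */
        return memcmp(incoming->dep_build_id, top->own_build_id, BUILD_ID_LEN)
               ? -EINVAL : 0;
    }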

>>> the symbol name of the function to be patched; or offset from the symbol;
>>> or virtual address.
>>>
>>> The complications that this design will encounter are explained later
>>> in this document.
>>>
>>> ## Patching code
>>>
>>> The first mechanism to patch that comes to mind is in-place replacement.
>>> That is, replace the affected code with new code. Unfortunately the x86
>>> ISA is variable-size, which places limits on how much space we have
>>> available to replace the instructions.
>>>
>>> The second mechanism is by replacing the call or jump to the
>>> old function with the address of the new function.
>>
>> We found splicing via an unconditional JMP at the beginning of the
>> old target function to be a sensible and uncomplicated approach.  This
>> approach works at function level, which feels like a very natural unit
>> to work at.  The really nice thing about this approach is that you don't
>> have to track all the potentially indirect call sites.
> 
> <nods>
>>
>> As you discussed, if you allocate hotpatch memory within +-2GB of the
>> patch location, no further trampoline indirection is required, a
>> 5-byte JMP does the trick on x86.  We found that to be sufficient in
>> our experiments.
> 
> And worst case, if you do need more than +-2GB, you can just have
> two jumps. Kind of silly but possible.
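
For reference, the 5-byte case boils down to something like the sketch
below (deliberately ignoring how the text is made writable and how the
other CPUs are kept away while the bytes change):

    /*
     * Sketch only: write an unconditional rel32 JMP (opcode 0xe9) over the
     * first five bytes of the old function.  new_func must be within +-2GB
     * of old_func for the 32-bit displacement to reach it.
     */
    static void splice_jmp(void *old_func, const void *new_func)
    {
        int32_t rel = (int32_t)((long)new_func - ((long)old_func + 5));
        uint8_t insn[5];

        insn[0] = 0xe9;                        /* JMP rel32 */
        memcpy(&insn[1], &rel, sizeof(rel));
        memcpy(old_func, insn, sizeof(insn));  /* ideally done under the barrier */
    }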
> 
> Thank you for your input and looking forward to your reply!

Martin

