[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 01/13] xsplice: Design document (v5).



On 01/14/2016 09:46 PM, Konrad Rzeszutek Wilk wrote:
+## Workflow
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches around system start.

I'd expect that the most obvious task apply patches as they are installed at runtime. I'd hope that the system would always be booted from a fully patched hypervisor. Of course, there's nothing stopping one from patching at system start.

+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+ * If we do no restrict module activation order and want to report tboot
+   state on sequences, we might have a complexity explosion problem, in
+   what system hashes should be considered acceptable.

This last bullet shouldn't be in the Workflow section.

+
+## Patching code
+
+The first mechanism to patch that comes in mind is in-place replacement.
+That is replace the affected code with new code. Unfortunately the x86
+ISA is variable size which places limits on how much space we have available
+to replace the instructions. That is not a problem if the change is smaller
+than the original opcode and we can fill it with nops. Problems will
+appear if the replacement code is longer.
+
snip
+## Hypercalls
+
+We will employ the sub operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * upload the payloads.
+ * listing of payloads summary uploaded and their state.
+ * getting an particular payload summary and its state.
+ * command to apply, delete, or revert the payload.
+
+Most of the actions are asynchronous therefore the caller is responsible
+to verify that it has been applied properly by retrieving the summary of it
+and verifying that there are no error codes associated with the payload.
+
+We **MUST** make some of them asynchronous due to the nature of patching
+it requires every physical CPU to be lock-step with each other.
+The patching mechanism while an implementation detail, is not an short
+operation and as such the design **MUST** assume it will be an long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such an unique id per payload has to be visible to allow proper 
manipulation.

s/id/name throughout this patch.

+
+The hypercall is part of the `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operations and one padding field which
+*MUST* always be zero.
+
+<pre>
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                   /* IN: XEN_SYSCTL_XSPLICE_*. */
+    uint32_t pad;                   /* IN: Always zero. */
+       union {
+          ... see below ...
+        } u;
+};
+
+</pre>
+while the rest of hypercall specific structures are part of the this structure.
+
+### Basic type: struct xen_xsplice_id
+
+Most of the hypercalls employ an shared structure called `struct 
xen_xsplice_id`
+which contains:
+
+ * `name` - pointer where the string for the id is located.
+ * `size` - the size of the string
+ * `pad` - padding - to be zero.
+
+The structure is as follow:
+
+<pre>
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_id {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */
+    uint16_t size;                          /* IN, size of name. May be upto
+                                               XEN_XSPLICE_NAME_SIZE. */
+    uint16_t pad[3];                        /* IN: MUST be zero. */
+};
+</pre>
+
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_id` called `id` which has the unique id.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `id` could be an UUID that stays fixed forever for a given
+payload. It can be embedded into the ELF payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was succesfully uploaded.
+Otherwise an XEN_EXX return value is provided. Duplicate `id` are not 
supported.
+
+The `payload` is the ELF payload as mentioned in the `Payload format` section.
+
+The structure is as follow:
+
+<pre>
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_id_t id;                /* IN, name of the patch. */
+    uint64_t size;                      /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */
+};
+</pre>
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve an status of an specific payload. This caller provides:
+
+ * A `struct xen_xsplice_id` called `id` which has the unique id.
+ * A `struct xen_xsplice_status` structure which has all members
+   set to zero: That is:
+   * `status` *MUST* be set to zero.
+   * `rc` *MUST* be set to zero.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `status` - whether it has been:
+   * *XSPLICE_STATUS_LOADED* (1) has been loaded.
+   * *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - XEN_EXX type errors encountered while performing the `status`
+   operation.

This is not quite correct. `rc` indicates errors while performing the last XSPLICE_ACTION_* operation. `status` indicates the current status of the payload. So usually if there's an error in `rc`, `status` will _not_ have changed.

For example, suppose there exists a payload:
status: XSPLICE_STATUS_LOADED
rc: 0

We apply an action, XSPLICE_ACTION_REVERT, to revert it. Afterwards:
status: XSPLICE_STATUS_LOADED
rc: XEN_EINVAL

It has failed (with EINVAL) but it remains loaded.


+   respectively mean: success or operation in progress. Other values
+   imply an error occurred.
+
+The return value of the hypercall is zero on success and XEN_EXX on failure.
+(Note that the `rc`` value can be different from the return value, as in
+rc=XEN_EAGAIN and return value can be 0).
+
+This operation is synchronous and does not require preemption.
+
+The structure is as follow:
+
+<pre>
+struct xen_xsplice_status {
+#define XSPLICE_STATUS_LOADED       1
+#define XSPLICE_STATUS_CHECKED      2
+#define XSPLICE_STATUS_APPLIED      3
+    int32_t state;                  /* OUT: XSPLICE_STATE_*. IN: MUST be zero. 
*/
+    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. 
*/
+                                    /* IN: MUST be zero. */
+};
+
+struct xen_sysctl_xsplice_summary {
+    xen_xsplice_id_t id;            /* IN, the name of the payload. */
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */
+};
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded 
in the
+hypervisor.
+
+The caller provides:
+
+ * `version`. Initially (on first hypercall) *MUST* be zero.
+ * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
+ * `nr` the max number of entries to populate.
+ * `pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. Caller *MUST* allocate up to `nr` of them.
+ * `id` - virtual address of where to write the unique id of the payload.
+   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_NAME_SIZE** size.
+ * `len` - virtual address of where to write the length of each unique id
+   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns an positive number, it is the number (up to `nr`)
+of the payloads returned, along with `nr` updated with the number of remaining
+payloads, `version` updated (it may be the same across hypercalls. If it
+varies the data is stale and further calls could fail). The `status`,
+`id`, and `len`' are updated at their designed index value (`idx`) with
+the returned value of data.
+
+If the hypercall returns E2BIG the `count` is too big and should be
+lowered.
+
+This operation can be preempted by the hypercall returning XEN_EAGAIN.
+Retry.
+
+Note that due to the asynchronous nature of hypercalls the control domain might
+have added or removed a number of payloads making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. if the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
+
+The `struct xen_xsplice_status` structure contains an status of payload which 
includes:
+
+ * `status` - whether it has been:
+   * *XSPLICE_STATUS_LOADED* (1) has been loaded.
+   * *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.

The ELF safety checks are done during load. At this stage we don't have any checks yet so I'm not sure what will go here.


+   * *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
+ * `rc` - XEN_EXX type errors encountered while performing the `status`
+   operation.

Same comment as above.

+   respectively mean: success or operation in progress.
+
+The structure is as follow:
+
+<pre>
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be 
zero.
+                                               On subsequent calls reuse value.
+                                               If varies between calls, we are
+                                             * getting stale data. */
+    uint32_t idx;                           /* IN/OUT: Index into array. */
+    uint32_t nr;                            /* IN: How many status, id, and len
+                                               should fill out.
+                                               OUT: How many payloads left. */
+    uint32_t pad;                           /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocate for n of them. */
+    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of ids. Each member
+                                               MUST XEN_XSPLICE_NAME_SIZE in 
size.
+                                               Must have n of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of ids.
+                                               Must have n of them. */
+};
+</pre>
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `id` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** 
hypercall.
+
+The caller provides:
+
+ * A 'struct xen_xsplice_id` `id` containing the unique id.
+ * `cmd` the command requested:
+  * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+    This also verfies the payload - which may require SecureBoot firmware
+    calls.

Again, I'm not sure what checks will go here. We can't necessarily check that the patch will apply because some other payload may be applied first.

+  * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+   Any further hypercalls against the `id` will result in failure unless
+   **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is perfomed with same `id`.
+  * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+  more time than the upper bound of time the `status` will XEN_EBUSY.
+  * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+  more time than the upper bound of time the `status` will be XEN_EBUSY.
+  * *XSPLICE_ACTION_REPLACE* (5) revert all applied payloads and apply this
+  payload.
+  * *XSPLICE_ACTION_LOADED* is an initial state and cannot be requested.
+ * `time` the upper bound of time (ms) the cmd should take. Zero means 
infinite.
+   If within the time the operation does not succeed the operation would go in
+   error state.
+ * `pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follow:
+
+<pre>
+#define XSPLICE_ACTION_CHECK   1
+#define XSPLICE_ACTION_UNLOAD  2
+#define XSPLICE_ACTION_REVERT  3
+#define XSPLICE_ACTION_APPLY   4
+#define XSPLICE_ACTION_REPLACE 5
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */
+    uint32_t time;                          /* IN: Zero if no timeout. */
+                                            /* Or upper bound of time (ms) */
+                                            /* for operation to take. */
+};
+
+</pre>
+
+## State diagrams of XSPLICE_ACTION commands.
+
+There is a strict ordering state of what the commands can be.
+The XSPLICE_ACTION prefix has been dropped to easy reading and
+does not include the XSPLICE_STATES:
+
+<pre>
+              /->\
+              \  /
+ UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\
+                \                                  |
+                 \-------------------<-------------/

This doesn't make much sense to me. The actions need to be represented by arrows that move from one state to another.

+
+</pre>
+## State transition table of XSPLICE_ACTION commands and XSPLICE_STATUS.
+
+Note that:
+
+ - The LOADED state is the starting one achieved with 
*XEN_SYSCTL_XSPLICE_UPLOAD* hypercall.
+ - The REVERT operation on success will automatically move to CHECK state.

... move to the CHECKED state.

Although the second point is not really notable since it's implicit from the state transition table below. I guess it's only notable because it's different from an earlier design.

+ - There are three STATES: LOADED, CHECKED and APPLIED.
+ - There are five actions (aka commands): CHECK, APPLY, REPLACE, REVERT, and 
UNLOAD.
+
+The state transition table of valid states and action states:
+
+<pre>
+
++---------+---------+--------------------------------+-------+-------+--------+
+| ACTION  | Current | Result                         |       Next STATE:      |
+| ACTION  | STATE   |                                | LOADED|CHECKED|APPLIED |
++---------+----------+-------------------------------+-------+-------+--------+
+| CHECK   | LOADED  | Check payload (success).       |       |   x   |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | LOADED  | Check payload (error).         |  x    |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, no)  |       |   x   |        |
+|         |         | errors)                        |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, with |   x   |       |        |
+|         |         | errors)                        |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| UNLOAD  | CHECKED | Unload payload. Always works.  |       |       |        |
+|         |         | No next states.                |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| UNLOAD  | LOADED  | Unload payload. Always works.  |       |       |        |
+|         |         | No next states.                |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| APPLY   | CHECKED | Apply payload (success).       |       |       |   x    |
++---------+---------+--------------------------------+-------+-------+--------+
+| APPLY   | CHECKED | Apply payload (error|timeout)  |       |   x   |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |       |   x    |
+|         |         | payload with success.          |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |   x   |        |
+|         |         | payload with error.            |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REVERT  | APPLIED | Revert payload (success).      |       |   x   |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REVERT  | APPLIED | Revert payload (error|timeout) |       |       |   x    |
++---------+---------+--------------------------------+-------+-------+--------+
+</pre>
+
+All the other state transitions are invalid.
+
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors 
*STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *XEN_EAGAIN* spin. If 
zero go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_CHECK* command to verify 
that the payload can be succesfully applied.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *XEN_EAGAIN* spin. If 
zero go to next step.
+ 5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 6. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If in *XEN_EAGAIN* spin. If 
zero exit with success.
+
+
snip
+
+# v2: Not Yet Done

We need a new word to describe that this isn't V2 of this series, but some further development of xSplice.

+
+
+## Goals
+
+The v2 design must also have a mechanism for:
+
+ *  An dependency mechanism for the payloads. To use that information to load:
+    - The appropiate payload. To verify that payload is built against the
+      hypervisor. This can be done via the `build-id`
+      or via providing an copy of the old code - so that the hypervisor can
+       verify it against the code in memory.
+    - To construct an appropiate order of payloads to load in case they
+      depend on each other.
+ * Be able to lookup in the Xen hypervisor the symbol names of functions from 
the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.

See my comments about this below.

+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, etc).
+
+### xSplice interdependencies
+
+xSplice patches interdependencies are tricky.
+
+There are the ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where an mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need an build-id
+   of the hypevisor to make sure the hot-patches are build against the
+   correct build.
+ * Payload containing the old code to check against that. That allows
+   the hotpatches to be loaded indepedently (if they don't overlap) - or
+   if the old code also containst previously patched code - even if they
+   overlap.
+
+The disadvantage of the first large patch is that it can grow over
+time and not provide an bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts stricts requirements on the order of the patches
+being loaded and requires an hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Handle inlined __LINE__
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonable be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch--by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
+
+   For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
+   the function itself.

This shouldn't be indented so that it is clear that it is not part of Option 3, but is what currently exists in Xen .

+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
snip
+
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we 
in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions live - and we may need to patch this
+as well. And switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+### .bss and .data sections.
+
+In place patching writable data is not suitable as it is unclear what should 
be done
+depending on the current state of data. As such it should not be attempted.
+
+However, functions which are being patched can bring in changes to strings
+(.data or .rodata section changes), or even to .bss sections.
+
+As such the ELF payload can introduce new .rodata, .bss, and .data sections.
+Patching in the new function will end up also patching in the new .rodata
+section and the new function will reference the new string in the new
+.rodata section.

Payloads including .rodata, .bss, and .data already work with the existing build tool and this patch series.

In-place patching is, IMO, a bad idea and shouldn't be in the further development :-)

What we should have in the next iteration is hook functions so that the existing data can be changed during payload application.

--
Ross Lagerwall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.