[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v1 1/5] xsplice: Design document.



>>> Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> 09/16/15 11:02 PM >>>
>+struct xsplice {
>+    const char *name; /* A sensible name for the patch. Up to 40 characters. 
>*/  

Any reason not to make this a 40-character array right in the structure then?
It would seem to me the fewer relocation the better.

>+    const char *build_id; /* ID of the hypervisor this binary was built 
>against. */  

For this one this might be true to, if the linker can be made emit the build ID 
at
place we want it (which seems unlikely).

>+    uint32_t version; /* Version of payload. */  
>+    uint32_t id_size; /* Size of the ID. */  

Which "the ID"?

>+    struct xsplice_code *new; /* Pointer to the new code & data to be 
>patched. */  
>+    struct xsplice_code *old; /* Pointer to the old code & data to be checked 
>against. */  

Considering you use "const" above, any reason not to here (and elsewhere below)?

>+struct xsplice_code {  
>+    struct xsplice_reloc *relocs; /* How to patch it. */  
>+    struct xsplice_section *sections; /* Safety data. */  
>+    struct xsplice_patch *patches; /* Patch code and data */  
>+    uint32_t n_relocs;  
>+    uint32_t n_sections;  
>+    uint32_t n_patches;  
>+    uint8_t pad[28]; /* Must be zero. */  
>+};
>+</pre>
>+
>+The size of this structure is 64 bytes.
>+
>+There can be at most two of those structures in the payload.
>+One for the `new` and another for the `old` (optional).
>+
>+If it is for the old code the relocs, and relocs_end values will be ignored.

s/relocs_end/n_relocs/

>+### xsplice_reloc
>+
>+The `xsplice_code` defines an array of these structures. As such
>+an singular structure defines an singular point where to patch the
>+hypervisor.
>+
>+The structure contains the address of the hypervisor (if known),
>+the symbol associated with this address, how the patching is to
>+be done, and platform specific details.
>+
>+The `isns_added` is an value to be used to compute the new offset

I think you mean addend here. And what is isns supposed to stand for?

>+#define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */  
>+#define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .rodata */  
>+#define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .data */  
>+#define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */  

As said before - this enumeration of sections seems too limiting to me. Plus -
who is ".str"?

>+#define XSPLICE_PATCH_INLINE_TEXT   0x1
>+#define XSPLICE_PATCH_INLINE_DATA   0x2

Why do code and data need separate flags here?

>+<pre>
>+struct xsplice_symbol {  
>+    const char *name; /* The ELF name of the symbol. */  
>+    const char *label; /* A unique xSplice name for the symbol. */  
>+    uint8_t pad[16]; /* Must be zero. */  
>+};  
>+</pre>
>+
>+The size of this structure is 32 bytes.

I only notice this now - do these statements make sense, given that
they assume a 64-bit architecture?

>+### xsplice_reloc_howto
>+
>+The howto defines in the detail the change. It contains the type,
>+whether the relocation is relative, the size of the relocation,
>+bitmask for which parts of the instruction or data are to be replaced,
>+amount the final relocation is shifted by (to drop unwanted data), and
>+whether the replacement should be interpreted as signed value.
>+
>+The structure is as follow:
>+
>+<pre>
>+#define XSPLICE_HOWTO_INLINE        0x1 /* It is an inline replacement. */  
>+#define XSPLICE_HOWTO_RELOC_PATCH   0x2 /* Add a trampoline. */  
>+
>+#define XSPLICE_HOWTO_FLAG_PC_REL    0x1 /* Is PC relative. */  
>+#define XSPLICE_HOWOT_FLAG_SIGN      0x2 /* Should the new value be treated 
>as signed value. */  
>+
>+struct xsplice_reloc_howto {  
>+    uint32_t    howto; /* XSPLICE_HOWTO_* */  
>+    uint32_t    flag; /* XSPLICE_HOWTO_FLAG_* */  
>+    uint32_t    size; /* Size, in bytes, of the item to be relocated. */  
>+    uint32_t    r_shift; /* The value the final relocation is shifted right 
>by; used to drop unwanted data from the relocation. */  

Surely a uint8_t would suffice here?

>+    uint64_t    mask; /* Bitmask for which parts of the instruction or data 
>are replaced with the relocated value. */  

Does that mean "size" is limited to 8? The example below only uses
the trivial values (r_shift=0, mask=~0), so the real meaning of these
isn't clear. In particular I can't see what good right shifting a
relocated value can do. Furthermore all this is very x86-centric;
(almost) all other architectures I know a little about won't get away
with such simple relocations, since immediates/displacements are
scattered around an instruction word there.

>+## Signature checking requirements.
>+
>+The signature checking requires that the layout of the data in memory
>+**MUST** be same for signature to be verified. This means that the payload
>+data layout in ELF format **MUST** match what the hypervisor would be
>+expecting such that it can properly do signature verification.
>+
>+The signature is based on the all of the payloads continuously laid out
>+in memory. The signature is to be appended at the end of the ELF payload
>+prefixed with the string '~Module signature appended~\n', followed by
>+an signature header then followed by the signature, key identifier, and 
>signers
>+name.

Is all of this following some (de-facto) standard? Or else why don't
you simply assign a section for this, which is left empty initially and
would get filled by the signing tool? If so, naming the reference
might be a good idea. Oh, I see you do ...

>+</pre>
>+(Note that this has been borrowed from Linux module signature code.).

... here. Is the intention then to use the same tool(s) Linux uses for
signing?

>+## Hypercalls
>+
>+We will employ the sub operations of the system management hypercall (sysctl).
>+There are to be four sub-operations:
>+
>+ * upload the payloads.
>+ * listing of payloads summary uploaded and their state.
>+ * getting an particular payload summary and its state.
>+ * command to apply, delete, or revert the payload.
>+ * querying of the hypervisor build ID.
>+
>+Most of the actions are asynchronous therefore the caller is responsible
>+to verify that it has been applied properly by retrieving the summary of it
>+and verifying that there are no error codes associated with the payload.
>+
>+We **MUST** make some of them asynchronous due to the nature of patching
>+it requires every physical CPU to be lock-step with each other.

... may require ... - I see no reason the exclude a model where this
is not necessary (for example using per-CPU page tables, swapping
patched for unpatched pages at the right moment on each CPU
individually).

>+### Basic type: struct xen_xsplice_id
>+
>+Most of the hypercalls employ an shared structure called `struct 
>xen_xsplice_id`
>+which contains:
>+
>+ * `name` - pointer where the string for the id is located.
>+ * `size` - the size of the string
>+ * `_pad` - padding - to be zero.
>+
>+The structure is as follow:
>+
>+<pre>
>+#define XEN_XSPLICE_ID_SIZE 128
>+struct xen_xsplice_id {  
>+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
>+    uint32_t    size;                       /* IN, size of name. May be upto  
> 
>+                                               XEN_XSPLICE_ID_SIZE. */  

I.e. uint8_t or uint16_t would do. Of course I agree that there's no
foreseeable use for the two or three extra unused bytes thus
regained.

>+### XEN_SYSCTL_XSPLICE_GET (1)
>+
>+Retrieve an status of an specific payload. This caller provides:
>+
>+ * A `struct xen_xsplice_id` called `id` which has the unique id.
>+ * A `struct xen_xsplice_status` structure which has all members
>+   set to zero: That is:
>+   * `status` *MUST* be set to zero.

Is there a particular reason for this?

>+Upon completion the `struct xen_xsplice_status` is updated.
>+
>+ * `_pad` - reserved for further usage.
>+ * `status` - whether it has been:
>+   * *XSPLICE_STATUS_LOADED* (0x1) has been loaded.
>+   * *XSPLICE_STATUS_PROGRESS* (0x2) acting on the 
>**XEN_SYSCTL_XSPLICE_ACTION** command.
>+   * *XSPLICE_STATUS_CHECKED*  (0x4) the ELF payload safety checks passed.
>+   * *XSPLICE_STATUS_APPLIED* (0x8) loaded, checked, and applied.
>+   * *XSPLICE_STATUS_REVERTED* (0x10) loaded, checked, applied and then also 
>reverted.
>+   *  Negative values is an error. The error would be of EXX format.

XEN_Exx now that we have them, I suppose?

>+struct xen_xsplice_status {  
>+#define XSPLICE_STATUS_LOADED       0x01  
>+#define XSPLICE_STATUS_PROGRESS     0x02  
>+#define XSPLICE_STATUS_CHECKED      0x04  
>+#define XSPLICE_STATUS_APPLIED      0x08  
>+#define XSPLICE_STATUS_REVERTED     0x10  
>+ /* Any negative value is an error. The error would be in -EXX format. */  
>+      int32_t status;                 /* OUT, On IN has to be zero. */  
>+    uint32_t _pad;                  /* IN: Must be zero. */  
>+};  

What's the padding field needed for here?

>+### XEN_SYSCTL_XSPLICE_LIST (2)
>+
>+Retrieve an array of abbreviated status and names of payloads that are loaded 
>in the
>+hypervisor.
>+
>+The caller provides:
>+
>+ * `version`. Initially (on first hypercall) *MUST* be zero.

"first" seems ambiguous here: The first time after boot, or the first in
a sequence to obtain the full set? It's not really clear to me how you
mean to use this...

>+ * `idx` index iterator. On first call *MUST* be zero, subsequent calls 
>varies.
>+ * `nr` the max number of entries to populate.
>+ * `_pad` - *MUST* be zero.
>+ * `status` virtual address of where to write `struct xen_xsplice_status`
>+   structures. *MUST* allocate up to `nr` of them.
>+ * `id` - virtual address of where to write the unique id of the payload.
>+   *MUST* allocate up to `nr` of them. Each *MUST* be of
>+   **XEN_XSPLICE_ID_SIZE** size.
>+ * `len` - virtual address of where to write the length of each unique id
>+   of the payload. *MUST* allocate up to `nr` of them. Each *MUST* be
>+   of sizeof(uint32_t) (4 bytes).

s/up to/at least/g

>+Note that due to the asynchronous nature of hypercalls the domain might have
>+added or removed the number of payloads making this information stale.

... the control domain might have added or removed a number ... ?

>+struct xen_sysctl_xsplice_action {  
>+    xen_xsplice_id_t id;                    /* IN, name of the patch. */  
>+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
>+    uint32_t _pad;                          /* IN: MUST be zero. */  
>+    uint64_t time;  /* IN, upper bound of time (ms) for the operation to 
>take. */  

Surely a 32-bit quantity would suffice here (worth up to 4M seconds,
i.e. over 3 years).

>+#define XEN_SYSCTL_XSPLICE_INFO_BUILD_ID 0  
>+struct xen_sysctl_xsplice_info {  
>+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_INFO_* 
>*/  
>+    uint32_t size;                          /* IN: Size of info: OUT: Amount 
>of  
>+                                               bytes filed out in info. */  
>+    union {  
>+        XEN_GUEST_HANDLE_64(char) info;     /* OUT: Requested information. */ 
> 
>+    } u;  
>+};  

Considering that earlier operation descriptions referred to this one
for retrieving more information after failure, I can't see what further
information there is being made available.

>+### Alternative assembler
>+
>+Alternative assembler is a mechanism to use different instructions depending
>+on what the CPU supports. This is done by providing multiple streams of code
>+that can be patched in - or if the CPU does not support it - padded with
>+`nop` operations. The alternative assembler macros cause the compiler to
>+expand the code to place a most generic code in place - emit a special
>+ELF .section header to tag this location. During run-time the hypervisor
>+can leave the areas alone or patch them with an better suited opcodes.
>+
>+However these sections are part of .init. and as such can't reasonably be
>+subject to patching.

This patching may, however, result in verification failure when
checking original code. Multiple patch variants may be needed to
cover for all possible cases.

>+### .rodata sections
>+
>+The patching might require strings to be updated as well. As such we must be
>+also able to patch the strings as needed. This sounds simple - but the 
>compiler
>+has a habit of coalescing strings that are the same - which means if we 
>in-place
>+alter the strings - other users will be inadvertently affected as well.

Coalescing (according to my understanding of the word) wouldn't
be a problem. Compilers and linker fold (partially) duplicate strings
where possible.

>+This is also where pointers to functions live - and we may need to patch this
>+as well. And switch-style jump tables.
>+
>+To guard against that we must be prepared to do patching similar to
>+trampoline patching or in-line depending on the flavour. If we can
>+do in-line patching we would need to:
>+
>+ * alter `.rodata` to be writeable.
>+ * inline patch.
>+ * alter `.rodata` to be read-only.
>+
>+If are doing trampoline patching we would need to:
>+
>+ * allocate a new memory location for the string.
>+ * all locations which use this string will have to be updated to use the
>+   offset to the string.
>+ * mark the region RO when we are done.

Except that we don't currently write-protect .rodata or .text (and
you also don't say the same for .text patching).

>+### Patching code which is in the stack.
>+
>+We should not patch the code which is on the stack. That can lead
>+to corruption.

I think we did away with all code placement on the stack. Or do you
mean code _referenced_ by return addresses on the stack?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.