[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] docs/designs: re-work the xenstore migration document...



On 27.04.20 09:53, Paul Durrant wrote:
From: Paul Durrant <pdurrant@xxxxxxxxxx>

... to specify a separate migration stream that will also be suitable for
live update.

The original scope of the document was to support non-cooperative migration
of guests [1] but, since then, live update of xenstored has been brought into
scope. Thus it makes more sense to define a separate image format for
serializing xenstore state that is suitable for both purposes.

The document has been limited to specifying a new image format. The mechanism
for acquiring the image for live update or migration is not covered as that
is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
also expected that, when the first implementation of live update or migration
making use of this specification is committed, that the document is moved from
docs/designs into docs/specs.

[1] See 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md

Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
---
Juergen Gross <jgross@xxxxxxxx>
Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
George Dunlap <george.dunlap@xxxxxxxxxx>
Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Jan Beulich <jbeulich@xxxxxxxx>
Julien Grall <julien@xxxxxxx>
Stefano Stabellini <sstabellini@xxxxxxxxxx>
Wei Liu <wl@xxxxxxx>

Mind adding CC: before those mail addresses in order to let git add
those to the recipients list?


v2:
  - Address comments from Juergen

Not all unfortunately. :-(

+### CONNECTION_DATA
-Each WATCH_DATA record specifies a registered watch and is formatted as
-follows:
+For live update the image format will contain a `CONNECTION_DATA` record for
+each connection to xenstore. For migration it will only contain a record for
+the domain being migrated.
```
-    0       1       2       3     octet
-+-------+-------+-------+-------+
-| WATCH_DATA                    |
-+-------------------------------+
-| wpath length                  |
-+-------------------------------+
-| wpath data                    |
-...
-| pad (0 to 3 octets)           |
-+-------------------------------+
+    0       1       2       3       4       5       6       7    octet
++-------+-------+-------+-------+-------+-------+-------+-------+
+| conn-id                       | pad                           |
++---------------+-----------------------------------------------+
+| conn-type     | conn-spec
  ...

I asked whether it wouldn't be better to drop the pad and move conn-type
and a 2-byte (unified) flag field at its position. This together ...

++-------------------------------+-------------------------------+
+| data-len                      | data
  +-------------------------------+
-| token length                  |
-+-------------------------------+
-| token data                    |
  ...
-| pad (0 to 3 octets)           |
-+-------------------------------+
  ```
-wpath length and token length are specified in octets (excluding the NUL
-terminator). The wpath should be as described for the `WATCH` operation in
-[2]. The token is an arbitrary string of octets not containing any NUL
-values.
+| Field | Description |
+|-------------|-------------------------------------------------|
+| `conn-id`   | A non-zero number used to identify this         |
+|             | connection in subsequent connection-specific    |
+|             | records                                         |
+|             |                                                 |
+| `conn-type` | 0x0000: shared ring                             |
+|             | 0x0001: socket                                  |
+|             |                                                 |
+| `conn-spec` | See below                                       |
+|             |                                                 |
+| `data-len`  | The length (in octets) of any pending data not  |
+|             | yet written to the connection                   |
+|             |                                                 |
+| `data`      | Pending data (may be empty)                     |
-**TRANSACTION_DATA**
+The format of `conn-spec` is dependent upon `conn-type`.
+\pagebreak -Each TRANSACTION_DATA record specifies an open transaction and is formatted
-as follows:
+For `shared ring` connections it is as follows:
```
-    0       1       2       3     octet
-+-------+-------+-------+-------+
-| TRANSACTION_DATA              |
-+-------------------------------+
-| tx_id                         |
-+-------------------------------+
+    0       1       2       3       4       5       6       7    octet
+                +-------+-------+-------+-------+-------+-------+
+                | domid         | tdomid        | flags         |
++---------------+---------------+---------------+---------------+
+| revtchn                       | levtchn                       |
++-------------------------------+-------------------------------+
+| mfn                                                           |
++---------------------------------------------------------------+

... with dropping levtchn (which isn't needed IMO) will make it much
easier to have a union in C (which needs to be aligned to 8 bytes
and have a length of a multiple of 8 bytes due to mfn).

So something like:

struct xs_state_connection {
    uint32_t conn_id;
    uint16_t conn_type;
#define XS_STATE_CONN_TYPE_RING   0
#define XS_STATE_CONN_TYPE_SOCKET 1
    uint16_t flags;
#define XS_STATE_CONN_INTRODUCED  0x0001
#define XS_STATE_CONN_RELEASED    0x0002
#define XS_STATE_CONN_READONLY    0x0004
    union {
        struct {
            uint16_t domid;
            uint16_t tdomid;
#define XS_STATE_DOMID_INVALID  0xffffU
            uint32_t evtchn;
            uint64_t mfn;
#define XS_STATE_MFN_INVALID    0xffffffffffffffffUL
        } ring;
        int32_t socket_fd;
    } spec;
    uint32_t data_out_len;
    uint8_t  data[];
};

  ```
-where tx_id is the non-zero identifier values of an open transaction.
-
-### Protocol Extension
+| Field      | Description                                      |
+|------------|--------------------------------------------------|
+| `domid`    | The domain-id that owns the shared page          |
+|            |                                                  |
+| `tdomid`   | The domain-id that `domid` acts on behalf of if  |
+|            | it has been subject to an SET_TARGET             |
+|            | operation [2] or DOMID_INVALID otherwise         |

DOMID_INVALID needs to be defined (or we need a reference where it is
coming from).

+|            |                                                  |
+| `flags`    | A bit-wise OR of:                                |
+|            | 0x0001: INTRODUCE has been issued                |
+|            | 0x0002: RELEASE has been issued                  |
+|            |                                                  |
+| `revtchn`  | The port number of the interdomain channel used  |
+|            | by `domid` to communicate with xenstored         |
+|            |                                                  |
+| `levtchn`  | For a live update this will be the port number   |
+|            | of the interdomain channel used by xenstored     |
+|            | itself otherwise, for migration, it will be -1   |
+|            |                                                  |
+| `mfn`      | The MFN of the shared page for a live update or  |
+|            | INVALID_MFN otherwise                            |

INVALID_MFN is an internal detail of the hypervisor. We should have a
local definition for it here (as in my example above).

+
+Since the ABI guarantees that entry 1 in `domid`'s grant table will always
+contain the GFN of the shared page, so for a live update `mfn` can be used to
+give confidence that `domid` has not been re-cycled during the update.
+
+
+For `socket` connections it is as follows:
-Before xenstore state is migrated it is necessary to wait for any pending
-reads, writes, watch registrations etc. to complete, and also to make sure
-that xenstored does not start processing any new requests (so that new
-requests remain pending on the shared ring for subsequent processing on the
-new host). Hence the following operation is needed:
```
-QUIESCE                 <domid>|
-
-Complete processing of any request issued by the specified domain, and
-do not process any further requests from the shared ring.
+    0       1       2       3       4       5       6       7    octet
+                +-------+-------+-------+-------+-------+-------+
+                | flags         | socket-fd                     |
+                +---------------+-------------------------------+

-START_DOMAIN_TRANSACTION    <domid>|<transid>|
+    0       1       2       3    octet
++-------+-------+-------+-------+
+| conn-id                       |
++-------------------------------+
+| tx-id                         |
++---------------+---------------+
+| access        | perm-count    |
++---------------+---------------+
+| perm1                         |
++-------------------------------+
+...
++-------------------------------+
+| permN                         |
++---------------+---------------+
+| path-len      | value-len     |
++---------------+---------------+
+| path
+...
+| value
+...
+```
+
+
+| Field        | Description                                    |
+|--------------|------------------------------------------------|
+| `conn-id`    | If this value is non-zero then this record     |
+|              | related to a pending transaction               |
+|              |                                                |
+| `tx-id`      | This value should be ignored if `conn-id` is   |
+|              | zero. Otherwise it specifies the id of the     |
+|              | pending transaction                            |
+|              |                                                |
+| `access`     | This value should be ignored if this record    |
+|              | does not relate to a pending transaction,      |
+|              | otherwise it specifies the accesses made to    |
+|              | the node and hence is a bitwise OR of:         |
+|              |                                                |
+|              | 0x0001: read                                   |
+|              | 0x0002: written                                |
+|              |                                                |
+|              | The value will be zero for a deleted node      |
+|              |                                                |
+| `perm-count` | The number (N) of node permission specifiers   |
+|              | (which will be 0 for a node deleted in a       |
+|              | pending transaction)                           |
+|              |                                                |
+| `perm1..N`   | A list of zero or more node permission         |
+|              | specifiers (see below)                         |
+|              |                                                |
+| `path-len`   | The length (in octets) of `path` including the |
+|              | NUL terminator                                 |
+|              |                                                |
+| `value-len`  | The length (in octets) of `value` (which will  |
+|              | be zero for a deleted node)                    |

I asked you to put the path-len and value-len fields before the perm
array.


Juergen



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.