[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v2] docs/designs: re-work the xenstore migration document...



> -----Original Message-----
> From: Jürgen Groß <jgross@xxxxxxxx>
> Sent: 27 April 2020 11:37
> To: Paul Durrant <paul@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Paul Durrant <pdurrant@xxxxxxxxxx>
> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration 
> document...
> 
> On 27.04.20 09:53, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@xxxxxxxxxx>
> >
> > ... to specify a separate migration stream that will also be suitable for
> > live update.
> >
> > The original scope of the document was to support non-cooperative migration
> > of guests [1] but, since then, live update of xenstored has been brought 
> > into
> > scope. Thus it makes more sense to define a separate image format for
> > serializing xenstore state that is suitable for both purposes.
> >
> > The document has been limited to specifying a new image format. The 
> > mechanism
> > for acquiring the image for live update or migration is not covered as that
> > is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
> > also expected that, when the first implementation of live update or 
> > migration
> > making use of this specification is committed, that the document is moved 
> > from
> > docs/designs into docs/specs.
> >
> > [1] See 
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
> >
> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> > ---
> > Juergen Gross <jgross@xxxxxxxx>
> > Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > George Dunlap <george.dunlap@xxxxxxxxxx>
> > Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> > Jan Beulich <jbeulich@xxxxxxxx>
> > Julien Grall <julien@xxxxxxx>
> > Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Wei Liu <wl@xxxxxxx>
> 
> Mind adding CC: before those mail addresses in order to let git add
> those to the recipients list?
> 

D'oh... good spot.

> >
> > v2:
> >   - Address comments from Juergen
> 
> Not all unfortunately. :-(
> 

OK.

> > +### CONNECTION_DATA
> >
> > -Each WATCH_DATA record specifies a registered watch and is formatted as
> > -follows:
> > +For live update the image format will contain a `CONNECTION_DATA` record 
> > for
> > +each connection to xenstore. For migration it will only contain a record 
> > for
> > +the domain being migrated.
> >
> >
> >   ```
> > -    0       1       2       3     octet
> > -+-------+-------+-------+-------+
> > -| WATCH_DATA                    |
> > -+-------------------------------+
> > -| wpath length                  |
> > -+-------------------------------+
> > -| wpath data                    |
> > -...
> > -| pad (0 to 3 octets)           |
> > -+-------------------------------+
> > +    0       1       2       3       4       5       6       7    octet
> > ++-------+-------+-------+-------+-------+-------+-------+-------+
> > +| conn-id                       | pad                           |
> > ++---------------+-----------------------------------------------+
> > +| conn-type     | conn-spec
> >   ...
> 
> I asked whether it wouldn't be better to drop the pad and move conn-type
> and a 2-byte (unified) flag field at its position. This together ...
> 
> > ++-------------------------------+-------------------------------+
> > +| data-len                      | data
> >   +-------------------------------+
> > -| token length                  |
> > -+-------------------------------+
> > -| token data                    |
> >   ...
> > -| pad (0 to 3 octets)           |
> > -+-------------------------------+
> >   ```
> >
> > -wpath length and token length are specified in octets (excluding the NUL
> > -terminator). The wpath should be as described for the `WATCH` operation in
> > -[2]. The token is an arbitrary string of octets not containing any NUL
> > -values.
> >
> > +| Field       | Description                                     |
> > +|-------------|-------------------------------------------------|
> > +| `conn-id`   | A non-zero number used to identify this         |
> > +|             | connection in subsequent connection-specific    |
> > +|             | records                                         |
> > +|             |                                                 |
> > +| `conn-type` | 0x0000: shared ring                             |
> > +|             | 0x0001: socket                                  |
> > +|             |                                                 |
> > +| `conn-spec` | See below                                       |
> > +|             |                                                 |
> > +| `data-len`  | The length (in octets) of any pending data not  |
> > +|             | yet written to the connection                   |
> > +|             |                                                 |
> > +| `data`      | Pending data (may be empty)                     |
> >
> > -**TRANSACTION_DATA**
> > +The format of `conn-spec` is dependent upon `conn-type`.
> >
> > +\pagebreak
> >
> > -Each TRANSACTION_DATA record specifies an open transaction and is formatted
> > -as follows:
> > +For `shared ring` connections it is as follows:
> >
> >
> >   ```
> > -    0       1       2       3     octet
> > -+-------+-------+-------+-------+
> > -| TRANSACTION_DATA              |
> > -+-------------------------------+
> > -| tx_id                         |
> > -+-------------------------------+
> > +    0       1       2       3       4       5       6       7    octet
> > +                +-------+-------+-------+-------+-------+-------+
> > +                | domid         | tdomid        | flags         |
> > ++---------------+---------------+---------------+---------------+
> > +| revtchn                       | levtchn                       |
> > ++-------------------------------+-------------------------------+
> > +| mfn                                                           |
> > ++---------------------------------------------------------------+
> 
> ... with dropping levtchn (which isn't needed IMO) will make it much
> easier to have a union in C (which needs to be aligned to 8 bytes
> and have a length of a multiple of 8 bytes due to mfn).
> 
> So something like:
> 
> struct xs_state_connection {
>      uint32_t conn_id;
>      uint16_t conn_type;
> #define XS_STATE_CONN_TYPE_RING   0
> #define XS_STATE_CONN_TYPE_SOCKET 1
>      uint16_t flags;
> #define XS_STATE_CONN_INTRODUCED  0x0001
> #define XS_STATE_CONN_RELEASED    0x0002
> #define XS_STATE_CONN_READONLY    0x0004
>      union {
>          struct {
>              uint16_t domid;
>              uint16_t tdomid;
> #define XS_STATE_DOMID_INVALID  0xffffU
>              uint32_t evtchn;
>              uint64_t mfn;
> #define XS_STATE_MFN_INVALID    0xffffffffffffffffUL
>          } ring;
>          int32_t socket_fd;
>      } spec;
>      uint32_t data_out_len;
>      uint8_t  data[];
> };

The issue is making sure that the mfn is properly aligned. If I can drop the 
levtchn then this gets easier.

> 
> >   ```
> >
> > -where tx_id is the non-zero identifier values of an open transaction.
> > -
> >
> > -### Protocol Extension
> > +| Field      | Description                                      |
> > +|------------|--------------------------------------------------|
> > +| `domid`    | The domain-id that owns the shared page          |
> > +|            |                                                  |
> > +| `tdomid`   | The domain-id that `domid` acts on behalf of if  |
> > +|            | it has been subject to an SET_TARGET             |
> > +|            | operation [2] or DOMID_INVALID otherwise         |
> 
> DOMID_INVALID needs to be defined (or we need a reference where it is
> coming from).

OK. It's in a public header... I'll reference it.

> 
> > +|            |                                                  |
> > +| `flags`    | A bit-wise OR of:                                |
> > +|            | 0x0001: INTRODUCE has been issued                |
> > +|            | 0x0002: RELEASE has been issued                  |
> > +|            |                                                  |
> > +| `revtchn`  | The port number of the interdomain channel used  |
> > +|            | by `domid` to communicate with xenstored         |
> > +|            |                                                  |
> > +| `levtchn`  | For a live update this will be the port number   |
> > +|            | of the interdomain channel used by xenstored     |
> > +|            | itself otherwise, for migration, it will be -1   |
> > +|            |                                                  |
> > +| `mfn`      | The MFN of the shared page for a live update or  |
> > +|            | INVALID_MFN otherwise                            |
> 
> INVALID_MFN is an internal detail of the hypervisor. We should have a
> local definition for it here (as in my example above).

OK. I'll define it as all-1s.

> 
> > +
> > +Since the ABI guarantees that entry 1 in `domid`'s grant table will always
> > +contain the GFN of the shared page, so for a live update `mfn` can be used 
> > to
> > +give confidence that `domid` has not been re-cycled during the update.
> > +
> > +
> > +For `socket` connections it is as follows:
> >
> > -Before xenstore state is migrated it is necessary to wait for any pending
> > -reads, writes, watch registrations etc. to complete, and also to make sure
> > -that xenstored does not start processing any new requests (so that new
> > -requests remain pending on the shared ring for subsequent processing on the
> > -new host). Hence the following operation is needed:
> >
> >   ```
> > -QUIESCE                 <domid>|
> > -
> > -Complete processing of any request issued by the specified domain, and
> > -do not process any further requests from the shared ring.
> > +    0       1       2       3       4       5       6       7    octet
> > +                +-------+-------+-------+-------+-------+-------+
> > +                | flags         | socket-fd                     |
> > +                +---------------+-------------------------------+
> 
> > -START_DOMAIN_TRANSACTION    <domid>|<transid>|
> > +    0       1       2       3    octet
> > ++-------+-------+-------+-------+
> > +| conn-id                       |
> > ++-------------------------------+
> > +| tx-id                         |
> > ++---------------+---------------+
> > +| access        | perm-count    |
> > ++---------------+---------------+
> > +| perm1                         |
> > ++-------------------------------+
> > +...
> > ++-------------------------------+
> > +| permN                         |
> > ++---------------+---------------+
> > +| path-len      | value-len     |
> > ++---------------+---------------+
> > +| path
> > +...
> > +| value
> > +...
> > +```
> > +
> > +
> > +| Field        | Description                                    |
> > +|--------------|------------------------------------------------|
> > +| `conn-id`    | If this value is non-zero then this record     |
> > +|              | related to a pending transaction               |
> > +|              |                                                |
> > +| `tx-id`      | This value should be ignored if `conn-id` is   |
> > +|              | zero. Otherwise it specifies the id of the     |
> > +|              | pending transaction                            |
> > +|              |                                                |
> > +| `access`     | This value should be ignored if this record    |
> > +|              | does not relate to a pending transaction,      |
> > +|              | otherwise it specifies the accesses made to    |
> > +|              | the node and hence is a bitwise OR of:         |
> > +|              |                                                |
> > +|              | 0x0001: read                                   |
> > +|              | 0x0002: written                                |
> > +|              |                                                |
> > +|              | The value will be zero for a deleted node      |
> > +|              |                                                |
> > +| `perm-count` | The number (N) of node permission specifiers   |
> > +|              | (which will be 0 for a node deleted in a       |
> > +|              | pending transaction)                           |
> > +|              |                                                |
> > +| `perm1..N`   | A list of zero or more node permission         |
> > +|              | specifiers (see below)                         |
> > +|              |                                                |
> > +| `path-len`   | The length (in octets) of `path` including the |
> > +|              | NUL terminator                                 |
> > +|              |                                                |
> > +| `value-len`  | The length (in octets) of `value` (which will  |
> > +|              | be zero for a deleted node)                    |
> 
> I asked you to put the path-len and value-len fields before the perm
> array.

I missed that. I'll move them.

  Paul

> 
> 
> Juergen




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.